Daniele Gammelli

Contacts:

Email: gammelli at stanford dot edu

Daniele Gammelli is a postdoctoral scholar in Stanford’s Autonomous Systems Lab, where he develops learning-based methods that enable the deployment of future autonomous systems in complex environments, with an emphasis on large-scale robotic networks, mobility systems, and autonomous spacecraft. He received his Ph.D. in Machine Learning and Mathematical Optimization from the Technical University of Denmark, where he developed ML-based approaches to analyze and control future Intelligent Transportation Systems.

More broadly, his research interests include deep reinforcement learning, generative models, graph neural networks, Bayesian statistics, and control techniques that leverage these tools.

Beyond research, Daniele enjoys playing soccer, going on trail runs, reading, and cooking.

Awards:

  • Kaj and Hermilla Ostenfeld’s Excellence Research Fund

ASL Publications

  1. Y. Takubo, T. Guffanti, D. Gammelli, M. Pavone, and S. D’Amico, “Towards Robust Spacecraft Trajectory Optimization via Transformers,” in IEEE Aerospace Conference, 2025. (Submitted)

    Abstract: Future multi-spacecraft missions require robust autonomous trajectory optimization capabilities to ensure safe and efficient rendezvous operations. This capability hinges on solving non-convex optimal control problems in real time, yet traditional iterative methods such as sequential convex programming impose a significant computational burden. To mitigate this burden, the Autonomous Rendezvous Transformer (ART) introduced a generative model trained to provide near-optimal initial guesses. This approach provides convergence to better local optima (e.g., fuel optimality), improves feasibility rates, and results in faster convergence of optimization algorithms through warm-starting. This work extends the capabilities of ART to address robust chance-constrained optimal control problems. Specifically, ART is applied to challenging rendezvous scenarios in Low Earth Orbit (LEO), ensuring fault-tolerant behavior under uncertainty. Through extensive experimentation, the proposed warm-starting strategy is shown to consistently produce high-quality reference trajectories, achieving up to 30% cost improvement and 50% reduction in infeasible cases compared to conventional methods, demonstrating robust performance across multiple state representations. Additionally, a post hoc evaluation framework is proposed to assess the quality of generated trajectories and mitigate runtime failures, marking an initial step toward the reliable deployment of AI-driven solutions in safety-critical autonomous systems such as spacecraft.

    @inproceedings{TakuboGammelliEtAl2024,
      author = {Takubo, Y. and Guffanti, T. and Gammelli, D. and Pavone, M. and {D'Amico}, S.},
      title = {Towards Robust Spacecraft Trajectory Optimization via Transformers},
      booktitle = {{IEEE Aerospace Conference}},
      year = {2025},
      note = {Submitted},
      keywords = {sub},
      owner = {jthluke},
      timestamp = {2024-10-30},
      url = {https://arxiv.org/abs/2410.05585}
    }
    
  2. C. Schmidt, D. Gammelli, J. Harrison, M. Pavone, and F. Rodrigues, “Offline Hierarchical Reinforcement Learning via Inverse Optimization,” in Int. Conf. on Learning Representations, 2025. (Submitted)

    Abstract: Hierarchical policies enable strong performance in many sequential decision-making problems, such as those with high-dimensional action spaces, those requiring long-horizon planning, and settings with sparse rewards. However, learning hierarchical policies from static offline datasets presents a significant challenge. Crucially, actions taken by higher-level policies may not be directly observable within hierarchical controllers, and the offline dataset might have been generated using a different policy structure, hindering the use of standard offline learning algorithms. In this work, we propose OHIO: a framework for offline reinforcement learning (RL) of hierarchical policies. Our framework leverages knowledge of the policy structure to solve the inverse problem, recovering the unobservable high-level actions that likely generated the observed data under our hierarchical policy. This approach constructs a dataset suitable for off-the-shelf offline training. We demonstrate our framework on robotic and network optimization problems and show that it substantially outperforms end-to-end RL methods and improves robustness. We investigate a variety of instantiations of our framework, both in direct deployment of policies trained offline and when online fine-tuning is performed.

    @inproceedings{SchmidtGammelliEtAl2024,
      author = {Schmidt, C. and Gammelli, D. and Harrison, J. and Pavone, M. and Rodrigues, F.},
      title = {Offline Hierarchical Reinforcement Learning via Inverse Optimization},
      booktitle = {{Int. Conf. on Learning Representations}},
      keywords = {sub},
      note = {Submitted},
      year = {2025},
      owner = {gammelli},
      timestamp = {2024-08-14},
      url = {https://arxiv.org/abs/2410.07933}
    }
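
    A minimal sketch of the inverse step at the core of OHIO, under the simplifying assumption of a known linear low-level controller (the gain matrix and all data below are hypothetical): given logged (state, low-level action) pairs, the unobserved high-level action is recovered in closed form, producing a dataset suitable for standard offline RL.

    import numpy as np

    rng = np.random.default_rng(0)
    K = np.diag([2.0, 0.5])  # hypothetical low-level controller gain

    def low_level(goal, state):
        # Low-level policy: drive the state toward the high-level goal.
        return K @ (goal - state)

    # Offline dataset: only (state, low-level action) pairs were logged;
    # the high-level goals that generated them are unobserved.
    states = rng.normal(size=(100, 2))
    goals_true = states + rng.normal(size=(100, 2))
    actions = np.stack([low_level(g, s) for g, s in zip(goals_true, states)])

    # Inverse step: recover the high-level action that explains each
    # observed low-level action (closed form for this controller).
    goals_recovered = states + actions @ np.linalg.inv(K).T
    assert np.allclose(goals_recovered, goals_true)
    # (states, goals_recovered) now forms a dataset for off-the-shelf
    # offline RL over the high-level policy.

    For nonlinear low-level policies the inversion generally requires numerical optimization; the closed form above is purely illustrative.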
    
  3. D. Celestini, A. Afsharrad, D. Gammelli, T. Guffanti, G. Zardini, S. Lall, E. Capelli, S. D’Amico, and M. Pavone, “Generalizable Spacecraft Trajectory Generation via Multimodal Learning with Transformers,” in American Control Conference, 2025. (Submitted)

    Abstract: Effective trajectory generation is essential for reliable on-board spacecraft autonomy. Among other approaches, learning-based warm-starting represents an appealing paradigm for solving the trajectory generation problem, effectively combining the benefits of optimization- and data-driven methods. Current approaches for learning-based trajectory generation often focus on fixed, single-scenario environments, where key scene characteristics, such as obstacle positions or final-time requirements, remain constant across problem instances. However, practical trajectory generation requires the scenario to be frequently reconfigured, making the single-scenario approach a potentially impractical solution. To address this challenge, we present a novel trajectory generation framework that generalizes across diverse problem configurations, by leveraging high-capacity transformer neural networks capable of learning from multimodal data sources. Specifically, our approach integrates transformer-based neural network models into the trajectory optimization process, encoding both scene-level information (e.g., obstacle locations, initial and goal states) and trajectory-level constraints (e.g., time bounds, fuel consumption targets) via multimodal representations. The transformer network then generates near-optimal initial guesses for non-convex optimization problems, significantly enhancing convergence speed and performance. The framework is validated through extensive simulations and real-world experiments on a free-flyer platform, achieving up to 30% cost improvement and 80% reduction in infeasible cases with respect to traditional approaches, and demonstrating robust generalization across diverse scenario variations.

    @inproceedings{CelestiniGammelliEtAl2025,
      author = {Celestini, D. and Afsharrad, A. and Gammelli, D. and Guffanti, T. and Zardini, G. and Lall, S. and Capelli, E. and {D'Amico}, S. and Pavone, M.},
      title = {Generalizable Spacecraft Trajectory Generation via Multimodal Learning with Transformers},
      booktitle = {{American Control Conference}},
      year = {2025},
      note = {Submitted},
      keywords = {sub},
      owner = {gammelli},
      timestamp = {2024-10-29},
      url = {https://arxiv.org/abs/2410.11723}
    }
    
  4. A. Singhal, D. Gammelli, J. Luke, K. Gopalakrishnan, D. Helmreich, and M. Pavone, “Real-time Control of Electric Autonomous Mobility-on-Demand Systems via Graph Reinforcement Learning,” in European Control Conference, Stockholm, Sweden, 2024.

    Abstract: Operators of Electric Autonomous Mobility-on-Demand (E-AMoD) fleets need to make several real-time decisions such as matching available vehicles to ride requests, rebalancing idle vehicles to areas of high demand, and charging vehicles to ensure sufficient range. While this problem can be posed as a linear program that optimizes flows over a space-charge-time graph, the size of the resulting optimization problem does not allow for real-time implementation in realistic settings. In this work, we present the E-AMoD control problem through the lens of reinforcement learning and propose a graph network-based framework to achieve drastically improved scalability and superior performance over heuristics. Specifically, we adopt a bi-level formulation where we (1) leverage a graph network-based RL agent to specify a desired next state in the space-charge graph, and (2) solve more tractable linear programs to best achieve the desired state while ensuring feasibility. Experiments using real-world data from San Francisco and New York City show that our approach achieves up to 89% of the profits of the theoretically-optimal solution while achieving more than a 100x speedup in computational time. We further highlight promising zero-shot transfer capabilities of our learned policy on tasks such as inter-city generalization and service area expansion, thus showing the utility, scalability, and flexibility of our framework. Finally, our approach outperforms the best domain-specific heuristics with comparable runtimes, with an increase in profits by up to 3.2x.

    @inproceedings{SinghalGammelliEtAl2024,
      author = {Singhal, A. and Gammelli, D. and Luke, J. and Gopalakrishnan, K. and Helmreich, D. and Pavone, M.},
      title = {Real-time Control of Electric Autonomous Mobility-on-Demand Systems via Graph Reinforcement Learning},
      booktitle = {{European Control Conference}},
      year = {2024},
      address = {Stockholm, Sweden},
      month = jun,
      doi = {10.23919/ecc64448.2024.10591098},
      owner = {jthluke},
      timestamp = {2024-10-28},
      url = {https://arxiv.org/abs/2311.05780}
    }
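
    A minimal sketch of the bi-level decomposition described in the abstract above (the same two-phase structure also underlies entries 8 and 10 below), on a hypothetical three-node network with a fixed placeholder standing in for the graph-network RL policy; requires cvxpy:

    import cvxpy as cp
    import numpy as np

    n = 3
    cost = np.array([[0, 1, 4], [1, 0, 1], [4, 1, 0]])  # hypothetical travel costs
    vehicles = np.array([6.0, 1.0, 1.0])                # current idle vehicles

    # (1) The RL agent outputs a desired next-state distribution over nodes
    #     (here a fixed placeholder standing in for a learned policy).
    desired = np.array([0.3, 0.4, 0.3]) * vehicles.sum()

    # (2) A linear program computes minimum-cost rebalancing flows that
    #     achieve the desired state while respecting vehicle conservation.
    f = cp.Variable((n, n), nonneg=True)
    outflow = cp.sum(f, axis=1)
    inflow = cp.sum(f, axis=0)
    constraints = [vehicles - outflow + inflow == desired, outflow <= vehicles]
    problem = cp.Problem(cp.Minimize(cp.sum(cp.multiply(cost, f))), constraints)
    problem.solve()
    print(np.round(f.value, 2))  # rebalancing flows between nodes

    Splitting the decision this way keeps the learned component low-dimensional (one target value per node), while the linear program guarantees that the executed flows respect vehicle conservation.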
    
  5. T. Guffanti, D. Gammelli, S. D’Amico, and M. Pavone, “Transformers for Trajectory Optimization with Application to Spacecraft Rendezvous,” in IEEE Aerospace Conference, 2024.

    Abstract: Reliable and efficient trajectory optimization methods are a fundamental need for autonomous dynamical systems, effectively enabling applications including rocket landing, hypersonic reentry, spacecraft rendezvous, and docking. Within such safety-critical application areas, the complexity of the emerging trajectory optimization problems has motivated the application of AI-based techniques to enhance the performance of traditional approaches. However, current AI-based methods either attempt to fully replace traditional control algorithms, thus lacking constraint satisfaction guarantees and incurring expensive simulation costs, or aim to solely imitate the behavior of traditional methods via supervised learning. To address these limitations, this paper proposes the Autonomous Rendezvous Transformer (ART) and assesses the capability of modern generative models to solve complex trajectory optimization problems, both from a forecasting and control standpoint. Specifically, this work assesses the capabilities of Transformers to (i) learn near-optimal policies from previously collected data, and (ii) warm-start a sequential optimizer for the solution of non-convex optimal control problems, thus guaranteeing hard constraint satisfaction. From a forecasting perspective, results highlight how ART outperforms other learning-based architectures at predicting known fuel-optimal trajectories. From a control perspective, empirical analyses show how policies learned through Transformers are able to generate near-optimal warm-starts, achieving trajectories that are (i) more fuel-efficient, (ii) obtained in fewer sequential optimizer iterations, and (iii) computed with an overall runtime comparable to benchmarks based on convex optimization.

    @inproceedings{GuffantiGammelliEtAl2024,
      author = {Guffanti, T. and Gammelli, D. and D'Amico, S. and Pavone, M.},
      title = {Transformers for Trajectory Optimization with Application to Spacecraft Rendezvous},
      booktitle = {{IEEE Aerospace Conference}},
      year = {2024},
      keywords = {pub},
      owner = {gammelli},
      timestamp = {2023-11-15},
      url = {https://arxiv.org/abs/2310.13831}
    }
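
    A minimal sketch of the warm-starting mechanism evaluated in this paper (the cost function and the stand-in generator below are hypothetical, not the ART model itself; the same mechanism also appears in entries 1 and 3): the learned model supplies the initial guess from which a standard nonconvex solver is run.

    import numpy as np
    from scipy.optimize import minimize

    def trajectory_cost(u):
        # Hypothetical nonconvex cost: fuel term, terminal-constraint
        # penalty, and a ripple term that creates many local optima.
        return np.abs(u).sum() + 100.0 * (u.sum() - 1.0) ** 2 + np.cos(8.0 * u).sum()

    def warm_start(horizon):
        # Placeholder for ART: a near-feasible, low-fuel initial guess.
        return np.full(horizon, 1.0 / horizon)

    horizon = 20
    cold = minimize(trajectory_cost, x0=np.zeros(horizon), method="Nelder-Mead")
    warm = minimize(trajectory_cost, x0=warm_start(horizon), method="Nelder-Mead")
    print(cold.fun, warm.fun)  # the warm start typically reaches a better optimum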
    
  6. M. Foutter, P. Bhoj, R. Sinha, A. Elhafsi, S. Banerjee, C. Agia, J. Kruger, T. Guffanti, D. Gammelli, S. D’Amico, and M. Pavone, “Adapting a Foundation Model for Space-based Tasks,” in Robotics: Science and Systems - Workshop on Semantics for Robotics: From Environment Understanding and Reasoning to Safe Interaction, 2024.

    Abstract: Foundation models, e.g., large language models, possess attributes of intelligence which offer promise to endow a robot with the contextual understanding necessary to navigate complex, unstructured tasks in the wild. In the future of space robotics, we see three core challenges which motivate the use of a foundation model adapted to space-based applications: 1) Scalability of ground-in-the-loop operations; 2) Generalizing prior knowledge to novel environments; and 3) Multi-modality in tasks and sensor data. Therefore, as a first step towards building a foundation model for space-based applications, we automatically label the AI4Mars dataset to curate a language-annotated dataset of visual-question-answer tuples. We fine-tune a pretrained LLaVA checkpoint on this dataset to endow a vision-language model with the ability to perform spatial reasoning and navigation on Mars’ surface. In this work, we demonstrate that 1) existing vision-language models are deficient visual reasoners in space-based applications, and 2) fine-tuning a vision-language model on extraterrestrial data significantly improves the quality of responses even with a limited training dataset of only a few thousand samples.

    @inproceedings{FoutterBohjEtAl2024,
      author = {Foutter, M. and Bhoj, P. and Sinha, R. and Elhafsi, A. and Banerjee, S. and Agia, C. and Kruger, J. and Guffanti, T. and Gammelli, D. and D'Amico, S. and Pavone, M.},
      title = {Adapting a Foundation Model for Space-based Tasks},
      booktitle = {{Robotics: Science and Systems - Workshop on Semantics for Robotics: From Environment Understanding and Reasoning to Safe Interaction}},
      year = {2024},
      asl_abstract = {Foundation models, e.g., large language models, possess attributes of intelligence which offer promise to endow a robot with the contextual understanding necessary to navigate complex, unstructured tasks in the wild. In the future of space robotics, we see three core challenges which motivate the use of a foundation model adapted to space-based applications: 1) Scalability of ground-in-the-loop operations; 2) Generalizing prior knowledge to novel environments; and 3) Multi-modality in tasks and sensor data. Therefore, as a first-step towards building a foundation model for space-based applications, we automatically label the AI4Mars dataset to curate a language annotated dataset of visual-question-answer tuples. We fine-tune a pretrained LLaVA checkpoint on this dataset to endow a vision-language model with the ability to perform spatial reasoning and navigation on Mars' surface. In this work, we demonstrate that 1) existing vision-language models are deficient visual reasoners in space-based applications, and 2) fine-tuning a vision-language model on extraterrestrial data significantly improves the quality of responses even with a limited training dataset of only a few thousand samples.},
      asl_address = {Delft, Netherlands},
      asl_url = {https://arxiv.org/abs/2408.05924},
      url = {https://arxiv.org/abs/2408.05924},
      owner = {foutter},
      timestamp = {2024-08-12}
    }
    
  7. D. Celestini, D. Gammelli, T. Guffanti, S. D’Amico, E. Capelli, and M. Pavone, “Transformer-based Model Predictive Control: Trajectory Optimization via Sequence Modeling,” IEEE Robotics and Automation Letters, vol. 9, no. 11, pp. 9820–9827, 2024.

    Abstract: Model predictive control (MPC) has established itself as the primary methodology for constrained control, enabling general-purpose robot autonomy in diverse real-world scenarios. However, for most problems of interest, MPC relies on the recursive solution of highly non-convex trajectory optimization problems, leading to high computational complexity and strong dependency on initialization. In this work, we present a unified framework to combine the main strengths of optimization-based and learning-based methods for MPC. Our approach entails embedding high-capacity, transformer-based neural network models within the optimization process for trajectory generation, whereby the transformer provides a near-optimal initial guess, or target plan, to a non-convex optimization problem. Our experiments, performed in simulation and the real world onboard a free flyer platform, demonstrate the capabilities of our framework to improve MPC convergence and runtime. Compared to purely optimization-based approaches, results show that our approach can improve trajectory generation performance by up to 75%, reduce the number of solver iterations by up to 45%, and improve overall MPC runtime by 7x without loss in performance.

    @article{CelestiniGammelliEtAl2024,
      author = {Celestini, D. and Gammelli, D. and Guffanti, T. and D'Amico, S. and Capelli, E. and Pavone, M.},
      title = {Transformer-based Model Predictive Control: Trajectory Optimization via Sequence Modeling},
      journal = {{IEEE Robotics and Automation Letters}},
      year = {2024},
      volume = {9},
      number = {11},
      pages = {9820--9827},
      doi = {10.1109/LRA.2024.3466069},
      owner = {jthluke},
      timestamp = {2024-10-30},
      url = {/wp-content/papercite-data/pdf/Celestini.Gammelli.ea.RAL24.pdf}
    }
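
    A minimal sketch of the receding-horizon structure described in the abstract, with hypothetical stand-ins for the dynamics, cost, and transformer: the learned model proposes a plan, a local optimizer refines it, and only the first control is applied before replanning.

    import numpy as np
    from scipy.optimize import minimize

    def plan_cost(u, x, target):
        # Control effort plus a terminal penalty on a single integrator.
        states = x + np.cumsum(u)
        return np.abs(u).sum() + 10.0 * (states[-1] - target) ** 2

    def learned_guess(x, target, horizon):
        # Placeholder for the transformer: spread the motion evenly.
        return np.full(horizon, (target - x) / horizon)

    x, target, horizon = 0.0, 5.0, 10
    for step in range(3):
        sol = minimize(plan_cost, x0=learned_guess(x, target, horizon),
                       args=(x, target), method="Powell")
        x = x + sol.x[0]  # apply only the first control, then replan
        print(f"step {step}: state = {x:.3f}")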
    
  8. D. Gammelli, J. Harrison, K. Yang, M. Pavone, F. Rodrigues, and F. C. Pereira, “Graph Reinforcement Learning for Network Control via Bi-Level Optimization,” in Int. Conf. on Machine Learning, Honolulu, Hawaii, 2023.

    Abstract: Optimization problems over dynamic networks have been extensively studied and widely used in the past decades to formulate numerous real-world problems. However, (1) traditional optimization-based approaches do not scale to large networks, and (2) the design of good heuristics or approximation algorithms often requires significant manual trial-and-error. In this work, we argue that data-driven strategies can automate this process and learn efficient algorithms without compromising optimality. To do so, we present network control problems through the lens of reinforcement learning and propose a graph network-based framework to handle a broad class of problems. Instead of naively computing actions over high-dimensional graph elements, e.g., edges, we propose a bi-level formulation where we (1) specify a desired next state via RL, and (2) solve a convex program to best achieve it, leading to drastically improved scalability and performance. We further highlight a collection of desirable features to system designers, investigate design decisions, and present experiments on real-world control problems showing the utility, scalability, and flexibility of our framework.

    @inproceedings{GammelliHarrisonEtAl2023,
      author = {Gammelli, D. and Harrison, J. and Yang, K. and Pavone, M. and Rodrigues, F. and Pereira, F. C.},
      title = {Graph Reinforcement Learning for Network Control via Bi-Level Optimization},
      booktitle = {{Int. Conf. on Machine Learning}},
      year = {2023},
      address = {Honolulu, Hawaii},
      month = jul,
      owner = {jthluke},
      timestamp = {2024-09-20},
      url = {https://arxiv.org/abs/2305.09129}
    }
    
  9. D. Gammelli, K. Yang, J. Harrison, F. Rodrigues, F. Pereira, and M. Pavone, “Graph Meta-Reinforcement Learning for Transferable Autonomous Mobility-on-Demand,” in ACM Int. Conf. on Knowledge Discovery and Data Mining, 2022.

    Abstract: Autonomous Mobility-on-Demand (AMoD) systems represent an attractive alternative to existing transportation paradigms, currently challenged by urbanization and increasing travel needs. By centrally controlling a fleet of self-driving vehicles, these systems provide mobility service to customers and are currently starting to be deployed in a number of cities around the world. Current learning-based approaches for controlling AMoD systems are limited to the single-city scenario, whereby the service operator is allowed to take an unlimited amount of operational decisions within the same transportation system. However, real-world system operators can hardly afford to fully re-train AMoD controllers for every city they operate in, as this could result in a high number of poor-quality decisions during training, making the single-city strategy a potentially impractical solution. To address these limitations, we propose to formalize the multi-city AMoD problem through the lens of meta-reinforcement learning (meta-RL) and devise an actor-critic algorithm based on recurrent graph neural networks. In our approach, AMoD controllers are explicitly trained such that a small amount of experience within a new city will produce good system performance. Empirically, we show how control policies learned through meta-RL are able to achieve near-optimal performance on unseen cities by learning rapidly adaptable policies, thus making them more robust not only to novel environments, but also to distribution shifts common in real-world operations, such as special events, unexpected congestion, and dynamic pricing schemes.

    @inproceedings{GammelliYangEtAl2022,
      author = {Gammelli, D. and Yang, K. and Harrison, J. and Rodrigues, F. and Pereira, F. and Pavone, M.},
      booktitle = {{ACM Int. Conf. on Knowledge Discovery and Data Mining}},
      title = {Graph Meta-Reinforcement Learning for Transferable Autonomous Mobility-on-Demand},
      year = {2022},
      keywords = {pub},
      owner = {gammelli},
      url = {https://arxiv.org/abs/2202.07147},
      timestamp = {2022-03-02}
    }
    
  10. D. Gammelli, J. Harrison, K. Yang, M. Pavone, F. Rodrigues, and F. C. Pereira, “Graph Reinforcement Learning for Network Control via Bi-Level Optimization,” in Learning on Graphs Conference, 2022.

    Abstract: Dynamic network flow models have been extensively studied and widely used in the past decades to formulate many problems with great real-world impact, such as transportation, supply chain management, power grid control, and more. Within this context, time-expansion techniques currently represent a generic approach for solving control problems over dynamic networks. However, the complexity of these methods does not allow traditional approaches to scale to large networks, especially when these need to be solved recursively over a receding horizon (e.g., to yield a sequence of actions in model predictive control). Moreover, tractable optimization-based approaches are often limited to simple linear deterministic settings and are not able to handle environments with stochastic, non-linear, or unknown dynamics. In this work, we present dynamic network flow problems through the lens of reinforcement learning and propose a graph network-based framework that can handle a wide variety of problems and learn efficient algorithms without significantly compromising optimality. Instead of a naive and poorly-scalable formulation, in which agent actions (and thus network outputs) consist of actions on edges, we present a two-phase decomposition. The first phase consists of an RL agent specifying desired outcomes to the actions. The second phase exploits the problem structure to solve a convex optimization problem and achieve (as best as possible) these desired outcomes. This formulation leads to dramatically improved scalability and performance. We further highlight a collection of features that are potentially desirable to system designers, investigate design decisions, and present experiments showing the utility, scalability, and flexibility of our framework.

    @inproceedings{GammelliHarrisonEtAl2022,
      author = {Gammelli, D. and Harrison, J. and Yang, K. and Pavone, M. and Rodrigues, F. and Pereira, F. C.},
      booktitle = {{Learning on Graphs Conference}},
      title = {Graph Reinforcement Learning for Network Control via Bi-Level Optimization},
      year = {2022},
      keywords = {pub},
      owner = {gammelli},
      timestamp = {2022-11-24}
    }
    
  11. D. Gammelli, K. Yang, J. Harrison, F. Rodrigues, F. C. Pereira, and M. Pavone, “Graph Neural Network Reinforcement Learning for Autonomous Mobility-on-Demand Systems,” in Proc. IEEE Conf. on Decision and Control, 2021.

    Abstract: Autonomous mobility-on-demand (AMoD) systems represent a rapidly developing mode of transportation wherein travel requests are dynamically handled by a coordinated fleet of robotic, self-driving vehicles. Given a graph representation of the transportation network - one where, for example, nodes represent areas of the city, and edges the connectivity between them - we argue that the AMoD control problem is naturally cast as a node-wise decision-making problem. In this paper, we propose a deep reinforcement learning framework to control the rebalancing of AMoD systems through graph neural networks. Crucially, we demonstrate that graph neural networks enable reinforcement learning agents to recover behavior policies that are significantly more transferable, generalizable, and scalable than policies learned through other approaches. Empirically, we show how the learned policies exhibit promising zero-shot transfer capabilities when faced with critical portability tasks such as inter-city generalization, service area expansion, and adaptation to potentially complex urban topologies.

    @inproceedings{GammelliYangEtAl2021,
      author = {Gammelli, D. and Yang, K. and Harrison, J. and Rodrigues, F. and Pereira, F. C. and Pavone, M.},
      title = {Graph Neural Network Reinforcement Learning for Autonomous Mobility-on-Demand Systems},
      year = {2021},
      url = {https://arxiv.org/abs/2104.11434},
      owner = {jh2},
      booktitle = {{Proc. IEEE Conf. on Decision and Control}},
      timestamp = {2021-03-23}
    }
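
    A minimal sketch of a node-wise graph policy in plain PyTorch (layer sizes, node features, and adjacency below are hypothetical): one round of mean-aggregation message passing followed by a per-node head whose softmax output can be read as a desired vehicle distribution. Because all weights are shared across nodes, the same parameters apply unchanged to graphs of any size, consistent with the zero-shot transfer capabilities discussed above.

    import torch
    import torch.nn as nn

    class GraphPolicy(nn.Module):
        """One round of mean-aggregation message passing, then per-node logits."""
        def __init__(self, in_dim, hidden):
            super().__init__()
            self.encode = nn.Linear(in_dim, hidden)
            self.update = nn.Linear(2 * hidden, hidden)
            self.head = nn.Linear(hidden, 1)

        def forward(self, x, adj):
            h = torch.relu(self.encode(x))
            deg = adj.sum(-1, keepdim=True).clamp(min=1)
            msg = adj @ h / deg                    # mean over neighbors
            h = torch.relu(self.update(torch.cat([h, msg], -1)))
            logits = self.head(h).squeeze(-1)
            return torch.softmax(logits, -1)       # desired vehicle distribution

    n_nodes = 4
    x = torch.randn(n_nodes, 3)                    # e.g., demand, supply, queue
    adj = torch.tensor([[0, 1, 0, 1], [1, 0, 1, 0],
                        [0, 1, 0, 1], [1, 0, 1, 0]], dtype=torch.float)
    policy = GraphPolicy(in_dim=3, hidden=16)
    print(policy(x, adj))  # per-node fractions; sums to one by construction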