Daniele Gammelli

Contacts:

Email: gammelli at stanford dot edu

Daniele Gammelli is a postdoctoral scholar in Stanford’s Autonomous Systems Lab, where he develops learning-based methods that enable the deployment of future autonomous systems in complex environments, with an emphasis on large-scale robotic networks, mobility systems, and autonomous spacecraft. He received his Ph.D. in Machine Learning and Mathematical Optimization from the Technical University of Denmark, where he developed ML-based approaches to analyze and control future Intelligent Transportation Systems.

More broadly, his research interests include deep reinforcement learning, generative models, graph neural networks, Bayesian statistics, and control techniques that leverage these tools.

Beyond research, Daniele enjoys playing soccer, going on trail runs, reading, and cooking.

Awards:

  • Kaj and Hermilla Ostenfeld’s Excellence Research Fund

ASL Publications

  1. Y. Takubo, T. Guffanti, D. Gammelli, M. Pavone, and S. D’Amico, “Towards Robust Spacecraft Trajectory Optimization via Transformers,” in IEEE Aerospace Conference, 2025. (Submitted)

    Abstract: Future multi-spacecraft missions require robust autonomous trajectory optimization capabilities to ensure safe and efficient rendezvous operations. This capability hinges on solving non-convex optimal control problems in real time, yet traditional iterative methods such as sequential convex programming impose a significant computational burden. To mitigate this burden, the Autonomous Rendezvous Transformer (ART) introduced a generative model trained to provide near-optimal initial guesses. This approach provides convergence to better local optima (e.g., fuel optimality), improves feasibility rates, and results in faster convergence of optimization algorithms through warm-starting. This work extends the capabilities of ART to address robust chance-constrained optimal control problems. Specifically, ART is applied to challenging rendezvous scenarios in Low Earth Orbit (LEO), ensuring fault-tolerant behavior under uncertainty. Through extensive experimentation, the proposed warm-starting strategy is shown to consistently produce high-quality reference trajectories, achieving up to 30% cost improvement and 50% reduction in infeasible cases compared to conventional methods, demonstrating robust performance across multiple state representations. Additionally, a post hoc evaluation framework is proposed to assess the quality of generated trajectories and mitigate runtime failures, marking an initial step toward the reliable deployment of AI-driven solutions in safety-critical autonomous systems such as spacecraft.

    @inproceedings{TakuboGammelliEtAl2024,
      author = {Takubo, Y. and Guffanti, T. and Gammelli, D. and Pavone, M. and {D'Amico}, S.},
      title = {Towards Robust Spacecraft Trajectory Optimization via Transformers},
      booktitle = {{IEEE Aerospace Conference}},
      year = {2025},
      note = {Submitted},
      keywords = {sub},
      owner = {jthluke},
      timestamp = {2024-10-30},
      url = {https://arxiv.org/abs/2410.05585}
    }
    
  2. C. Schmidt, D. Gammelli, J. Harrison, M. Pavone, and F. Rodrigues, “Offline Hierarchical Reinforcement Learning via Inverse Optimization,” in Int. Conf. on Learning Representations, 2025. (Submitted)

    Abstract: Hierarchical policies enable strong performance in many sequential decision-making problems, such as those with high-dimensional action spaces, those requiring long-horizon planning, and settings with sparse rewards. However, learning hierarchical policies from static offline datasets presents a significant challenge. Crucially, actions taken by higher-level policies may not be directly observable within hierarchical controllers, and the offline dataset might have been generated using a different policy structure, hindering the use of standard offline learning algorithms. In this work, we propose OHIO: a framework for offline reinforcement learning (RL) of hierarchical policies. Our framework leverages knowledge of the policy structure to solve the inverse problem, recovering the unobservable high-level actions that likely generated the observed data under our hierarchical policy. This approach constructs a dataset suitable for off-the-shelf offline training. We demonstrate our framework on robotic and network optimization problems and show that it substantially outperforms end-to-end RL methods and improves robustness. We investigate a variety of instantiations of our framework, both in direct deployment of policies trained offline and when online fine-tuning is performed.

    @inproceedings{SchmidtGammelliEtAl2024,
      author = {Schmidt, C. and Gammelli, D. and Harrison, J. and Pavone, M. and Rodrigues, F.},
      title = {Offline Hierarchical Reinforcement Learning via Inverse Optimization},
      booktitle = {{Int. Conf. on Learning Representations}},
      keywords = {sub},
      note = {Submitted},
      year = {2025},
      owner = {gammelli},
      timestamp = {2024-08-14},
      url = {https://arxiv.org/abs/2410.07933}
    }
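
    A minimal sketch of the inverse step at the core of OHIO, under the simplifying assumption of a known linear low-level controller (the gain matrix and all data below are hypothetical): given logged (state, low-level action) pairs, the unobserved high-level action is recovered in closed form, producing a dataset suitable for standard offline RL.

    import numpy as np

    rng = np.random.default_rng(0)
    K = np.diag([2.0, 0.5])  # hypothetical low-level controller gain

    def low_level(goal, state):
        # Low-level policy: drive the state toward the high-level goal.
        return K @ (goal - state)

    # Offline dataset: only (state, low-level action) pairs were logged;
    # the high-level goals that generated them are unobserved.
    states = rng.normal(size=(100, 2))
    goals_true = states + rng.normal(size=(100, 2))
    actions = np.stack([low_level(g, s) for g, s in zip(goals_true, states)])

    # Inverse step: recover the high-level action that explains each
    # observed low-level action (closed form for this controller).
    goals_recovered = states + actions @ np.linalg.inv(K).T
    assert np.allclose(goals_recovered, goals_true)
    # (states, goals_recovered) now forms a dataset for off-the-shelf
    # offline RL over the high-level policy.

    For nonlinear low-level policies the inversion generally requires numerical optimization; the closed form above is purely illustrative.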
    
  3. D. Celestini, A. Afsharrad, D. Gammelli, T. Guffanti, G. Zardini, S. Lall, E. Capelli, S. D’Amico, and M. Pavone, “Generalizable Spacecraft Trajectory Generation via Multimodal Learning with Transformers,” in American Control Conference, 2025. (Submitted)

    Abstract: Effective trajectory generation is essential for reliable on-board spacecraft autonomy. Among other approaches, learning-based warm-starting represents an appealing paradigm for solving the trajectory generation problem, effectively combining the benefits of optimization- and data-driven methods. Current approaches for learning-based trajectory generation often focus on fixed, single-scenario environments, where key scene characteristics, such as obstacle positions or final-time requirements, remain constant across problem instances. However, practical trajectory generation requires the scenario to be frequently reconfigured, making the single-scenario approach a potentially impractical solution. To address this challenge, we present a novel trajectory generation framework that generalizes across diverse problem configurations, by leveraging high-capacity transformer neural networks capable of learning from multimodal data sources. Specifically, our approach integrates transformer-based neural network models into the trajectory optimization process, encoding both scene-level information (e.g., obstacle locations, initial and goal states) and trajectory-level constraints (e.g., time bounds, fuel consumption targets) via multimodal representations. The transformer network then generates near-optimal initial guesses for non-convex optimization problems, significantly enhancing convergence speed and performance. The framework is validated through extensive simulations and real-world experiments on a free-flyer platform, achieving up to 30% cost improvement and 80% reduction in infeasible cases with respect to traditional approaches, and demonstrating robust generalization across diverse scenario variations.

    @inproceedings{CelestiniGammelliEtAl2025,
      author = {Celestini, D. and Afsharrad, A. and Gammelli, D. and Guffanti, T. and Zardini, G. and Lall, S. and Capelli, E. and {D'Amico}, S. and Pavone, M.},
      title = {Generalizable Spacecraft Trajectory Generation via Multimodal Learning with Transformers},
      booktitle = {{American Control Conference}},
      year = {2025},
      note = {Submitted},
      keywords = {sub},
      owner = {gammelli},
      timestamp = {2024-10-29},
      url = {https://arxiv.org/abs/2410.11723}
    }
    
  4. A. Singhal, D. Gammelli, J. Luke, K. Gopalakrishnan, D. Helmreich, and M. Pavone, “Real-time Control of Electric Autonomous Mobility-on-Demand Systems via Graph Reinforcement Learning,” in European Control Conference, Stockholm, Sweden, 2024.

    Abstract: Operators of Electric Autonomous Mobility-on-Demand (E-AMoD) fleets need to make several real-time decisions such as matching available vehicles to ride requests, rebalancing idle vehicles to areas of high demand, and charging vehicles to ensure sufficient range. While this problem can be posed as a linear program that optimizes flows over a space-charge-time graph, the size of the resulting optimization problem does not allow for real-time implementation in realistic settings. In this work, we present the E-AMoD control problem through the lens of reinforcement learning and propose a graph network-based framework to achieve drastically improved scalability and superior performance over heuristics. Specifically, we adopt a bi-level formulation where we (1) leverage a graph network-based RL agent to specify a desired next state in the space-charge graph, and (2) solve more tractable linear programs to best achieve the desired state while ensuring feasibility. Experiments using real-world data from San Francisco and New York City show that our approach achieves up to 89% of the profits of the theoretically-optimal solution while achieving more than a 100x speedup in computational time. We further highlight promising zero-shot transfer capabilities of our learned policy on tasks such as inter-city generalization and service area expansion, thus showing the utility, scalability, and flexibility of our framework. Finally, our approach outperforms the best domain-specific heuristics with comparable runtimes, with an increase in profits by up to 3.2x.

    @inproceedings{SinghalGammelliEtAl2024,
      author = {Singhal, A. and Gammelli, D. and Luke, J. and Gopalakrishnan, K. and Helmreich, D. and Pavone, M.},
      title = {Real-time Control of Electric Autonomous Mobility-on-Demand Systems via Graph Reinforcement Learning},
      booktitle = {{European Control Conference}},
      year = {2024},
      address = {Stockholm, Sweden},
      month = jun,
      doi = {10.23919/ecc64448.2024.10591098},
      owner = {jthluke},
      timestamp = {2024-10-28},
      url = {https://arxiv.org/abs/2311.05780}
    }
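
    A minimal sketch of the bi-level decomposition described in the abstract above (the same two-phase structure also underlies entries 8 and 10 below), on a hypothetical three-node network with a fixed placeholder standing in for the graph-network RL policy; requires cvxpy:

    import cvxpy as cp
    import numpy as np

    n = 3
    cost = np.array([[0, 1, 4], [1, 0, 1], [4, 1, 0]])  # hypothetical travel costs
    vehicles = np.array([6.0, 1.0, 1.0])                # current idle vehicles

    # (1) The RL agent outputs a desired next-state distribution over nodes
    #     (here a fixed placeholder standing in for a learned policy).
    desired = np.array([0.3, 0.4, 0.3]) * vehicles.sum()

    # (2) A linear program computes minimum-cost rebalancing flows that
    #     achieve the desired state while respecting vehicle conservation.
    f = cp.Variable((n, n), nonneg=True)
    outflow = cp.sum(f, axis=1)
    inflow = cp.sum(f, axis=0)
    constraints = [vehicles - outflow + inflow == desired, outflow <= vehicles]
    problem = cp.Problem(cp.Minimize(cp.sum(cp.multiply(cost, f))), constraints)
    problem.solve()
    print(np.round(f.value, 2))  # rebalancing flows between nodes

    Splitting the decision this way keeps the learned component low-dimensional (one target value per node), while the linear program guarantees that the executed flows respect vehicle conservation.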
    
  5. T. Guffanti, D. Gammelli, S. D’Amico, and M. Pavone, “Transformers for Trajectory Optimization with Application to Spacecraft Rendezvous,” in IEEE Aerospace Conference, 2024.

    Abstract: Reliable and efficient trajectory optimization methods are a fundamental need for autonomous dynamical systems, effectively enabling applications including rocket landing, hypersonic reentry, spacecraft rendezvous, and docking. Within such safety-critical application areas, the complexity of the emerging trajectory optimization problems has motivated the application of AI-based techniques to enhance the performance of traditional approaches. However, current AI-based methods either attempt to fully replace traditional control algorithms, thus lacking constraint satisfaction guarantees and incurring expensive simulation costs, or aim to solely imitate the behavior of traditional methods via supervised learning. To address these limitations, this paper proposes the Autonomous Rendezvous Transformer (ART) and assesses the capability of modern generative models to solve complex trajectory optimization problems, both from a forecasting and control standpoint. Specifically, this work assesses the capabilities of Transformers to (i) learn near-optimal policies from previously collected data, and (ii) warm-start a sequential optimizer for the solution of non-convex optimal control problems, thus guaranteeing hard constraint satisfaction. From a forecasting perspective, results highlight how ART outperforms other learning-based architectures at predicting known fuel-optimal trajectories. From a control perspective, empirical analyses show how policies learned through Transformers are able to generate near-optimal warm-starts, achieving trajectories that are (i) more fuel-efficient, (ii) obtained in fewer sequential optimizer iterations, and (iii) computed with an overall runtime comparable to benchmarks based on convex optimization.

    @inproceedings{GuffantiGammelliEtAl2024,
      author = {Guffanti, T. and Gammelli, D. and D'Amico, S. and Pavone, M.},
      title = {Transformers for Trajectory Optimization with Application to Spacecraft Rendezvous},
      booktitle = {{IEEE Aerospace Conference}},
      year = {2024},
      keywords = {pub},
      owner = {gammelli},
      timestamp = {2023-11-15},
      url = {https://arxiv.org/abs/2310.13831}
    }
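
    A minimal sketch of the warm-starting mechanism evaluated in this paper (the cost function and the stand-in generator below are hypothetical, not the ART model itself; the same mechanism also appears in entries 1 and 3): the learned model supplies the initial guess from which a standard nonconvex solver is run.

    import numpy as np
    from scipy.optimize import minimize

    def trajectory_cost(u):
        # Hypothetical nonconvex cost: fuel term, terminal-constraint
        # penalty, and a ripple term that creates many local optima.
        return np.abs(u).sum() + 100.0 * (u.sum() - 1.0) ** 2 + np.cos(8.0 * u).sum()

    def warm_start(horizon):
        # Placeholder for ART: a near-feasible, low-fuel initial guess.
        return np.full(horizon, 1.0 / horizon)

    horizon = 20
    cold = minimize(trajectory_cost, x0=np.zeros(horizon), method="Nelder-Mead")
    warm = minimize(trajectory_cost, x0=warm_start(horizon), method="Nelder-Mead")
    print(cold.fun, warm.fun)  # the warm start typically reaches a better optimum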
    
  6. M. Foutter, P. Bhoj, R. Sinha, A. Elhafsi, S. Banerjee, C. Agia, J. Kruger, T. Guffanti, D. Gammelli, S. D’Amico, and M. Pavone, “Adapting a Foundation Model for Space-based Tasks,” in Robotics: Science and Systems - Workshop on Semantics for Robotics: From Environment Understanding and Reasoning to Safe Interaction, 2024.

    Abstract: Foundation models, e.g., large language models, possess attributes of intelligence which offer promise to endow a robot with the contextual understanding necessary to navigate complex, unstructured tasks in the wild. In the future of space robotics, we see three core challenges which motivate the use of a foundation model adapted to space-based applications: 1) Scalability of ground-in-the-loop operations; 2) Generalizing prior knowledge to novel environments; and 3) Multi-modality in tasks and sensor data. Therefore, as a first step towards building a foundation model for space-based applications, we automatically label the AI4Mars dataset to curate a language-annotated dataset of visual-question-answer tuples. We fine-tune a pretrained LLaVA checkpoint on this dataset to endow a vision-language model with the ability to perform spatial reasoning and navigation on Mars’ surface. In this work, we demonstrate that 1) existing vision-language models are deficient visual reasoners in space-based applications, and 2) fine-tuning a vision-language model on extraterrestrial data significantly improves the quality of responses even with a limited training dataset of only a few thousand samples.

    @inproceedings{FoutterBohjEtAl2024,
      author = {Foutter, M. and Bhoj, P. and Sinha, R. and Elhafsi, A. and Banerjee, S. and Agia, C. and Kruger, J. and Guffanti, T. and Gammelli, D. and D'Amico, S. and Pavone, M.},
      title = {Adapting a Foundation Model for Space-based Tasks},
      booktitle = {{Robotics: Science and Systems - Workshop on Semantics for Robotics: From Environment Understanding and Reasoning to Safe Interaction}},
      year = {2024},
      asl_abstract = {Foundation models, e.g., large language models, possess attributes of intelligence which offer promise to endow a robot with the contextual understanding necessary to navigate complex, unstructured tasks in the wild. In the future of space robotics, we see three core challenges which motivate the use of a foundation model adapted to space-based applications: 1) Scalability of ground-in-the-loop operations; 2) Generalizing prior knowledge to novel environments; and 3) Multi-modality in tasks and sensor data. Therefore, as a first-step towards building a foundation model for space-based applications, we automatically label the AI4Mars dataset to curate a language annotated dataset of visual-question-answer tuples. We fine-tune a pretrained LLaVA checkpoint on this dataset to endow a vision-language model with the ability to perform spatial reasoning and navigation on Mars' surface. In this work, we demonstrate that 1) existing vision-language models are deficient visual reasoners in space-based applications, and 2) fine-tuning a vision-language model on extraterrestrial data significantly improves the quality of responses even with a limited training dataset of only a few thousand samples.},
      asl_address = {Delft, Netherlands},
      asl_url = {https://arxiv.org/abs/2408.05924},
      url = {https://arxiv.org/abs/2408.05924},
      owner = {foutter},
      timestamp = {2024-08-12}
    }
    
  7. D. Celestini, D. Gammelli, T. Guffanti, S. D’Amico, E. Capelli, and M. Pavone, “Transformer-based Model Predictive Control: Trajectory Optimization via Sequence Modeling,” IEEE Robotics and Automation Letters, vol. 9, no. 11, pp. 9820–9827, 2024.

    Abstract: Model predictive control (MPC) has established itself as the primary methodology for constrained control, enabling general-purpose robot autonomy in diverse real-world scenarios. However, for most problems of interest, MPC relies on the recursive solution of highly non-convex trajectory optimization problems, leading to high computational complexity and strong dependency on initialization. In this work, we present a unified framework to combine the main strengths of optimization-based and learning-based methods for MPC. Our approach entails embedding high-capacity, transformer-based neural network models within the optimization process for trajectory generation, whereby the transformer provides a near-optimal initial guess, or target plan, to a non-convex optimization problem. Our experiments, performed in simulation and the real world onboard a free flyer platform, demonstrate the capabilities of our framework to improve MPC convergence and runtime. Compared to purely optimization-based approaches, results show that our approach can improve trajectory generation performance by up to 75%, reduce the number of solver iterations by up to 45%, and improve overall MPC runtime by 7x without loss in performance.

    @article{CelestiniGammelliEtAl2024,
      author = {Celestini, D. and Gammelli, D. and Guffanti, T. and D'Amico, S. and Capelli, E. and Pavone, M.},
      title = {Transformer-based Model Predictive Control: Trajectory Optimization via Sequence Modeling},
      journal = {{IEEE Robotics and Automation Letters}},
      year = {2024},
      volume = {9},
      number = {11},
      pages = {9820--9827},
      doi = {10.1109/LRA.2024.3466069},
      owner = {jthluke},
      timestamp = {2024-10-30},
      url = {/wp-content/papercite-data/pdf/Celestini.Gammelli.ea.RAL24.pdf}
    }
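
    A minimal sketch of the receding-horizon structure described in the abstract, with hypothetical stand-ins for the dynamics, cost, and transformer: the learned model proposes a plan, a local optimizer refines it, and only the first control is applied before replanning.

    import numpy as np
    from scipy.optimize import minimize

    def plan_cost(u, x, target):
        # Control effort plus a terminal penalty on a single integrator.
        states = x + np.cumsum(u)
        return np.abs(u).sum() + 10.0 * (states[-1] - target) ** 2

    def learned_guess(x, target, horizon):
        # Placeholder for the transformer: spread the motion evenly.
        return np.full(horizon, (target - x) / horizon)

    x, target, horizon = 0.0, 5.0, 10
    for step in range(3):
        sol = minimize(plan_cost, x0=learned_guess(x, target, horizon),
                       args=(x, target), method="Powell")
        x = x + sol.x[0]  # apply only the first control, then replan
        print(f"step {step}: state = {x:.3f}")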
    
  8. D. Gammelli, J. Harrison, K. Yang, M. Pavone, F. Rodrigues, and F. C. Pereira, “Graph Reinforcement Learning for Network Control via Bi-Level Optimization,” in Int. Conf. on Machine Learning, Honolulu, Hawaii, 2023.

    Abstract: Optimization problems over dynamic networks have been extensively studied and widely used in the past decades to formulate numerous real-world problems. However, (1) traditional optimization-based approaches do not scale to large networks, and (2) the design of good heuristics or approximation algorithms often requires significant manual trial-and-error. In this work, we argue that data-driven strategies can automate this process and learn efficient algorithms without compromising optimality. To do so, we present network control problems through the lens of reinforcement learning and propose a graph network-based framework to handle a broad class of problems. Instead of naively computing actions over high-dimensional graph elements, e.g., edges, we propose a bi-level formulation where we (1) specify a desired next state via RL, and (2) solve a convex program to best achieve it, leading to drastically improved scalability and performance. We further highlight a collection of desirable features to system designers, investigate design decisions, and present experiments on real-world control problems showing the utility, scalability, and flexibility of our framework.

    @inproceedings{GammelliHarrisonEtAl2023,
      author = {Gammelli, D. and Harrison, J. and Yang, K. and Pavone, M. and Rodrigues, F. and Pereira, F. C.},
      title = {Graph Reinforcement Learning for Network Control via Bi-Level Optimization},
      booktitle = {{Int. Conf. on Machine Learning}},
      year = {2023},
      address = {Honolulu, Hawaii},
      month = jul,
      owner = {jthluke},
      timestamp = {2024-09-20},
      url = {https://arxiv.org/abs/2305.09129}
    }
    
  9. D. Gammelli, K. Yang, J. Harrison, F. Rodrigues, F. Pereira, and M. Pavone, “Graph Meta-Reinforcement Learning for Transferable Autonomous Mobility-on-Demand,” in ACM Int. Conf. on Knowledge Discovery and Data Mining, 2022.

    Abstract: Autonomous Mobility-on-Demand (AMoD) systems represent an attractive alternative to existing transportation paradigms, currently challenged by urbanization and increasing travel needs. By centrally controlling a fleet of self-driving vehicles, these systems provide mobility service to customers and are currently starting to be deployed in a number of cities around the world. Current learning-based approaches for controlling AMoD systems are limited to the single-city scenario, whereby the service operator is allowed to take an unlimited amount of operational decisions within the same transportation system. However, real-world system operators can hardly afford to fully re-train AMoD controllers for every city they operate in, as this could result in a high number of poor-quality decisions during training, making the single-city strategy a potentially impractical solution. To address these limitations, we propose to formalize the multi-city AMoD problem through the lens of meta-reinforcement learning (meta-RL) and devise an actor-critic algorithm based on recurrent graph neural networks. In our approach, AMoD controllers are explicitly trained such that a small amount of experience within a new city will produce good system performance. Empirically, we show how control policies learned through meta-RL are able to achieve near-optimal performance on unseen cities by learning rapidly adaptable policies, thus making them more robust not only to novel environments, but also to distribution shifts common in real-world operations, such as special events, unexpected congestion, and dynamic pricing schemes.

    @inproceedings{GammelliYangEtAl2022,
      author = {Gammelli, D. and Yang, K. and Harrison, J. and Rodrigues, F. and Pereira, F. and Pavone, M.},
      booktitle = {{ACM Int. Conf. on Knowledge Discovery and Data Mining}},
      title = {Graph Meta-Reinforcement Learning for Transferable Autonomous Mobility-on-Demand},
      year = {2022},
      keywords = {pub},
      owner = {gammelli},
      url = {https://arxiv.org/abs/2202.07147},
      timestamp = {2022-03-02}
    }
    
  10. D. Gammelli, J. Harrison, K. Yang, M. Pavone, F. Rodrigues, and F. C. Pereira, “Graph Reinforcement Learning for Network Control via Bi-Level Optimization,” in Learning on Graphs Conference, 2022.

    Abstract: Dynamic network flow models have been extensively studied and widely used in the past decades to formulate many problems with great real-world impact, such as transportation, supply chain management, power grid control, and more. Within this context, time-expansion techniques currently represent a generic approach for solving control problems over dynamic networks. However, the complexity of these methods does not allow traditional approaches to scale to large networks, especially when these need to be solved recursively over a receding horizon (e.g., to yield a sequence of actions in model predictive control). Moreover, tractable optimization-based approaches are often limited to simple linear deterministic settings and are not able to handle environments with stochastic, non-linear, or unknown dynamics. In this work, we present dynamic network flow problems through the lens of reinforcement learning and propose a graph network-based framework that can handle a wide variety of problems and learn efficient algorithms without significantly compromising optimality. Instead of a naive and poorly-scalable formulation, in which agent actions (and thus network outputs) consist of actions on edges, we present a two-phase decomposition. The first phase consists of an RL agent specifying desired outcomes to the actions. The second phase exploits the problem structure to solve a convex optimization problem and achieve (as best as possible) these desired outcomes. This formulation leads to dramatically improved scalability and performance. We further highlight a collection of features that are potentially desirable to system designers, investigate design decisions, and present experiments showing the utility, scalability, and flexibility of our framework.

    @inproceedings{GammelliHarrisonEtAl2022,
      author = {Gammelli, D. and Harrison, J. and Yang, K. and Pavone, M. and Rodrigues, F. and Pereira, F. C.},
      booktitle = {{Learning on Graphs Conference}},
      title = {Graph Reinforcement Learning for Network Control via Bi-Level Optimization},
      year = {2022},
      keywords = {pub},
      owner = {gammelli},
      timestamp = {2022-11-24}
    }
    
  11. D. Gammelli, K. Yang, J. Harrison, F. Rodrigues, F. C. Pereira, and M. Pavone, “Graph Neural Network Reinforcement Learning for Autonomous Mobility-on-Demand Systems,” in Proc. IEEE Conf. on Decision and Control, 2021.

    Abstract: Autonomous mobility-on-demand (AMoD) systems represent a rapidly developing mode of transportation wherein travel requests are dynamically handled by a coordinated fleet of robotic, self-driving vehicles. Given a graph representation of the transportation network - one where, for example, nodes represent areas of the city, and edges the connectivity between them - we argue that the AMoD control problem is naturally cast as a node-wise decision-making problem. In this paper, we propose a deep reinforcement learning framework to control the rebalancing of AMoD systems through graph neural networks. Crucially, we demonstrate that graph neural networks enable reinforcement learning agents to recover behavior policies that are significantly more transferable, generalizable, and scalable than policies learned through other approaches. Empirically, we show how the learned policies exhibit promising zero-shot transfer capabilities when faced with critical portability tasks such as inter-city generalization, service area expansion, and adaptation to potentially complex urban topologies.

    @inproceedings{GammelliYangEtAl2021,
      author = {Gammelli, D. and Yang, K. and Harrison, J. and Rodrigues, F. and Pereira, F. C. and Pavone, M.},
      title = {Graph Neural Network Reinforcement Learning for Autonomous Mobility-on-Demand Systems},
      year = {2021},
      url = {https://arxiv.org/abs/2104.11434},
      owner = {jh2},
      booktitle = {{Proc. IEEE Conf. on Decision and Control}},
      timestamp = {2021-03-23}
    }
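
    A minimal sketch of a node-wise graph policy in plain PyTorch (layer sizes, node features, and adjacency below are hypothetical): one round of mean-aggregation message passing followed by a per-node head whose softmax output can be read as a desired vehicle distribution. Because all weights are shared across nodes, the same parameters apply unchanged to graphs of any size, consistent with the zero-shot transfer capabilities discussed above.

    import torch
    import torch.nn as nn

    class GraphPolicy(nn.Module):
        """One round of mean-aggregation message passing, then per-node logits."""
        def __init__(self, in_dim, hidden):
            super().__init__()
            self.encode = nn.Linear(in_dim, hidden)
            self.update = nn.Linear(2 * hidden, hidden)
            self.head = nn.Linear(hidden, 1)

        def forward(self, x, adj):
            h = torch.relu(self.encode(x))
            deg = adj.sum(-1, keepdim=True).clamp(min=1)
            msg = adj @ h / deg                    # mean over neighbors
            h = torch.relu(self.update(torch.cat([h, msg], -1)))
            logits = self.head(h).squeeze(-1)
            return torch.softmax(logits, -1)       # desired vehicle distribution

    n_nodes = 4
    x = torch.randn(n_nodes, 3)                    # e.g., demand, supply, queue
    adj = torch.tensor([[0, 1, 0, 1], [1, 0, 1, 0],
                        [0, 1, 0, 1], [1, 0, 1, 0]], dtype=torch.float)
    policy = GraphPolicy(in_dim=3, hidden=16)
    print(policy(x, adj))  # per-node fractions; sums to one by construction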