Amine is a Ph.D. student in the Department of Aeronautics and Astronautics. He recently obtained his M.S. degree from Stanford in 2019. Prior to joining Stanford, Amine graduated summa cum laude with a B.S. in aerospace engineering from UCLA. Before turning to robotics, he also held multiple internships with the electric propulsion group at JPL.
Amine’s research interests include motion planning, optimal control, machine learning, and robotics. His current work is focused on developing algorithms for safe robotic navigation of unknown or partially occluded environments.
Outside of the lab, Amine enjoys playing piano, soccer, and struggling to lift weights at the gym.
Abstract: Foundation models, e.g., large language models, trained on internet-scale data possess zero-shot generalization capabilities that make them a promising technology for anomaly detection for robotic systems. Fully realizing this promise, however, poses two challenges: (i) mitigating the considerable computational expense of these models such that they may be applied online, and (ii) incorporating their judgement regarding potential anomalies into a safe control framework. In this work we present a two-stage reasoning framework: a fast binary anomaly classifier based on analyzing observations in an LLM embedding space, which may trigger a slower fallback selection stage that utilizes the reasoning capabilities of generative LLMs. These stages correspond to branch points in a model predictive control strategy that maintains the joint feasibility of continuing along various fallback plans as soon as an anomaly is detected (while the selector decides), thus ensuring safety. We demonstrate that, even when instantiated with relatively small language models, our fast anomaly classifier outperforms autoregressive reasoning with state-of-the-art GPT models. This enables our runtime monitor to improve the trustworthiness of dynamic robotic systems under resource and time constraints.
@inproceedings{SinhaElhafsiEtAl2024, author = {Sinha, R. and Elhafsi, A. and Agia, C. and Foutter, M. and Schmerling, E. and Pavone, M.}, title = {Real-Time Anomaly Detection and Planning with Large Language Models}, booktitle = {{Robotics: Science and Systems}}, address = {Delft, Netherlands}, month = jul, year = {2024}, owner = {amine}, url = {https://arxiv.org/abs/2407.08735}, timestamp = {2024-09-19} }
Abstract:
@inproceedings{FoutterBohjEtAl2024, author = {Foutter, M. and Bhoj, P. and Sinha, R. and Elhafsi, A. and Banerjee, S. and Agia, C. and Kruger, J. and Guffanti, T. and Gammelli, D. and D'Amico, S. and Pavone, M.}, title = {Adapting a Foundation Model for Space-based Tasks}, booktitle = {{Robotics: Science and Systems - Workshop on Semantics for Robotics: From Environment Understanding and Reasoning to Safe Interaction}}, year = {2024}, asl_abstract = {Foundation models, e.g., large language models, possess attributes of intelligence which offer promise to endow a robot with the contextual understanding necessary to navigate complex, unstructured tasks in the wild. In the future of space robotics, we see three core challenges which motivate the use of a foundation model adapted to space-based applications: 1) Scalability of ground-in-the-loop operations; 2) Generalizing prior knowledge to novel environments; and 3) Multi-modality in tasks and sensor data. Therefore, as a first step towards building a foundation model for space-based applications, we automatically label the AI4Mars dataset to curate a language-annotated dataset of visual-question-answer tuples. We fine-tune a pretrained LLaVA checkpoint on this dataset to endow a vision-language model with the ability to perform spatial reasoning and navigation on Mars' surface. In this work, we demonstrate that 1) existing vision-language models are deficient visual reasoners in space-based applications, and 2) fine-tuning a vision-language model on extraterrestrial data significantly improves the quality of responses even with a limited training dataset of only a few thousand samples.}, asl_address = {Delft, Netherlands}, asl_url = {https://arxiv.org/abs/2408.05924}, url = {https://arxiv.org/abs/2408.05924}, owner = {foutter}, timestamp = {2024-08-12} }
Abstract: As robots acquire increasingly sophisticated skills and see increasingly complex and varied environments, the threat of an edge case or anomalous failure is ever present. For example, Tesla cars have seen interesting failure modes ranging from autopilot disengagements due to inactive traffic lights carried by trucks to phantom braking caused by images of stop signs on roadside billboards. These system-level failures are not due to failures of any individual component of the autonomy stack but rather system-level deficiencies in semantic reasoning. Such edge cases, which we call semantic anomalies, are simple for a human to disentangle yet require insightful reasoning. To this end, we study the application of large language models (LLMs), endowed with broad contextual understanding and reasoning capabilities, to recognize such edge cases and introduce a monitoring framework for semantic anomaly detection in vision-based policies. Our experiments apply this framework to a finite state machine policy for autonomous driving and a learned policy for object manipulation. These experiments demonstrate that the LLM-based monitor can effectively identify semantic anomalies in a manner that shows agreement with human reasoning. Finally, we provide an extended discussion on the strengths and weaknesses of this approach and motivate a research outlook on how we can further use foundation models for semantic anomaly detection. Our project webpage can be found at https://sites.google.com/view/llm-anomaly-detection.
@article{ElhafsiSinhaEtAl2023, author = {Elhafsi, A. and Sinha, R. and Agia, C. and Schmerling, E. and Nesnas, I. A. D. and Pavone, M.}, title = {Semantic Anomaly Detection with Large Language Models}, journal = {{Autonomous Robots}}, volume = {47}, number = {8}, pages = {1035--1055}, year = {2023}, month = oct, doi = {10.1007/s10514-023-10132-6}, url = {https://arxiv.org/abs/2305.11307}, owner = {amine}, timestamp = {2024-09-19} }
Abstract: Reasoning about human motion is a core component of modern human-robot interactive systems. In particular, one of the main uses of behavior prediction in autonomous systems is to inform ego-robot motion planning and control. However, a majority of planning and control algorithms reason about system dynamics rather than the predicted agent tracklets that are commonly output by trajectory forecasting methods, which can hinder their integration. Towards this end, we propose Mixtures of Affine Time-varying Systems (MATS) as an output representation for trajectory forecasting that is more amenable to downstream planning and control use. Our approach leverages successful ideas from probabilistic trajectory forecasting works to learn dynamical system representations that are well-studied in the planning and control literature. We integrate our predictions with a proposed multimodal planning methodology and demonstrate significant computational efficiency improvements on a large-scale autonomous driving dataset.
@inproceedings{IvanovicElhafsiEtAl2020, author = {Ivanovic, B. and Elhafsi, A. and Rosman, G. and Gaidon, A. and Pavone, M.}, title = {{MATS}: An Interpretable Trajectory Forecasting Representation for Planning and Control}, booktitle = {{Conf. on Robot Learning}}, year = {2020}, month = nov, owner = {borisi}, timestamp = {2020-10-14}, url = {https://arxiv.org/abs/2009.07517} }
Abstract: Algorithms for motion planning in unknown environments are generally limited in their ability to reason about the structure of the unobserved environment. As such, current methods generally navigate unknown environments by relying on heuristic methods to choose intermediate objectives along frontiers. We present a unified method that combines map prediction and motion planning for safe, time-efficient autonomous navigation of unknown environments by dynamically-constrained robots. We propose a data-driven method for predicting the map of the unobserved environment, using the robot’s observations of its surroundings as context. These map predictions are then used to plan trajectories from the robot’s position to the goal without requiring frontier selection. We demonstrate that our map-predictive motion planning strategy yields a substantial improvement in trajectory time over a naive frontier pursuit method and demonstrates similar performance to methods using more sophisticated frontier selection heuristics with significantly shorter computation time.
@inproceedings{ElhafsiIvanovicEtAl2020, author = {Elhafsi, A. and Ivanovic, B. and Janson, L. and Pavone, M.}, title = {Map-Predictive Motion Planning in Unknown Environments}, booktitle = {{Proc. IEEE Conf. on Robotics and Automation}}, year = {2020}, address = {Paris, France}, month = jun, url = {https://arxiv.org/abs/1910.08184}, owner = {borisi}, timestamp = {2019-10-21} }
Abstract: Today’s robotic systems are increasingly turning to computationally expensive models such as deep neural networks (DNNs) for tasks like localization, perception, planning, and object detection. However, resource-constrained robots, like low-power drones, often have insufficient on-board compute resources or power reserves to scalably run the most accurate, state-of-the-art neural network compute models. Cloud robotics allows mobile robots the benefit of offloading compute to centralized servers if they are uncertain locally or want to run more accurate, compute-intensive models. However, cloud robotics comes with a key, often understated cost: communicating with the cloud over congested wireless networks may result in latency or loss of data. In fact, sending high data-rate video or LIDAR from multiple robots over congested networks can lead to prohibitive delay for real-time applications, which we measure experimentally. In this paper, we formulate a novel Robot Offloading Problem - how and when should robots offload sensing tasks, especially if they are uncertain, to improve accuracy while minimizing the cost of cloud communication? We formulate offloading as a sequential decision making problem for robots, and propose a solution using deep reinforcement learning. In both simulations and hardware experiments using state-of-the-art vision DNNs, our offloading strategy improves vision task performance by between 1.3-2.6x of benchmark offloading strategies, allowing robots the potential to significantly transcend their on-board sensing accuracy but with limited cost of cloud communication.
@inproceedings{ChinchaliSharmaEtAl2019, author = {Chinchali, S. and Sharma, A. and Harrison, J. and Elhafsi, A. and Kang, D. and Pergament, E. and Cidon, E. and Katti, S. and Pavone, M.}, title = {Network Offloading Policies for Cloud Robotics: a Learning-based Approach}, booktitle = {{Robotics: Science and Systems}}, year = {2019}, address = {Freiburg im Breisgau, Germany}, month = jun, url = {https://arxiv.org/pdf/1902.05703.pdf}, owner = {apoorva}, timestamp = {2019-02-07} }