Amine Elhafsi

Contacts:

Email: amine at stanford dot edu

Amine is a Ph.D. student in the Department of Aeronautics and Astronautics. He received his M.S. degree from Stanford in 2019. Prior to joining Stanford, Amine graduated summa cum laude with a B.S. in aerospace engineering from UCLA. Before turning to robotics, he also held multiple internships with the electric propulsion group at JPL.

Amine’s research interests include motion planning, optimal control, machine learning, and robotics. His current work focuses on developing algorithms for safe robotic navigation in unknown or partially occluded environments.

Outside of the lab, Amine enjoys playing piano, soccer, and struggling to lift weights at the gym.

Awards:

  • UCLA Outstanding Bachelor of Science in Aerospace Engineering, 2016
  • NASA Space Technology Research Fellowship, 2019

ASL Publications

  1. M. P. Ronecker, M. Foutter, A. Elhafsi, D. Gammelli, I. Barakaiev, M. Pavone, and D. Watzenig, “Vision Foundation Model Embedding-based Semantic Anomaly Detection,” in Proc. IEEE Conf. on Robotics and Automation: Workshop Safe-VLM, 2025.

    Abstract: Semantic anomalies are contextually invalid or unusual combinations of familiar visual elements that can cause undefined behavior and failures in system-level reasoning for autonomous systems. This work explores semantic anomaly detection by leveraging the semantic priors of state-of-the-art vision foundation models, operating directly on the image. We propose a framework that compares local vision embeddings from runtime images to a database of nominal scenarios in which the autonomous system is deemed safe and performant. In this work, we consider two variants of the proposed framework: one using raw grid-based embeddings, and another leveraging instance segmentation for object-centric representations. To further improve robustness, we introduce a simple filtering mechanism to suppress false positives. Our evaluations on CARLA-simulated anomalies show that the instance-based method with filtering achieves performance comparable to GPT-4o, while providing precise anomaly localization. These results highlight the potential utility of vision embeddings from foundation models for real-time anomaly detection in autonomous systems.

    @inproceedings{RoneckerFoutterGammelliEtAl2025,
      author = {Ronecker, M. P. and Foutter, M. and Elhafsi, A. and Gammelli, D. and Barakaiev, I. and Pavone, M. and Watzenig, D.},
      title = {Vision Foundation Model Embedding-based Semantic Anomaly Detection},
      booktitle = {{Proc. IEEE Conf. on Robotics and Automation: Workshop Safe-VLM}},
      year = {2025},
      keywords = {pub},
      owner = {gammelli},
      timestamp = {2025-10-23},
      url = {https://arxiv.org/abs/2505.07998}
    }
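
    A minimal Python sketch of the embedding-comparison idea described in this abstract; the function names, the cosine-distance scoring, and the threshold-based filtering are illustrative assumptions rather than the paper's implementation:

      import numpy as np

      def build_nominal_database(nominal_embeddings):
          """Stack local embeddings collected from scenarios deemed safe and performant."""
          return np.vstack(nominal_embeddings)

      def anomaly_scores(runtime_embeddings, database):
          """Score each runtime embedding by cosine distance to its nearest nominal neighbor."""
          a = runtime_embeddings / np.linalg.norm(runtime_embeddings, axis=1, keepdims=True)
          b = database / np.linalg.norm(database, axis=1, keepdims=True)
          return 1.0 - (a @ b.T).max(axis=1)  # far from all nominal data -> high score

      def flag_anomaly(scores, threshold=0.3, min_hits=2):
          """Stand-in for the filtering mechanism: require several high-scoring
          patches before raising a flag, to suppress false positives."""
          return (scores > threshold).sum() >= min_hits

      # Toy usage with random stand-in embeddings (a real system would use
      # grid- or instance-level features from a vision foundation model):
      rng = np.random.default_rng(0)
      db = build_nominal_database([rng.normal(size=(100, 512))])
      print(flag_anomaly(anomaly_scores(rng.normal(size=(16, 512)), db)))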
    
  2. Y. Kuang, H. Geng, A. Elhafsi, T. Do, P. Abbeel, J. Malik, M. Pavone, and Y. Wang, “SkillBlender: Towards Versatile Humanoid Whole-Body Loco-Manipulation via Skill Blending,” CoRL 2024 Workshop on Whole-body Control and Bimanual Manipulation, 2025.

    Abstract: Humanoid robots hold significant potential in accomplishing daily tasks across diverse environments thanks to their flexibility and human-like morphology. Recent works have made significant progress in humanoid whole-body control and loco-manipulation leveraging optimal control or reinforcement learning. However, these methods require tedious task-specific tuning for each task to achieve satisfactory behaviors, limiting their versatility and scalability to diverse tasks in daily scenarios. To that end, we introduce SkillBlender, a novel hierarchical reinforcement learning framework for versatile humanoid loco-manipulation. SkillBlender first pretrains goal-conditioned task-agnostic primitive skills, and then dynamically blends these skills to accomplish complex loco-manipulation tasks with minimal task-specific reward engineering. We also introduce SkillBench, a parallel, cross-embodiment, and diverse simulated benchmark containing three embodiments, four primitive skills, and eight challenging loco-manipulation tasks, accompanied by a set of scientific evaluation metrics balancing accuracy and feasibility. Extensive simulated experiments show that our method significantly outperforms all baselines, while naturally regularizing behaviors to avoid reward hacking, resulting in more accurate and feasible movements for diverse loco-manipulation tasks in our daily scenarios. Our code and benchmark will be open-sourced to the community to facilitate future research.

    @article{KuangEtAl2025,
      author = {Kuang, Y. and Geng, H. and Elhafsi, A. and Do, T. and Abbeel, P. and Malik, J. and Pavone, M. and Wang, Y.},
      title = {SkillBlender: Towards Versatile Humanoid Whole-Body Loco-Manipulation via Skill Blending},
      year = {2025},
      journal = {CoRL 2024 Workshop on Whole-body Control and Bimanual Manipulation},
      url = {https://arxiv.org/abs/2506.09366},
      owner = {amine},
      timestamp = {2025-06-11}
    }
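
    An illustrative Python sketch of the skill-blending idea, not the SkillBlender implementation: a high-level policy outputs weights over pretrained primitive skills, and the low-level command is their convex combination. All names and numbers here are assumptions:

      import numpy as np

      def blend_actions(skill_policies, weights, obs):
          """Combine primitive-skill actions with softmax-normalized blending weights."""
          w = np.exp(weights - weights.max())
          w /= w.sum()
          actions = np.stack([pi(obs) for pi in skill_policies])  # (n_skills, action_dim)
          return w @ actions

      # Stand-in primitives (e.g., walking and reaching); real skills would be
      # goal-conditioned policies pretrained with reinforcement learning.
      walk = lambda obs: np.array([1.0, 0.0])
      reach = lambda obs: np.array([0.0, 1.0])
      print(blend_actions([walk, reach], weights=np.array([2.0, 0.5]), obs=None))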
    
  3. A. Elhafsi, D. Morton, and M. Pavone, “Scan, Materialize, Simulate: A Generalizable Framework for Physically Grounded Robot Planning,” ArXiv 2505.14938, 2025. (Submitted)

    Abstract: Autonomous robots must reason about the physical consequences of their actions to operate effectively in unstructured, real-world environments. We present Scan, Materialize, Simulate (SMS), a unified framework that combines 3D Gaussian Splatting for accurate scene reconstruction, visual foundation models for semantic segmentation, vision-language models for material property inference, and physics simulation for reliable prediction of action outcomes. By integrating these components, SMS enables generalizable physical reasoning and object-centric planning without the need to re-learn foundational physical dynamics. We empirically validate SMS in a billiards-inspired manipulation task and a challenging quadrotor landing scenario, demonstrating robust performance on both simulated domain transfer and real-world experiments. Our results highlight the potential of bridging differentiable rendering for scene reconstruction, foundation models for semantic understanding, and physics-based simulation to achieve physically grounded robot planning across diverse settings.

    @article{ElhafsiMortonPavone2025,
      author = {Elhafsi, A. and Morton, D. and Pavone, M.},
      title = {Scan, Materialize, Simulate: A Generalizable Framework for Physically Grounded Robot Planning},
      year = {2025},
      journal = {ArXiv 2505.14938},
      url = {https://arxiv.org/pdf/2505.14938},
      keywords = {sub},
      owner = {amine},
      timestamp = {2025-05-20}
    }
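
    A high-level Python sketch of the Scan, Materialize, Simulate loop; every function below is a placeholder assumption standing in for a full component (3D Gaussian Splatting, segmentation plus a vision-language model, a physics engine):

      def scan(images):
          """Stand-in for scene reconstruction, e.g., via 3D Gaussian Splatting."""
          return {"geometry": images}

      def materialize(scene):
          """Stand-in for segmentation and VLM-based material property inference."""
          scene["materials"] = {"ball": {"mass": 0.17, "friction": 0.2}}
          return scene

      def simulate(scene, action):
          """Stand-in for a physics rollout predicting the action's outcome."""
          return {"final_state": action}

      def plan(images, candidate_actions, score):
          """Pick the candidate action whose simulated outcome scores best."""
          scene = materialize(scan(images))
          return max(candidate_actions, key=lambda a: score(simulate(scene, a)))

      # Toy usage: choose the action whose outcome lands closest to a target of 0.5.
      print(plan([], [0.2, 0.5, 0.9], score=lambda out: -abs(out["final_state"] - 0.5)))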
    
  4. R. Sinha, A. Elhafsi, C. Agia, M. Foutter, E. Schmerling, and M. Pavone, “Real-Time Anomaly Detection and Planning with Large Language Models,” in Robotics: Science and Systems, Delft, Netherlands, 2024.

    Abstract: Foundation models, e.g., large language models, trained on internet-scale data possess zero-shot generalization capabilities that make them a promising technology for anomaly detection for robotic systems. Fully realizing this promise, however, poses two challenges: (i) mitigating the considerable computational expense of these models such that they may be applied online, and (ii) incorporating their judgement regarding potential anomalies into a safe control framework. In this work we present a two-stage reasoning framework: a fast binary anomaly classifier based on analyzing observations in an LLM embedding space, which may trigger a slower fallback selection stage that utilizes the reasoning capabilities of generative LLMs. These stages correspond to branch points in a model predictive control strategy that maintains the joint feasibility of continuing along various fallback plans as soon as an anomaly is detected (while the selector decides), thus ensuring safety. We demonstrate that, even when instantiated with relatively small language models, our fast anomaly classifier outperforms autoregressive reasoning with state-of-the-art GPT models. This enables our runtime monitor to improve the trustworthiness of dynamic robotic systems under resource and time constraints.

    @inproceedings{SinhaElhafsiEtAl2024,
      author = {Sinha, R. and Elhafsi, A. and Agia, C. and Foutter, M. and Schmerling, E. and Pavone, M.},
      title = {Real-Time Anomaly Detection and Planning with Large Language Models},
      booktitle = {{Robotics: Science and Systems}},
      address = {Delft, Netherlands},
      month = jul,
      year = {2024},
      owner = {amine},
      url = {https://arxiv.org/abs/2407.08735},
      timestamp = {2024-09-19},
      note = {Best Paper Award}
    }
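
    A hedged Python sketch of the two-stage monitor described in this abstract; `embed`, `fast_classifier`, and `query_llm_for_fallback` are assumed stand-ins for an LLM embedding model, a lightweight classifier trained in embedding space, and a slower generative-LLM query:

      def monitor_step(observation, fallback_plans, embed, fast_classifier,
                       query_llm_for_fallback):
          z = embed(observation)  # cheap: runs at every control step
          if fast_classifier(z):  # stage 1: binary anomaly flag
              # Stage 2 (slow): while the generative LLM selects a fallback, the
              # MPC layer (not shown) keeps all fallback plans jointly feasible.
              return "fallback", query_llm_for_fallback(observation, fallback_plans)
          return "nominal", None

      # Toy usage with stand-in components:
      status, plan = monitor_step(
          observation="pedestrian carrying a stop sign",
          fallback_plans=["slow down", "pull over"],
          embed=len,
          fast_classifier=lambda z: z > 10,  # pretend long descriptions are anomalous
          query_llm_for_fallback=lambda obs, plans: plans[0],
      )
      print(status, plan)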
    
  5. A. Elhafsi, R. Sinha, C. Agia, E. Schmerling, I. A. D. Nesnas, and M. Pavone, “Semantic Anomaly Detection with Large Language Models,” Autonomous Robots, vol. 47, no. 8, pp. 1035–1055, Oct. 2023.

    Abstract: As robots acquire increasingly sophisticated skills and see increasingly complex and varied environments, the threat of an edge case or anomalous failure is ever present. For example, Tesla cars have seen interesting failure modes ranging from autopilot disengagements due to inactive traffic lights carried by trucks to phantom braking caused by images of stop signs on roadside billboards. These system-level failures are not due to failures of any individual component of the autonomy stack but rather system-level deficiencies in semantic reasoning. Such edge cases, which we call semantic anomalies, are simple for a human to disentangle yet require insightful reasoning. To this end, we study the application of large language models (LLMs), endowed with broad contextual understanding and reasoning capabilities, to recognize such edge cases and introduce a monitoring framework for semantic anomaly detection in vision-based policies. Our experiments apply this framework to a finite state machine policy for autonomous driving and a learned policy for object manipulation. These experiments demonstrate that the LLM-based monitor can effectively identify semantic anomalies in a manner that shows agreement with human reasoning. Finally, we provide an extended discussion on the strengths and weaknesses of this approach and motivate a research outlook on how we can further use foundation models for semantic anomaly detection. Our project webpage can be found at https://sites.google.com/view/llm-anomaly-detection.

    @article{ElhafsiSinhaEtAl2023,
      author = {Elhafsi, A. and Sinha, R. and Agia, C. and Schmerling, E. and Nesnas, I. A. D. and Pavone, M.},
      title = {Semantic Anomaly Detection with Large Language Models},
      journal = {{Autonomous Robots}},
      volume = {47},
      number = {8},
      pages = {1035--1055},
      year = {2023},
      month = oct,
      doi = {10.1007/s10514-023-10132-6},
      url = {https://arxiv.org/abs/2305.11307},
      owner = {amine},
      timestamp = {2024-09-19}
    }
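
    An illustrative prompt-based monitor in the spirit of this abstract; the prompt wording and the `llm` callable are assumptions rather than the paper's exact setup:

      PROMPT = """You monitor an autonomous vehicle's vision-based policy.
      Scene description: {scene}
      Could any element cause the policy to misbehave (e.g., a stop sign printed
      on a billboard, or traffic lights carried by a truck)? Answer ANOMALY or
      NOMINAL, then briefly explain."""

      def semantic_anomaly_check(scene_description, llm):
          """`llm` is any callable mapping a prompt string to a response string."""
          reply = llm(PROMPT.format(scene=scene_description))
          return reply.strip().upper().startswith("ANOMALY")

      # Toy usage with a canned response standing in for a real LLM call:
      print(semantic_anomaly_check("a truck carrying inactive traffic lights",
                                   llm=lambda prompt: "ANOMALY: the lights may be misread."))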
    
  6. B. Ivanovic, A. Elhafsi, G. Rosman, A. Gaidon, and M. Pavone, “MATS: An Interpretable Trajectory Forecasting Representation for Planning and Control,” in Conf. on Robot Learning, 2020.

    Abstract: Reasoning about human motion is a core component of modern human-robot interactive systems. In particular, one of the main uses of behavior prediction in autonomous systems is to inform ego-robot motion planning and control. However, a majority of planning and control algorithms reason about system dynamics rather than the predicted agent tracklets that are commonly output by trajectory forecasting methods, which can hinder their integration. Towards this end, we propose Mixtures of Affine Time-varying Systems (MATS) as an output representation for trajectory forecasting that is more amenable to downstream planning and control use. Our approach leverages successful ideas from probabilistic trajectory forecasting works to learn dynamical system representations that are well-studied in the planning and control literature. We integrate our predictions with a proposed multimodal planning methodology and demonstrate significant computational efficiency improvements on a large-scale autonomous driving dataset.

    @inproceedings{IvanovicElhafsiEtAl2020,
      author = {Ivanovic, B. and Elhafsi, A. and Rosman, G. and Gaidon, A. and Pavone, M.},
      title = {{MATS}: An Interpretable Trajectory Forecasting Representation for Planning and Control},
      booktitle = {{Conf. on Robot Learning}},
      year = {2020},
      month = nov,
      owner = {borisi},
      timestamp = {2020-10-14},
      url = {https://arxiv.org/abs/2009.07517}
    }
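
    A small Python sketch of rolling out one component of a Mixture of Affine Time-varying Systems, x_{t+1} = A_t x_t + B_t u_t + c_t; the random matrices below are stand-ins for dynamics predicted by the learned forecasting model:

      import numpy as np

      def rollout(A_seq, B_seq, c_seq, x0, controls):
          """Propagate a state through time-varying affine dynamics."""
          x, traj = x0, [x0]
          for A, B, c, u in zip(A_seq, B_seq, c_seq, controls):
              x = A @ x + B @ u + c
              traj.append(x)
          return np.stack(traj)

      rng = np.random.default_rng(0)
      T, n, m = 5, 4, 2  # horizon, state dimension, control dimension
      traj = rollout(np.eye(n) + 0.1 * rng.normal(size=(T, n, n)),  # near-identity A_t
                     rng.normal(size=(T, n, m)), 0.01 * rng.normal(size=(T, n)),
                     x0=np.zeros(n), controls=rng.normal(size=(T, m)))
      print(traj.shape)  # (6, 4): initial state plus 5 propagated states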
    
  7. A. Elhafsi, B. Ivanovic, L. Janson, and M. Pavone, “Map-Predictive Motion Planning in Unknown Environments,” in Proc. IEEE Conf. on Robotics and Automation, Paris, France, 2020.

    Abstract: Algorithms for motion planning in unknown environments are generally limited in their ability to reason about the structure of the unobserved environment. As such, current methods generally navigate unknown environments by relying on heuristic methods to choose intermediate objectives along frontiers. We present a unified method that combines map prediction and motion planning for safe, time-efficient autonomous navigation of unknown environments by dynamically-constrained robots. We propose a data-driven method for predicting the map of the unobserved environment, using the robot’s observations of its surroundings as context. These map predictions are then used to plan trajectories from the robot’s position to the goal without requiring frontier selection. We demonstrate that our map-predictive motion planning strategy yields a substantial improvement in trajectory time over a naive frontier pursuit method and demonstrates similar performance to methods using more sophisticated frontier selection heuristics with significantly shorter computation time.

    @inproceedings{ElhafsiIvanovicEtAl2020,
      author = {Elhafsi, A. and Ivanovic, B. and Janson, L. and Pavone, M.},
      title = {Map-Predictive Motion Planning in Unknown Environments},
      booktitle = {{Proc. IEEE Conf. on Robotics and Automation}},
      year = {2020},
      address = {Paris, France},
      month = jun,
      url = {https://arxiv.org/abs/1910.08184},
      owner = {borisi},
      timestamp = {2019-10-21}
    }
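
    A high-level Python sketch of the map-predictive planning loop from this abstract; `predict_map`, `plan_trajectory`, and the `robot` interface are placeholder assumptions for the learned map predictor, a dynamically-constrained planner, and the robot's sensing and execution API:

      def navigate(robot, goal, predict_map, plan_trajectory):
          while not robot.at(goal):
              observed = robot.local_map()       # what the sensors have seen so far
              predicted = predict_map(observed)  # data-driven guess at unobserved regions
              # Plan straight to the goal over the predicted map, with no
              # intermediate frontier selection; replan after a short execution window.
              trajectory = plan_trajectory(robot.state, goal, predicted)
              robot.execute(trajectory, horizon=1)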
    
  8. S. Chinchali, A. Sharma, J. Harrison, A. Elhafsi, D. Kang, E. Pergament, E. Cidon, S. Katti, and M. Pavone, “Network Offloading Policies for Cloud Robotics: a Learning-based Approach,” in Robotics: Science and Systems, Freiburg im Breisgau, Germany, 2019.

    Abstract: Today’s robotic systems are increasingly turning to computationally expensive models such as deep neural networks (DNNs) for tasks like localization, perception, planning, and object detection. However, resource-constrained robots, like low-power drones, often have insufficient on-board compute resources or power reserves to scalably run the most accurate, state-of-the-art neural network compute models. Cloud robotics allows mobile robots the benefit of offloading compute to centralized servers if they are uncertain locally or want to run more accurate, compute-intensive models. However, cloud robotics comes with a key, often understated cost: communicating with the cloud over congested wireless networks may result in latency or loss of data. In fact, sending high data-rate video or LIDAR from multiple robots over congested networks can lead to prohibitive delay for real-time applications, which we measure experimentally. In this paper, we formulate a novel Robot Offloading Problem: how and when should robots offload sensing tasks, especially if they are uncertain, to improve accuracy while minimizing the cost of cloud communication? We formulate offloading as a sequential decision making problem for robots, and propose a solution using deep reinforcement learning. In both simulations and hardware experiments using state-of-the-art vision DNNs, our offloading strategy improves vision task performance by 1.3-2.6x over benchmark offloading strategies, allowing robots the potential to significantly transcend their on-board sensing accuracy but with limited cost of cloud communication.

    @inproceedings{ChinchaliSharmaEtAl2019,
      author = {Chinchali, S. and Sharma, A. and Harrison, J. and Elhafsi, A. and Kang, D. and Pergament, E. and Cidon, E. and Katti, S. and Pavone, M.},
      title = {Network Offloading Policies for Cloud Robotics: a Learning-based Approach},
      booktitle = {{Robotics: Science and Systems}},
      year = {2019},
      address = {Freiburg im Breisgau, Germany},
      month = jun,
      url = {https://arxiv.org/pdf/1902.05703.pdf},
      owner = {apoorva},
      timestamp = {2019-02-07}
    }
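
    A toy Python rendering of the offloading trade-off described in this abstract; the accuracies and cost are made-up numbers, and the paper learns the offloading policy with deep reinforcement learning rather than this greedy one-step rule:

      def offload_decision(onboard_confidence, cloud_accuracy=0.9, comm_cost=0.15):
          """Offload when the expected accuracy gain over the on-board model
          outweighs the communication cost (a greedy stand-in for a learned policy)."""
          return "cloud" if (cloud_accuracy - onboard_confidence) > comm_cost else "onboard"

      print(offload_decision(onboard_confidence=0.4))   # -> "cloud"
      print(offload_decision(onboard_confidence=0.85))  # -> "onboard"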