James Harrison

Contacts:

Personal Webpage

James is a graduate student in the Department of Mechanical Engineering. He received a B.Eng. in Mechanical Engineering from McGill University in 2015 and an M.S. in Mechanical Engineering from Stanford University in 2017. James’ research interests include control theory, robotics, and machine learning. In particular, his current work focuses on verifiably safe and robust methods for reinforcement learning, as well as unsupervised learning and representation learning in robotic task and motion planning.

Awards:

  • Office of Technology Licensing Stanford Graduate Fellowship
  • Natural Sciences and Engineering Research Council of Canada (NSERC) Doctoral Scholarship

Currently at Google Brain

ASL Publications

  1. D. Gammelli, K. Yang, J. Harrison, F. Rodrigues, F. Pereira, and M. Pavone, “Graph Meta-Reinforcement Learning for Transferable Autonomous Mobility-on-Demand,” in ACM Int. Conf. on Knowledge Discovery and Data Mining, 2022. (Submitted)

    Abstract: Autonomous Mobility-on-Demand (AMoD) systems represent an attractive alternative to existing transportation paradigms, currently challenged by urbanization and increasing travel needs. By centrally controlling a fleet of self-driving vehicles, these systems provide mobility service to customers and are currently starting to be deployed in a number of cities around the world. Current learning-based approaches for controlling AMoD systems are limited to the single-city scenario, whereby the service operator is allowed to make an unlimited number of operational decisions within the same transportation system. However, real-world system operators can hardly afford to fully re-train AMoD controllers for every city they operate in, as this could result in a high number of poor-quality decisions during training, making the single-city strategy a potentially impractical solution. To address these limitations, we propose to formalize the multi-city AMoD problem through the lens of meta-reinforcement learning (meta-RL) and devise an actor-critic algorithm based on recurrent graph neural networks. In our approach, AMoD controllers are explicitly trained such that a small amount of experience within a new city will produce good system performance. Empirically, we show how control policies learned through meta-RL are able to achieve near-optimal performance on unseen cities by learning rapidly adaptable policies, thus making them more robust not only to novel environments, but also to distribution shifts common in real-world operations, such as special events, unexpected congestion, and dynamic pricing schemes.

    @inproceedings{GammelliYangEtAl2022,
      author = {Gammelli, D. and Yang, K. and Harrison, J. and Rodrigues, F. and Pereira, F. and Pavone, M.},
      title = {Graph Meta-Reinforcement Learning for Transferable Autonomous Mobility-on-Demand},
      booktitle = {{ACM Int. Conf. on Knowledge Discovery and Data Mining}},
      month = mar,
      year = {2022},
      note = {Submitted},
      keywords = {sub},
      owner = {ykd07},
      timestamp = {2022-03-02}
    }
    
  2. T. Lew, A. Sharma, J. Harrison, A. Bylard, and M. Pavone, “Safe Active Dynamics Learning and Control: A Sequential Exploration-Exploitation Framework,” IEEE Transactions on Robotics, Jan. 2022. (In Press)

    Abstract: To safely deploy learning-based systems in highly uncertain environments, one must ensure that they always satisfy constraints. In this work, we propose a practical and theoretically justified approach to maintaining safety in the presence of dynamics uncertainty. Our approach leverages Bayesian meta-learning with last-layer adaptation: the expressiveness of neural-network features trained offline, paired with efficient last-layer online adaptation, enables the derivation of tight confidence sets which contract around the true dynamics as the model adapts online. We exploit these confidence sets to plan trajectories that guarantee the safety of the system. Our approach handles problems with high dynamics uncertainty where reaching the goal safely is initially infeasible by first exploring to gather data and reduce uncertainty, before autonomously exploiting the acquired information to safely perform the task. Under reasonable assumptions, we prove that our framework provides safety guarantees in the form of a single joint chance constraint. Furthermore, we use this theoretical analysis to motivate regularization of the model to improve performance. We extensively demonstrate our approach in simulation and on hardware.

    @article{LewEtAl2022,
      author = {Lew, T. and Sharma, A. and Harrison, J. and Bylard, A. and Pavone, M.},
      title = {Safe Active Dynamics Learning and Control: A Sequential Exploration-Exploitation Framework},
      journal = {{IEEE Transactions on Robotics}},
      year = {2022},
      note = {In Press},
      month = jan,
      url = {https://arxiv.org/pdf/2008.11700.pdf},
      keywords = {press},
      owner = {lew},
      timestamp = {2022-01-27}
    }
    
  3. R. Sinha, J. Harrison, S. M. Richards, and M. Pavone, “Adaptive Robust Model Predictive Control with Matched and Unmatched Uncertainty,” in American Control Conference, 2022. (In Press)

    Abstract: We propose a learning-based robust predictive control algorithm that compensates for significant uncertainty in the dynamics for a class of discrete-time systems that are nominally linear with an additive nonlinear component. Such systems commonly model the nonlinear effects of an unknown environment on a nominal system. We optimize over a class of nonlinear feedback policies inspired by certainty equivalent “estimate-and-cancel” control laws pioneered in classical adaptive control to achieve significant performance improvements in the presence of uncertainties of large magnitude, a setting in which existing learning-based predictive control algorithms often struggle to guarantee safety. In contrast to previous work in robust adaptive MPC, our approach allows us to take advantage of structure (i.e., the numerical predictions) in the a priori unknown dynamics learned online through function approximation. Our approach also extends typical nonlinear adaptive control methods to systems with state and input constraints even when we cannot directly cancel the additive uncertain function from the dynamics. Moreover, we apply contemporary statistical estimation techniques to certify the system’s safety through persistent constraint satisfaction with high probability. Finally, we show in simulation that our method can accommodate more significant unknown dynamics terms than existing methods.

    @inproceedings{SinhaHarrisonEtAl2022,
      author = {Sinha, R. and Harrison, J. and Richards, S. M. and Pavone, M.},
      title = {Adaptive Robust Model Predictive Control with Matched and Unmatched Uncertainty},
      year = {2022},
      keywords = {press},
      booktitle = {{American Control Conference}},
      url = {https://arxiv.org/pdf/2104.08261.pdf},
      owner = {rhnsinha},
      timestamp = {2022-01-31}
    }
    
  4. J. Harrison, “Uncertainty and Efficiency in Adaptive Robot Learning and Control,” PhD thesis, Stanford University, Dept. of Mechanical Engineering, Stanford, California, 2021.

    Abstract: Autonomous robots have the potential to free humans from dangerous or dull work. To achieve truly autonomous operation, robots must be able to understand unstructured environments and make safe decisions in the face of uncertainty and non-stationarity. As such, robots must be able to learn about, and react to, changing operating conditions or environments continuously, efficiently, and safely. While the last decade has seen rapid advances in the capabilities of machine learning systems driven by deep learning, these systems are limited in their ability to adapt online, learn with small amounts of data, and characterize uncertainty. The desiderata of learning robots therefore directly conflict with the weaknesses of modern deep learning systems. This thesis aims to remedy this conflict and develop robot learning systems that are capable of learning safely and efficiently. In the first part of the thesis we develop tools for efficient learning in changing environments. In particular, we develop tools for the meta-learning problem setting—in which data from a collection of environments may be used to accelerate learning in a new environment—in both the regression and classification setting. These algorithms are based on exact Bayesian inference on meta-learned features. This approach enables characterization of uncertainty in the face of small amounts of within-environment data, and efficient learning via exact conditioning. We extend these approaches to time-varying settings beyond episodic variation, including continuous gradual environmental variation and sharp, changepoint-like variation. In the second part of the thesis we adapt these tools to the problem of robot modeling and control. In particular, we investigate the problem of combining our neural network-based meta-learning models with prior knowledge in the form of a nominal dynamics model, and discuss design decisions to yield better performance and parameter identification. 
    We then develop a strategy for safe learning control. This strategy combines methods from modern constrained control—in particular, robust model predictive control—with ideas from classical adaptive control to yield a computationally efficient, simple to implement, and guaranteed safe control strategy capable of learning online. We conclude the thesis with a discussion of short, intermediate, and long-term next steps in extending the ideas developed herein toward the goal of true robot autonomy.

    @phdthesis{Harrison2021,
      author = {Harrison, J.},
      title = {Uncertainty and Efficiency in Adaptive Robot Learning and Control},
      school = {{Stanford University, Dept. of Mechanical Engineering}},
      year = {2021},
      address = {Stanford, California},
      month = aug,
      url = {https://stacks.stanford.edu/file/druid:hh754jn1534/James_Harrison_Thesis-augmented.pdf},
      owner = {bylard},
      timestamp = {2021-12-06}
    }
    
  5. R. Dyro, J. Harrison, A. Sharma, and M. Pavone, “Particle MPC for Uncertain and Learning-Based Control,” in IEEE/RSJ Int. Conf. on Intelligent Robots & Systems, 2021. (In Press)

    Abstract: As robotic systems move from highly structured environments to open worlds, incorporating uncertainty from dynamics learning or state estimation into the control pipeline is essential for robust performance. In this paper we present a nonlinear particle model predictive control (PMPC) approach to control under uncertainty, which directly incorporates any particle-based uncertainty representation, such as those common in robotics. Our approach builds on scenario methods for MPC, but in contrast to existing approaches, which either constrain all or only the first timestep to share actions across scenarios, we investigate the impact of a partial consensus horizon. Implementing this optimization for nonlinear dynamics by leveraging sequential convex optimization, our approach yields an efficient framework that can be tuned to the particular information gain dynamics of a system to mitigate both over-conservatism and over-optimism. We investigate our approach for two robotic systems across three problem settings: time-varying, partially observed dynamics; sensing uncertainty; and model-based reinforcement learning, and show that our approach improves performance over baselines in all settings.

    @inproceedings{DyroHarrisonEtAl2021,
      author = {Dyro, R. and Harrison, J. and Sharma, A. and Pavone, M.},
      title = {Particle MPC for Uncertain and Learning-Based Control},
      booktitle = {{IEEE/RSJ Int. Conf. on Intelligent Robots \& Systems}},
      year = {2021},
      keywords = {press},
      owner = {rdyro},
      timestamp = {2022-02-05},
      url = {https://arxiv.org/abs/2104.02213}
    }
    
  6. J. Willes, J. Harrison, A. Harakeh, C. Finn, M. Pavone, and S. Waslander, “Open-Set Incremental Learning via Bayesian Prototypical Embeddings,” 2021. (Submitted)

    @inproceedings{WillesHarrisonEtAl2021,
      author = {Willes, J. and Harrison, J. and Harakeh, A. and Finn, C. and Pavone, M. and Waslander, S.},
      title = {Open-Set Incremental Learning via Bayesian Prototypical Embeddings},
      year = {2021},
      note = {Submitted},
      keywords = {sub},
      owner = {jh2},
      timestamp = {2021-03-23}
    }
    
  7. D. Gammelli, K. Yang, J. Harrison, F. Rodrigues, F. C. Pereira, and M. Pavone, “Graph Neural Network Reinforcement Learning for Autonomous Mobility-on-Demand Systems,” in Proc. IEEE Conf. on Decision and Control, 2021.

    Abstract: Autonomous mobility-on-demand (AMoD) systems represent a rapidly developing mode of transportation wherein travel requests are dynamically handled by a coordinated fleet of robotic, self-driving vehicles. Given a graph representation of the transportation network - one where, for example, nodes represent areas of the city, and edges the connectivity between them - we argue that the AMoD control problem is naturally cast as a node-wise decision-making problem. In this paper, we propose a deep reinforcement learning framework to control the rebalancing of AMoD systems through graph neural networks. Crucially, we demonstrate that graph neural networks enable reinforcement learning agents to recover behavior policies that are significantly more transferable, generalizable, and scalable than policies learned through other approaches. Empirically, we show how the learned policies exhibit promising zero-shot transfer capabilities when faced with critical portability tasks such as inter-city generalization, service area expansion, and adaptation to potentially complex urban topologies.

    @inproceedings{GammelliYangEtAl2021,
      author = {Gammelli, D. and Yang, K. and Harrison, J. and Rodrigues, F. and Pereira, F. C. and Pavone, M.},
      title = {Graph Neural Network Reinforcement Learning for Autonomous Mobility-on-Demand Systems},
      year = {2021},
      url = {https://arxiv.org/abs/2104.11434},
      owner = {jh2},
      booktitle = {{Proc. IEEE Conf. on Decision and Control}},
      timestamp = {2021-03-23}
    }
    
  8. J. Harrison, A. Sharma, C. Finn, and M. Pavone, “Continuous Meta-Learning without Tasks,” in Conf. on Neural Information Processing Systems, 2020.

    Abstract: Meta-learning is a promising strategy for learning to efficiently learn within new tasks, using data gathered from a distribution of tasks. However, the meta-learning literature thus far has focused on the task-segmented setting, where at train-time, offline data is assumed to be split according to the underlying task, and at test-time, the algorithms are optimized to learn in a single task. In this work, we enable the application of generic meta-learning algorithms to settings where this task segmentation is unavailable, such as continual online learning with a time-varying task. We present meta-learning via online changepoint analysis (MOCA), an approach which augments a meta-learning algorithm with a differentiable Bayesian changepoint detection scheme. The framework allows both training and testing directly on time series data without segmenting it into discrete tasks. We demonstrate the utility of this approach on a nonlinear meta-regression benchmark as well as two meta-image-classification benchmarks.

    @inproceedings{HarrisonSharmaEtAl2020,
      author = {Harrison, J. and Sharma, A. and Finn, C. and Pavone, M.},
      booktitle = {{Conf. on Neural Information Processing Systems}},
      title = {Continuous Meta-Learning without Tasks},
      year = {2020},
      month = dec,
      url = {https://arxiv.org/abs/1912.08866},
      owner = {apoorva},
      timestamp = {2020-05-05}
    }
    
  9. S. Banerjee, J. Harrison, P. M. Furlong, and M. Pavone, “Adaptive Meta-Learning for Identification of Rover-Terrain Dynamics,” in Int. Symp. on Artificial Intelligence, Robotics and Automation in Space, Pasadena, California, 2020.

    Abstract: Rovers require knowledge of terrain to plan trajectories that maximize safety and efficiency. Terrain type classification relies on input from human operators or machine learning-based image classification algorithms. However, high-level terrain classification is typically not sufficient to prevent incidents such as rovers becoming unexpectedly stuck in a sand trap; in these situations, online rover-terrain interaction data can be leveraged to accurately predict future dynamics and prevent further damage to the rover. This paper presents a meta-learning-based approach to adapt probabilistic predictions of rover dynamics by augmenting a nominal model affine in parameters with a Bayesian regression algorithm (P-ALPaCA). A regularization scheme is introduced to encourage orthogonality of nominal and learned features, leading to interpretable probabilistic estimates of terrain parameters in varying terrain conditions.

    @inproceedings{BanerjeeHarrisonEtAl2020,
      author = {Banerjee, S. and Harrison, J. and Furlong, P. M. and Pavone, M.},
      title = {Adaptive Meta-Learning for Identification of Rover-Terrain Dynamics},
      booktitle = {{Int. Symp. on Artificial Intelligence, Robotics and Automation in Space}},
      year = {2020},
      address = {Pasadena, California},
      month = oct,
      url = {https://arxiv.org/abs/2009.10191},
      owner = {somrita},
      timestamp = {2020-09-18}
    }
    
  10. A. Sharma, J. Harrison, M. Tsao, and M. Pavone, “Robust and Adaptive Planning under Model Uncertainty,” in Int. Conf. on Automated Planning and Scheduling, Berkeley, California, 2019.

    Abstract: Planning under model uncertainty is a fundamental problem across many applications of decision making and learning. In this paper, we propose the Robust Adaptive Monte Carlo Planning (RAMCP) algorithm, which allows computation of risk-sensitive Bayes-adaptive policies that optimally trade off exploration, exploitation, and robustness. RAMCP formulates the risk-sensitive planning problem as a two-player zero-sum game, in which an adversary perturbs the agent’s belief over the models. We introduce two versions of the RAMCP algorithm. The first, RAMCP-F, converges to an optimal risk-sensitive policy without having to rebuild the search tree as the underlying belief over models is perturbed. The second version, RAMCP-I, improves computational efficiency at the cost of losing theoretical guarantees, but is shown to yield empirical results comparable to RAMCP-F. RAMCP is demonstrated on an n-pull multi-armed bandit problem, as well as a patient treatment scenario.

    @inproceedings{SharmaHarrisonEtAl2019,
      author = {Sharma, A. and Harrison, J. and Tsao, M. and Pavone, M.},
      title = {Robust and Adaptive Planning under Model Uncertainty},
      booktitle = {{Int. Conf. on Automated Planning and Scheduling}},
      year = {2019},
      address = {Berkeley, California},
      month = jul,
      url = {https://arxiv.org/pdf/1901.02577.pdf},
      owner = {apoorva},
      timestamp = {2019-04-10}
    }
    
  11. S. Chinchali, A. Sharma, J. Harrison, A. Elhafsi, D. Kang, E. Pergament, E. Cidon, S. Katti, and M. Pavone, “Network Offloading Policies for Cloud Robotics: a Learning-based Approach,” in Robotics: Science and Systems, Freiburg im Breisgau, Germany, 2019.

    Abstract: Today’s robotic systems are increasingly turning to computationally expensive models such as deep neural networks (DNNs) for tasks like localization, perception, planning, and object detection. However, resource-constrained robots, like low-power drones, often have insufficient on-board compute resources or power reserves to scalably run the most accurate, state-of-the-art neural network compute models. Cloud robotics allows mobile robots the benefit of offloading compute to centralized servers if they are uncertain locally or want to run more accurate, compute-intensive models. However, cloud robotics comes with a key, often understated cost: communicating with the cloud over congested wireless networks may result in latency or loss of data. In fact, sending high data-rate video or LIDAR from multiple robots over congested networks can lead to prohibitive delay for real-time applications, which we measure experimentally. In this paper, we formulate a novel Robot Offloading Problem - how and when should robots offload sensing tasks, especially if they are uncertain, to improve accuracy while minimizing the cost of cloud communication? We formulate offloading as a sequential decision making problem for robots, and propose a solution using deep reinforcement learning. In both simulations and hardware experiments using state-of-the-art vision DNNs, our offloading strategy improves vision task performance by 1.3-2.6x over benchmark offloading strategies, allowing robots the potential to significantly transcend their on-board sensing accuracy but with limited cost of cloud communication.

    @inproceedings{ChinchaliSharmaEtAl2019,
      author = {Chinchali, S. and Sharma, A. and Harrison, J. and Elhafsi, A. and Kang, D. and Pergament, E. and Cidon, E. and Katti, S. and Pavone, M.},
      title = {Network Offloading Policies for Cloud Robotics: a Learning-based Approach},
      booktitle = {{Robotics: Science and Systems}},
      year = {2019},
      address = {Freiburg im Breisgau, Germany},
      month = jun,
      url = {https://arxiv.org/pdf/1902.05703.pdf},
      owner = {apoorva},
      timestamp = {2019-02-07}
    }
    
  12. B. Ivanovic, J. Harrison, A. Sharma, M. Chen, and M. Pavone, “BaRC: Backward Reachability Curriculum for Robotic Reinforcement Learning,” in Proc. IEEE Conf. on Robotics and Automation, Montreal, Canada, 2019.

    Abstract: Model-free Reinforcement Learning (RL) offers an attractive approach to learn control policies for high-dimensional systems, but its relatively poor sample complexity often forces training in simulated environments. Even in simulation, goal-directed tasks whose natural reward function is sparse remain intractable for state-of-the-art model-free algorithms for continuous control. The bottleneck in these tasks is the prohibitive amount of exploration required to obtain a learning signal from the initial state of the system. In this work, we leverage physical priors in the form of an approximate system dynamics model to design a curriculum scheme for a model-free policy optimization algorithm. Our Backward Reachability Curriculum (BaRC) begins policy training from states that require a small number of actions to accomplish the task, and expands the initial state distribution backwards in a dynamically-consistent manner once the policy optimization algorithm demonstrates sufficient performance. BaRC is general, in that it can accelerate training of any model-free RL algorithm on a broad class of goal-directed continuous control MDPs. Its curriculum strategy is physically intuitive, easy-to-tune, and allows incorporating physical priors to accelerate training without hindering the performance, flexibility, and applicability of the model-free RL algorithm. We evaluate our approach on two representative dynamic robotic learning problems and find substantial performance improvement relative to previous curriculum generation techniques and naïve exploration strategies.

    @inproceedings{IvanovicHarrisonEtAl2019,
      author = {Ivanovic, B. and Harrison, J. and Sharma, A. and Chen, M. and Pavone, M.},
      title = {{BaRC:} Backward Reachability Curriculum for Robotic Reinforcement Learning},
      booktitle = {{Proc. IEEE Conf. on Robotics and Automation}},
      year = {2019},
      address = {Montreal, Canada},
      month = may,
      url = {https://arxiv.org/pdf/1806.06161.pdf},
      owner = {borisi},
      timestamp = {2018-09-05}
    }
    
  13. J. Harrison, A. Sharma, and M. Pavone, “Meta-Learning Priors for Efficient Online Bayesian Regression,” in Workshop on Algorithmic Foundations of Robotics, Merida, Mexico, 2018.

    Abstract: Gaussian Process (GP) regression has seen widespread use in robotics due to its generality, simplicity of use, and the utility of Bayesian predictions. In particular, the predominant implementation of GP regression is kernel-based, as it enables fitting of arbitrary nonlinear functions by leveraging kernel functions as infinite-dimensional features. While incorporating prior information has the potential to drastically improve data efficiency of kernel-based GP regression, expressing complex priors through the choice of kernel function and associated hyperparameters is often challenging and unintuitive. Furthermore, the computational complexity of kernel-based GP regression scales poorly with the number of samples, limiting its application in regimes where a large amount of data is available. In this work, we propose ALPaCA, an algorithm for efficient Bayesian regression which addresses these issues. ALPaCA uses a dataset of sample functions to learn a domain-specific, finite-dimensional feature encoding, as well as a prior over the associated weights, such that Bayesian linear regression in this feature space yields accurate online predictions of the posterior density. These features are neural networks, which are trained via a meta-learning approach. ALPaCA extracts all prior information from the dataset, rather than relying on the choice of arbitrary, restrictive kernel hyperparameters. Furthermore, it substantially reduces sample complexity, and allows scaling to large systems. We investigate the performance of ALPaCA on two simple regression problems, two simulated robotic systems, and on a lane-change driving task performed by humans. We find our approach outperforms kernel-based GP regression, as well as state-of-the-art meta-learning approaches, thereby providing a promising plug-in tool for many regression tasks in robotics where scalability and data-efficiency are important.

    @inproceedings{HarrisonSharmaEtAl2018,
      author = {Harrison, J. and Sharma, A. and Pavone, M.},
      title = {Meta-Learning Priors for Efficient Online Bayesian Regression},
      booktitle = {{Workshop on Algorithmic Foundations of Robotics}},
      year = {2018},
      address = {Merida, Mexico},
      month = oct,
      url = {https://arxiv.org/pdf/1807.08912.pdf},
      owner = {apoorva},
      timestamp = {2018-10-07}
    }
    
  14. B. Ichter, J. Harrison, and M. Pavone, “Learning Sampling Distributions for Robot Motion Planning,” in Proc. IEEE Conf. on Robotics and Automation, Brisbane, Australia, 2018.

    Abstract: A defining feature of sampling-based motion planning is the reliance on an implicit representation of the state space, which is enabled by a set of probing samples. Traditionally, these samples are drawn either probabilistically or deterministically to uniformly cover the state space. Yet, the motion of many robotic systems is often restricted to "small" regions of the state space, due to, e.g., differential constraints or collision-avoidance constraints. To accelerate the planning process, it is thus desirable to devise non-uniform sampling strategies that favor sampling in those regions where an optimal solution might lie. This paper proposes a methodology for non-uniform sampling, whereby a sampling distribution is learnt from demonstrations, and then used to bias sampling. The sampling distribution is computed through a conditional variational autoencoder, allowing sample generation from the latent space conditioned on the specific planning problem. This methodology is general, can be used in combination with any sampling-based planner, and can effectively exploit the underlying structure of a planning problem while maintaining the theoretical guarantees of sampling-based approaches. Specifically, on several planning problems, the proposed methodology is shown to effectively learn representations for the relevant regions of the state space, resulting in an order of magnitude improvement in terms of success rate and convergence to the optimal cost.

    @inproceedings{IchterHarrisonEtAl2018,
      author = {Ichter, B. and Harrison, J. and Pavone, M.},
      title = {Learning Sampling Distributions for Robot Motion Planning},
      booktitle = {{Proc. IEEE Conf. on Robotics and Automation}},
      year = {2018},
      address = {Brisbane, Australia},
      month = may,
      url = {https://arxiv.org/pdf/1709.05448.pdf},
      owner = {frossi2},
      timestamp = {2018-01-16}
    }
    
  15. J. Harrison, A. Garg, B. Ivanovic, Y. Zhu, S. Savarese, F.-F. Li, and M. Pavone, “ADAPT: Zero-Shot Adaptive Policy Transfer for Stochastic Dynamical Systems,” in Int. Symp. on Robotics Research, Puerto Varas, Chile, 2017.

    Abstract: Model-free policy learning has enabled robust performance of complex tasks with relatively simple algorithms. However, this simplicity comes at the cost of requiring an Oracle and arguably very poor sample complexity. This renders such methods unsuitable for physical systems. Variants of model-based methods address this problem through the use of simulators; however, this gives rise to the problem of policy transfer from simulated to the physical system. Model mismatch due to systematic parameter shift and unmodelled dynamics error may cause suboptimal or unsafe behavior upon direct transfer. We introduce the Adaptive Policy Transfer for Stochastic Dynamics (ADAPT) algorithm that achieves provably safe and robust, dynamically-feasible zero-shot transfer of RL-policies to new domains with dynamics error. ADAPT combines the strengths of offline policy learning in a black-box source simulator with online tube-based MPC to attenuate bounded model mismatch between the source and target dynamics. ADAPT allows online transfer of policy, trained solely in a simulation offline, to a family of unknown targets without fine-tuning. We also formally show that (i) ADAPT guarantees state and control safety through state-action tubes under the assumption of Lipschitz continuity of the divergence in dynamics, and (ii) ADAPT results in a bounded loss of reward accumulation in case of direct transfer with ADAPT as compared to a policy trained only on target. We evaluate ADAPT on 2 continuous, non-holonomic simulated dynamical systems with 4 different disturbance models, and find that ADAPT performs between 50%-300% better on mean reward accrual than direct policy transfer.

    @inproceedings{HarrisonGargEtAl2017,
      author = {Harrison, J. and Garg, A. and Ivanovic, B. and Zhu, Y. and Savarese, S. and Li, F.-F. and Pavone, M.},
      title = {{ADAPT:} Zero-Shot Adaptive Policy Transfer for Stochastic Dynamical Systems},
      booktitle = {{Int. Symp. on Robotics Research}},
      year = {2017},
      address = {Puerto Varas, Chile},
      month = dec,
      url = {https://arxiv.org/pdf/1707.04674.pdf},
      owner = {pavone},
      timestamp = {2018-01-16}
    }