Yinlam Chow earned his Ph.D. in Computational and Mathematical Engineering from Stanford University in 2017. Before that, he received a B.Eng. in Mechanical Engineering from the University of Hong Kong in 2009 and an M.S.E. in Aerospace Engineering from Purdue University in 2011.
Abstract: We study risk-sensitive imitation learning, where the agent's goal is to perform at least as well as the expert in terms of a risk profile. We first formulate our risk-sensitive imitation learning setting. We consider the generative adversarial approach to imitation learning (GAIL) and derive an optimization problem for our formulation, which we call risk-sensitive GAIL (RS-GAIL). We then derive two different versions of our RS-GAIL optimization problem that aim at matching the risk profiles of the agent and the expert w.r.t. the Jensen-Shannon (JS) divergence and the Wasserstein distance, and develop risk-sensitive generative adversarial imitation learning algorithms based on these optimization problems. We evaluate the performance of our algorithms and compare them with GAIL and the risk-averse imitation learning (RAIL) algorithm in two MuJoCo and two OpenAI classical control tasks.
@inproceedings{LacotteGhavamzadehEtAl2019, author = {Lacotte, J. and Ghavamzadeh, M. and Chow, Y. and Pavone, M.}, title = {Risk-Sensitive Generative Adversarial Imitation Learning}, booktitle = {{Int. Conf. on Artificial Intelligence and Statistics}}, year = {2019}, address = {Okinawa, Japan}, month = apr, owner = {lacotte}, timestamp = {2021-11-21}, url = {https://arxiv.org/abs/1808.04468} }
Abstract: In many sequential decision-making problems one is interested in minimizing an expected cumulative cost while taking into account risk, i.e., increased awareness of events of small probability and high consequences. Accordingly, the objective of this paper is to present efficient reinforcement learning algorithms for risk-constrained Markov decision processes (MDPs), where risk is represented via a chance constraint or a constraint on the conditional value-at-risk (CVaR) of the cumulative cost. We collectively refer to such problems as percentile risk-constrained MDPs. Specifically, we first derive a formula for computing the gradient of the Lagrangian function for percentile risk-constrained MDPs. Then, we devise policy gradient and actor-critic algorithms that (1) estimate such gradient, (2) update the policy parameters in the descent direction, and (3) update the Lagrange multiplier in the ascent direction. For these algorithms we prove convergence to locally-optimal policies. Finally, we demonstrate the effectiveness of our algorithms in an optimal stopping problem and an online marketing application.
@article{ChowGhavamzadehEtAl2018, author = {Chow, Y. and Ghavamzadeh, M. and Janson, L. and Pavone, M.}, title = {Risk-Constrained Reinforcement Learning with Percentile Risk Criteria}, journal = {{Journal of Machine Learning Research}}, year = {2018}, url = {/wp-content/papercite-data/pdf/Chow.Ghavamzadeh.Janson.Pavone.JMLR18.pdf}, owner = {bylard}, timestamp = {2018-06-03} }
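The policy-descent / multiplier-ascent scheme described in the abstract can be illustrated on a one-dimensional toy problem. The sketch below is not the paper's actor-critic algorithm for MDPs; the cost model, constants, and variable names are our own illustrative choices, with the CVaR constraint handled through the standard Rockafellar-Uryasev reformulation.

```python
import numpy as np

# Toy CVaR-constrained problem (illustrative setup, not the paper's MDP):
#   minimize_theta  E[c(theta, W)]
#   subject to      CVaR_alpha(c(theta, W)) <= d
# with c(theta, w) = (theta - 2)^2 + theta * w and W ~ N(0, 1).
# The constraint is rewritten with the Rockafellar-Uryasev variable nu:
#   L(theta, nu, lam) = E[c] + lam * (nu + E[(c - nu)_+]/alpha - d)

rng = np.random.default_rng(0)
alpha, d = 0.1, 2.9
theta, nu, lam = 2.0, 2.0, 0.0     # primal, auxiliary, and dual variables
lr_primal, lr_dual = 0.01, 0.005

for _ in range(5000):
    w = rng.standard_normal(512)
    c = (theta - 2.0) ** 2 + theta * w     # sampled costs
    dc = 2.0 * (theta - 2.0) + w           # d c / d theta, per sample
    tail = (c > nu).astype(float)          # indicator of the CVaR tail

    # Stochastic gradients of the Lagrangian.
    g_theta = dc.mean() + (lam / alpha) * (tail * dc).mean()
    g_nu = lam * (1.0 - tail.mean() / alpha)
    g_lam = nu + np.maximum(c - nu, 0.0).mean() / alpha - d

    theta = float(np.clip(theta - lr_primal * g_theta, 0.0, 2.0))  # descent
    nu -= lr_primal * g_nu                                         # descent
    lam = max(0.0, lam + lr_dual * g_lam)                          # ascent
```

At theta = 2 the cost is 2W, so its CVaR at level 0.1 is roughly 3.5 > d = 2.9; the multiplier therefore rises and pushes theta toward the constrained optimum (about 1.5 for these constants), mirroring the descent/ascent structure of the paper's updates.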
Abstract: In this paper we present a framework for risk-sensitive model predictive control (MPC) of linear systems affected by stochastic multiplicative uncertainty. Our key innovation is to consider a time-consistent, dynamic risk evaluation of the cumulative cost as the objective function to be minimized. This framework is axiomatically justified in terms of time-consistency of risk assessments, is amenable to dynamic optimization, and is unifying in the sense that it captures a full range of risk preferences from risk-neutral (i.e., expectation) to worst case. Within this framework, we propose and analyze an online risk-sensitive MPC algorithm that is provably stabilizing. Furthermore, by exploiting the dual representation of time-consistent, dynamic risk measures, we cast the computation of the MPC control law as a convex optimization problem amenable to real-time implementation. Simulation results are presented and discussed.
@unpublished{SinghChowEtAl2018, author = {Singh, S. and Chow, Y.-L. and Majumdar, A. and Pavone, M.}, title = {A Framework for Time-Consistent, Risk-Sensitive Model Predictive Control: Theory and Algorithms}, note = {{Available at }\url{http://arxiv.org/abs/1703.01029}}, year = {2018}, url = {http://arxiv.org/pdf/1703.01029.pdf}, owner = {ssingh19}, timestamp = {2018-06-30} }
Abstract: In this paper we present a framework for risk-sensitive model predictive control (MPC) of linear systems affected by stochastic multiplicative uncertainty. Our key innovation is to consider a time-consistent, dynamic risk evaluation of the cumulative cost as the objective function to be minimized. This framework is axiomatically justified in terms of time-consistency of risk assessments, is amenable to dynamic optimization, and is unifying in the sense that it captures a full range of risk preferences from risk-neutral (i.e., expectation) to worst case. Within this framework, we propose and analyze an online risk-sensitive MPC algorithm that is provably stabilizing. Furthermore, by exploiting the dual representation of time-consistent, dynamic risk measures, we cast the computation of the MPC control law as a convex optimization problem amenable to real-time implementation. Simulation results are presented and discussed.
@article{SinghChowEtAl2018b, author = {Singh, S. and Chow, Y.-L. and Majumdar, A. and Pavone, M.}, title = {A Framework for Time-Consistent, Risk-Sensitive Model Predictive Control: Theory and Algorithms}, journal = {{IEEE Transactions on Automatic Control}}, volume = {64}, number = {7}, pages = {2905--2912}, year = {2019}, note = {{Extended version available at:} \url{http://arxiv.org/abs/1703.01029}}, url = {http://arxiv.org/pdf/1703.01029.pdf}, owner = {ssingh19}, timestamp = {2019-07-29} }
Abstract: Markov decision processes (MDPs) provide a mathematical framework for modeling sequential decision making where system evolution and cost/reward depend on uncertainties and on the control actions of a decision maker. MDP models have been widely adopted in numerous domains such as robotics, control systems, finance, economics, and manufacturing. At the same time, optimization theories of MDPs serve as the theoretical underpinnings to numerous dynamic programming and reinforcement learning algorithms in stochastic control problems. While the study of MDPs is attractive for several reasons, there are two main challenges associated with its practicality: (1) An accurate MDP model is oftentimes not available to the decision maker, and, affected by modeling errors, the resulting MDP solution policy is non-robust to system fluctuations. (2) The most widely adopted optimization criterion for MDPs is the risk-neutral expectation of a cumulative cost. This does not take into account the notion of risk, i.e., increased awareness of events of small probability but high consequences. In this thesis we study multiple important aspects of risk-sensitive sequential decision making in which the variability of stochastic costs and robustness to modeling errors are taken into account. First, we address a special type of risk-sensitive decision-making problem in which percentile behaviors are considered. Here risk is modeled by either the conditional value-at-risk (CVaR) or the value-at-risk (VaR). VaR measures risk as the maximum cost that might be incurred with respect to a given confidence level, and is appealing due to its intuitive meaning and its connection to chance constraints. The VaR risk measure has many fundamental engineering applications such as motion planning, where a safety constraint is imposed to upper bound the probability of maneuvering into dangerous regimes.
Despite its popularity, VaR suffers from being unstable, and its singularity often introduces mathematical issues into optimization problems. An alternative measure that addresses most of VaR's shortcomings is CVaR. CVaR is a risk measure that is rapidly gaining popularity in various financial applications, due to its favorable computational properties (CVaR is a coherent risk measure) and its superior ability to safeguard a decision maker from the "outcomes that hurt the most". As the conditional expected cost given that the cost is greater than or equal to VaR, CVaR accounts for the total cost of undesirable events (events whose probability is low but whose cost is high) and is therefore preferable in financial applications such as portfolio optimization. Second, we consider optimization problems in which the objective function involves a coherent risk measure of the random cost. Here the term coherent risk [7] denotes a general class of risk measures that satisfy convexity, monotonicity, translation invariance, and positive homogeneity. These properties not only guarantee that the optimization problems are mathematically well-posed, but they are also axiomatically justified. Modeling risk aversion with coherent risk measures has therefore gained widespread acceptance in engineering, finance, and operations research applications, among others. On the other hand, when the optimization problem is sequential, another important property of a risk measure is time consistency. A time-consistent risk metric satisfies a "dynamic-programming"-style property which ensures rational decision making: a strategy that is risk-optimal at the current stage will also be deemed optimal in subsequent stages. To get the best of both worlds, the recently proposed Markov risk measures [119] satisfy both the coherent-risk properties and time consistency.
Thus, to ensure both rationality in risk modeling and algorithmic tractability, this thesis focuses on risk-sensitive sequential decision-making problems modeled by Markov risk measures.
@phdthesis{Chow2017, author = {Chow, Y.}, title = {Risk-Sensitive and Data-Driven Sequential Decision Making}, school = {{Stanford University, Dept. of Aeronautics and Astronautics}}, year = {2017}, address = {Stanford, California}, month = mar, url = {/wp-content/papercite-data/pdf/Chow.PhD17.pdf}, owner = {frossi2}, timestamp = {2018-03-19} }
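The VaR/CVaR distinction drawn in the thesis abstract is easy to see on samples. A minimal sketch (the cost distribution and variable names here are illustrative, not from the thesis):

```python
import numpy as np

rng = np.random.default_rng(1)
# Sampled cumulative costs; the Gaussian here is purely illustrative.
costs = rng.normal(loc=1.0, scale=0.5, size=100_000)
alpha = 0.05                      # tail probability / confidence parameter

# VaR_alpha: the (1 - alpha)-quantile of the cost, i.e. the threshold
# exceeded with probability at most alpha.
var = np.quantile(costs, 1.0 - alpha)

# CVaR_alpha: the expected cost given that the cost is >= VaR_alpha.
# It measures how bad the tail is on average, not just where it starts,
# which is why it safeguards against the "outcomes that hurt the most".
cvar = costs[costs >= var].mean()

print(f"VaR = {var:.3f}, CVaR = {cvar:.3f}")
```

For a N(1, 0.5^2) cost these come out near 1.82 and 2.03; CVaR always dominates VaR, and the gap widens with heavier-tailed cost distributions.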
Abstract: In this paper we present an algorithm to compute risk-averse policies in Markov Decision Processes (MDP) when the total cost criterion is used together with the average value at risk (AVaR) metric. Risk-averse policies are needed when large deviations from the expected behavior may have detrimental effects, and conventional MDP algorithms usually ignore this aspect. We provide conditions on the structure of the underlying MDP ensuring that approximations for the exact problem can be derived and solved efficiently. Our findings are novel inasmuch as average value at risk has not previously been considered in association with the total cost criterion. Our method is demonstrated in a rapid deployment scenario, whereby a robot is tasked with the objective of reaching a target location within a temporal deadline where increased speed is associated with increased probability of failure. We demonstrate that the proposed algorithm not only produces a risk-averse policy reducing the probability of exceeding the expected temporal deadline, but also provides the statistical distribution of costs, thus offering a valuable analysis tool.
@inproceedings{CarpinChowEtAl2016, author = {Carpin, S. and Chow, Y. and Pavone, M.}, title = {Risk Aversion in Finite {Markov} {Decision} {Processes} Using Total Cost Criteria and Average Value at Risk}, booktitle = {{Proc. IEEE Conf. on Robotics and Automation}}, year = {2016}, address = {Stockholm, Sweden}, doi = {10.1109/ICRA.2016.7487152}, month = may, url = {/wp-content/papercite-data/pdf/Carpin.Chow.Pavone.ICRA16.pdf}, owner = {bylard}, timestamp = {2017-01-28} }
Abstract: In this paper we address the problem of decision making within a Markov decision process (MDP) framework where risk and modeling errors are taken into account. Our approach is to minimize a risk-sensitive conditional-value-at-risk (CVaR) objective, as opposed to a standard risk-neutral expectation. We refer to such problems as CVaR MDPs. Our first contribution is to show that a CVaR objective, besides capturing risk sensitivity, has an alternative interpretation as expected cost under worst-case modeling errors, for a given error budget. This result, which is of independent interest, motivates CVaR MDPs as a unifying framework for risk-sensitive and robust decision making. Our second contribution is to present a value-iteration algorithm for CVaR MDPs, and analyze its convergence rate. To our knowledge, this is the first solution algorithm for CVaR MDPs that enjoys error guarantees. Finally, we present results from numerical experiments that corroborate our theoretical findings and show the practicality of our approach.
@inproceedings{ChowTamarEtAl2015, author = {Chow, Y. and Tamar, A. and Mannor, S. and Pavone, M.}, title = {Risk-Sensitive and Robust Decision-Making: a {CVaR} Optimization Approach}, booktitle = {{Conf. on Neural Information Processing Systems}}, year = {2015}, address = {Montreal, Canada}, url = {/wp-content/papercite-data/pdf/Chow.Tamar.Mannor.Pavone.NIPS15.pdf}, owner = {bylard}, timestamp = {2017-01-28} }
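The paper's robustness interpretation (CVaR as an expected cost under worst-case distribution perturbations within a budget) can be checked numerically on an empirical distribution. A sketch under our own illustrative setup:

```python
import numpy as np

rng = np.random.default_rng(2)
n, alpha = 20_000, 0.1
costs = rng.exponential(scale=1.0, size=n)
k = int(alpha * n)

# Robust / dual view (sketch): an adversary reweights the empirical
# distribution with per-sample weights q_i in [0, 1/(alpha*n)] summing
# to 1 (a budget on how much the model can be distorted).  The worst
# case puts the maximum weight on the k = alpha*n largest costs.
q = np.zeros(n)
q[np.argsort(costs)[-k:]] = 1.0 / k
cvar_worst_case = float(q @ costs)

# Independent check via the Rockafellar-Uryasev form of CVaR:
#   CVaR_alpha(c) = min_nu  nu + E[(c - nu)_+] / alpha,
# whose minimizer nu is the (1 - alpha)-quantile (VaR).
nu = np.quantile(costs, 1.0 - alpha)
cvar_ru = float(nu + np.maximum(costs - nu, 0.0).mean() / alpha)

assert abs(cvar_worst_case - cvar_ru) < 1e-2
```

The two computations agree: maximizing the reweighted mean over the budgeted perturbation set recovers exactly the CVaR value, which is the equivalence the paper's first contribution generalizes to modeling errors in MDPs.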
Abstract: In this paper we present a framework for risk-averse model predictive control (MPC) of linear systems affected by multiplicative uncertainty. Our key innovation is to consider time-consistent, dynamic risk metrics as objective functions to be minimized. This framework is axiomatically justified in terms of time-consistency of risk preferences, is amenable to dynamic optimization, and is unifying in the sense that it captures a full range of risk assessments from risk-neutral to worst case. Within this framework, we propose and analyze an online risk-averse MPC algorithm that is provably stabilizing. Furthermore, by exploiting the dual representation of time-consistent, dynamic risk metrics, we cast the computation of the MPC control law as a convex optimization problem amenable to implementation on embedded systems. Simulation results are presented and discussed.
@inproceedings{ChowPavone2014, author = {Chow, Y. and Pavone, M.}, title = {A Framework for Time-Consistent, Risk-Averse Model Predictive Control: Theory and Algorithms}, booktitle = {{American Control Conference}}, year = {2014}, address = {Portland, Oregon}, doi = {10.1109/ACC.2014.6859437}, month = jun, url = {/wp-content/papercite-data/pdf/Chow.Pavone.ACC14.pdf}, owner = {bylard}, timestamp = {2017-01-28} }
Abstract: In this paper we consider a stochastic deployment problem, where a robotic swarm is tasked with the objective of positioning at least one robot at each of a set of pre-assigned targets while meeting a temporal deadline. Travel times and failure rates are stochastic but related, inasmuch as failure rates increase with speed. To maximize chances of success while meeting the deadline, a control strategy has therefore to balance safety and performance. Our approach is to cast the problem within the theory of constrained Markov Decision Processes, whereby we seek to compute policies that maximize the probability of successful deployment while ensuring that the expected duration of the task is bounded by a given deadline. To account for uncertainties in the problem parameters, we consider a robust formulation and we propose efficient solution algorithms, which are of independent interest. Numerical experiments confirming our theoretical results are presented and discussed.
@article{ChowPavoneEtAl2014, author = {Chow, Y. and Pavone, M. and Sadler, B. M. and Carpin, S.}, title = {Trading Safety Versus Performance: Rapid Deployment of Robotic Swarms with Robust Performance Constraints}, journal = {{ASME Journal of Dynamic Systems, Measurement, and Control}}, volume = {137}, number = {3}, pages = {031005.1--031005.11}, year = {2014}, doi = {10.1115/1.4028117}, owner = {bylard}, timestamp = {2017-01-28}, url = {http://web.stanford.edu/~pavone/papers/Chow.Pavone.ea.ASME14.pdf} }
Abstract: In this paper, we present a discretization algorithm for the solution of stochastic optimal control problems with dynamic, time-consistent risk constraints. Previous works have shown that such problems can be cast as Markov decision problems (MDPs) on an augmented state space where a constrained form of Bellman's recursion can be applied. However, even if both the state space and action space for the original optimization problem are finite, the augmented state in the induced MDP problem contains state variables that are continuous. Our approach is to apply a uniform-grid discretization scheme for the augmented state. To prove the correctness of this approach, we develop novel Lipschitz bounds for constrained dynamic programming operators. We show that convergence to the optimal value functions is linear in the step size, which matches the convergence rate of discretization algorithms for unconstrained dynamic programming operators. Simulation experiments are presented and discussed.
@inproceedings{ChowPavone2013b, author = {Chow, Y. and Pavone, M.}, title = {A Uniform-Grid Discretization Algorithm for Stochastic Optimal Control with Risk Constraints}, booktitle = {{Proc. IEEE Conf. on Decision and Control}}, year = {2013}, address = {Firenze, Italy}, doi = {10.1109/CDC.2013.6760250}, month = dec, url = {/wp-content/papercite-data/pdf/Chow.Pavone.CDC13.pdf}, owner = {bylard}, timestamp = {2017-01-28} }
Abstract: In this paper we present a dynamic programming approach to stochastic optimal control problems with dynamic, time-consistent risk constraints. Constrained stochastic optimal control problems, which naturally arise when one has to consider multiple objectives, have been extensively investigated in the past 20 years; however, in most formulations, the constraints are formulated as either risk-neutral (i.e., by considering an expected cost), or by applying static, single-period risk metrics with limited attention to "time consistency" (i.e., to whether such metrics ensure rational consistency of risk preferences across multiple periods). Recently, significant strides have been made in the development of a rigorous theory of dynamic, time-consistent risk metrics for multi-period (risk-sensitive) decision processes; however, their integration within constrained stochastic optimal control problems has received little attention. The goal of this paper is to bridge this gap. First, we formulate the stochastic optimal control problem with dynamic, time-consistent risk constraints and we characterize the tail subproblems (which requires the addition of a Markovian structure to the risk metrics). Second, we develop a dynamic programming approach for its solution, which allows one to compute the optimal costs by value iteration. Finally, we discuss both theoretical and practical features of our approach, such as generalizations, construction of optimal control policies, and computational aspects. A simple, two-state example is given to illustrate the problem setup and the solution approach.
@inproceedings{ChowPavone2013, author = {Chow, Y. and Pavone, M.}, title = {Stochastic Optimal Control with Dynamic, Time-Consistent Risk Constraints}, booktitle = {{American Control Conference}}, year = {2013}, address = {Washington, D.C.}, doi = {10.1109/ACC.2013.6579868}, month = jun, url = {/wp-content/papercite-data/pdf/Chow.Pavone.ACC13.pdf}, owner = {bylard}, timestamp = {2017-02-20} }