Optimal control solution techniques for systems with known and unknown dynamics. Dynamic programming, Hamilton-Jacobi reachability, and direct and indirect methods for trajectory optimization. Introduction to model predictive control. Adaptive control, model-based and model-free reinforcement learning, and connections between modern reinforcement learning and fundamental optimal control ideas.
| Spencer M. Richards | Daniele Gammelli |
|---|---|
| Thomas Lew | Devansh Jalota |
Lectures are held on Mondays and Wednesdays from 1:30pm to 2:50pm in Thornton 102.
Lecture recordings will be available on Canvas.
Recitations are held on Fridays from 9:00am to 10:30am for the first four weeks of the quarter. Recitations are hybrid sessions, held both in Durand 023 Videoconference Room and on Zoom (link on Canvas).
Office hours will take place in Durand Building, Room 217 at the following times:
Daniele Gammelli's office hours are on Tuesdays 2:00pm to 3:00pm (course material) and Thursdays 9:00am to 10:00am (project discussion).
Spencer Richards' office hours are on Mondays and Wednesdays 3:00pm to 4:00pm.
Thomas Lew's office hours are on Fridays 1:00pm to 3:00pm.
Devansh Jalota's office hours are on Wednesdays 8:00am to 10:00am.
The schedule below is subject to change.
| Week | Topic | Lecture Slides |
|---|---|---|
| 1 | Introduction; feedback, stability, optimal control problems<br>Nonlinear optimization theory<br>Recitation: Automatic differentiation (AD) with JAX<br>Monday: HW0 (ungraded) out | Lecture 1<br>Lecture 2<br>Recitation 1; Code |
| 2 | Pontryagin's maximum principle (PMP) and indirect methods<br>Direct methods: Multiple shooting, collocation, and SCP<br>Recitation: Convex optimization with CVXPY<br>Monday: HW1 released | Lecture 3<br>Lecture 4; Code<br>Recitation 2 |
| 3 | Dynamic programming (DP), discrete-time LQR<br>Stochastic DP, value iteration, policy iteration<br>Recitation: Regression models<br>Friday: Project proposal due | Lecture 5<br>Lecture 6; Code<br>Recitation 3 |
| 4 | Nonlinear LQR for tracking, iLQR, DDP<br>The Hamilton-Jacobi-Bellman (HJB) equation, LQR in continuous time<br>Recitation: Training neural networks with JAX<br>Monday: HW1 due, HW2 released | Lecture 7; Code<br>Lecture 8<br>Recitation 4 |
| 5 | System identification and adaptive control<br>Introduction to RL and Markov decision processes (MDPs) | Lecture 9; Code<br>Lecture 10 |
| 6 | MPC I: Introduction<br>MPC II: Feasibility and stability<br>Monday: HW2 due, HW3 released | Lecture 11<br>Lecture 12; Code |
| 7 | MPC III: Robustness<br>MPC IV: Adaptation<br>Monday: Project midterm report due | Lectures 13 and 14 |
| 8 | RL I: Model-free RL - Value-based methods<br>RL II: Model-free RL - Policy optimization<br>Monday: HW3 due, HW4 released | Lecture 15<br>Lecture 16 |
| 9 | No lecture (Memorial Day)<br>RL III: Model-based RL | Lecture 17 |
| 10 | RL IV: Model-based policy optimization<br>Course summary, recent work, future directions<br>Monday: HW4 due<br>Wednesday: Project video presentation and final report due | Lecture 18<br>Course summary (optimal control)<br>Course summary (reinforcement learning) |
Follow this link to access the course website for the previous edition of Optimal and Learning-Based Control.