AA 203: Optimal and Learning-Based Control

Spring 2023

Course Description

Optimal control solution techniques for systems with known and unknown dynamics. Dynamic programming, Hamilton-Jacobi reachability, and direct and indirect methods for trajectory optimization. Introduction to model predictive control. Adaptive control, model-based and model-free reinforcement learning, and connections between modern reinforcement learning and fundamental optimal control ideas.

Course Assistants

Thomas Lew
Devansh Jalota

Meeting Times

Lectures are held on Mondays and Wednesdays from 1:30pm to 2:50pm in Thornton 102.

Lecture recordings will be available on Canvas.

Recitations are held on Fridays from 9:00am to 10:30am for the first four weeks of the quarter. Recitations are hybrid sessions, held both in Durand 023 Videoconference Room and on Zoom (link on Canvas).

Office Hours

Office hours will take place in Durand Building, Room 217 at the following times:

Daniele Gammelli's office hours are on Tuesdays 2:00pm to 3:00pm (course material) and Thursdays 9:00am to 10:00am (project discussion).

Spencer Richards' office hours are on Mondays and Wednesdays 3:00pm to 4:00pm.

Thomas Lew's office hours are on Fridays 1:00pm to 3:00pm.

Devansh Jalota's office hours are on Wednesdays 8:00am to 10:00am.


Schedule

Subject to change.

Week 1
- Lectures: Introduction; feedback, stability, optimal control problems (Lecture 1); nonlinear optimization theory (Lecture 2)
- Recitation: Automatic differentiation (AD) with JAX (Recitation 1; code)
- Monday: HW0 (ungraded) out

Week 2
- Lectures: Pontryagin's maximum principle (PMP) and indirect methods (Lecture 3); direct methods: multiple shooting, collocation, and SCP (Lecture 4; code)
- Recitation: Convex optimization with CVXPY (Recitation 2)
- Monday: HW1 released

Week 3
- Lectures: Dynamic programming (DP), discrete-time LQR (Lecture 5); stochastic DP, value iteration, policy iteration (Lecture 6; code)
- Recitation: Regression models (Recitation 3)
- Friday: Project proposal due

Week 4
- Lectures: Nonlinear LQR for tracking, iLQR, DDP (Lecture 7; code); the Hamilton-Jacobi-Bellman (HJB) equation, LQR in continuous time (Lecture 8)
- Recitation: Training neural networks with JAX (Recitation 4)
- Monday: HW1 due, HW2 released

Week 5
- Lectures: System identification and adaptive control (Lecture 9; code); introduction to RL and Markov decision processes (MDPs) (Lecture 10)

Week 6
- Lectures: MPC I: Introduction (Lecture 11); MPC II: Feasibility and stability (Lecture 12; code)
- Monday: HW2 due, HW3 released

Week 7
- Lectures: MPC III: Robustness; MPC IV: Adaptation (Lectures 13 and 14)
- Monday: Project midterm report due

Week 8
- Lectures: RL I: Model-free RL, value-based methods (Lecture 15); RL II: Model-free RL, policy optimization (Lecture 16)
- Monday: HW3 due, HW4 released

Week 9
- Lectures: No lecture Monday (Memorial Day); RL III: Model-based RL (Lecture 17)

Week 10
- Lectures: RL IV: Model-based policy optimization (Lecture 18); course summary, recent work, and future directions (Course summary: optimal control; Course summary: reinforcement learning)
- Monday: HW4 due
- Wednesday: Project video presentation and final report due
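The Week 1 recitation covers automatic differentiation with JAX, where `jax.grad` computes exact derivatives of numerical programs. As a standalone sketch of the underlying idea (not the JAX API, and not the recitation's actual code), forward-mode AD with dual numbers fits in a few lines of plain Python:

```python
# Forward-mode automatic differentiation via dual numbers: a hypothetical
# illustration of the concept behind jax.grad, not JAX itself.

class Dual:
    """A number a + b*eps with eps**2 = 0; the dual part b carries df/dx."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Sum rule: d(u + v) = du + dv
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: d(u * v) = u*dv + v*du
        return Dual(self.val * other.val,
                    self.val * other.dot + self.dot * other.val)
    __rmul__ = __mul__

def grad(f):
    """Return x -> f'(x), obtained by seeding the dual part with 1."""
    return lambda x: f(Dual(x, 1.0)).dot

f = lambda x: x * x + 3 * x   # f(x) = x^2 + 3x
print(grad(f)(2.0))           # f'(x) = 2x + 3, so f'(2) = 7.0
```

JAX adds reverse-mode differentiation, vectorization, and JIT compilation on top of the same chain-rule machinery.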

Follow this link to access the course website for the previous edition of Optimal and Learning-Based Control.