Risk-averse linearly-solvable control

05 Mar 2018

To get into this topic, I highly recommend Emo Todorov's paper Efficient computation of optimal actions together with its supplement. Here, I will first summarize the derivations from that paper in continuous time with a conventional cumulative cost objective; after that, I will repeat the derivations with an exponential objective. The result of this exercise is a pair of formulas that relate costs under the optimal controller to costs under the uncontrolled dynamics.

Consider the controlled diffusion

\begin{equation*} dx = a(x) dt + b(x) (udt + \sigma dw) \end{equation*}

with the generator

\begin{equation*} \newcommand{\J}{\mathcal{J}} \newcommand{\L}{\mathcal{L}} \newcommand{\p}{\partial} \L^u[\cdot](t, x) = (a(x) + b(x)u)\p_x + \frac{1}{2}b^2(x)\sigma^2 \p_x^2 \end{equation*}

and the cost rate

\begin{equation*} c(x, u) = q(x) + \frac{u^2}{2\sigma^2}. \end{equation*}

Additive objective

Trajectory cost

\begin{equation*} \J^{u(\cdot)}(t, x) = \int_t^T c(x(\tau), u(\tau)) d\tau + Q(x(T)) \end{equation*}

Expected cost

\begin{equation*} J^{u(\cdot)}(t, x) = E \left[ \J^{u(\cdot)}(t, x) \; | \; x(t) = x \right] \end{equation*}

Value function

\begin{equation*} v(t, x) = \min_{u(\cdot)} \left\{ J^{u(\cdot)}(t, x) \right\} \end{equation*}

HJB

\begin{equation*} -v_t(t, x) = \min_u \left\{ c(x, u) + \L^u v(t, x) \right\} \end{equation*}

HJB expanded

\begin{equation*} -v_t = \min_u \left\{ q(x) + \frac{u^2}{2\sigma^2} + (a+bu)v_x + \frac{1}{2} b^2 \sigma^2 v_{xx} \right\} \end{equation*}

Optimal control

\begin{equation*} u(t, x) = -\sigma^2 b(x) v_x (t, x) \end{equation*}
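The minimizer follows from the first-order condition. Only $\frac{u^2}{2\sigma^2} + b u v_x$ in the expanded HJB depends on $u$, and it is a convex quadratic in $u$ (second derivative $1/\sigma^2 > 0$), so its stationary point is the minimum:

\begin{equation*} \frac{\partial}{\partial u} \left\{ \frac{u^2}{2\sigma^2} + b u v_x \right\} = \frac{u}{\sigma^2} + b v_x = 0. \end{equation*}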

After substituting the optimal controller into the HJB

\begin{equation*} -v_t = q + av_x + \frac{1}{2}b^2\sigma^2v_{xx} - \frac{1}{2}b^2\sigma^2v_x^2 \end{equation*}

Exponential transform

\begin{equation*} z(t, x) = e^{-v(t, x)} \end{equation*}

Linearized HJB

\begin{equation*} -z_t = -qz + \L^0 z \end{equation*}
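The step from the nonlinear HJB to this linear equation is a direct computation. With $v = -\log z$,

\begin{equation*} v_t = -\frac{z_t}{z}, \quad v_x = -\frac{z_x}{z}, \quad v_{xx} = \frac{z_x^2}{z^2} - \frac{z_{xx}}{z}, \end{equation*}

so $\frac{1}{2}b^2\sigma^2 v_{xx} - \frac{1}{2}b^2\sigma^2 v_x^2 = -\frac{1}{2}b^2\sigma^2 \frac{z_{xx}}{z}$, and multiplying the HJB by $-z$ leaves

\begin{equation*} -z_t = -qz + a z_x + \frac{1}{2} b^2 \sigma^2 z_{xx} = -qz + \mathcal{L}^0 z, \qquad z(T, x) = e^{-Q(x)}. \end{equation*}

The cancellation of the $v_x^2$ terms relies on the control-cost weight $1/(2\sigma^2)$ being matched to the noise intensity $\sigma^2$ that enters through the same $b(x)$.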

Feynman-Kac formula

\begin{equation*} z(t, x) = E^0 \left[ e^{-\int_t^T q(x(\tau)) d\tau - Q(x(T))} \;\bigg|\; x(t) = x \right] \end{equation*}
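As a rough illustration, the formula can be evaluated by Monte Carlo: simulate the passive dynamics $dx = a\,dt + b\sigma\,dw$ with an Euler-Maruyama scheme and average the exponentiated negative cost. The function name `passive_z`, the step and path counts, and the test problem ($a = 0$, $b = 1$, $q = 0$, $Q(x) = x^2/2$, for which $x(T) \sim N(x, \sigma^2(T-t))$ and $z$ has a closed form) are hypothetical choices for this sketch, not taken from the paper.

```python
import math
import random
from typing import Callable

Fn = Callable[[float], float]


def passive_z(x0: float, t: float, T: float, a: Fn, b: Fn, q: Fn, Q: Fn,
              sigma: float, n_paths: int = 20_000, n_steps: int = 20,
              seed: int = 0) -> float:
    """Monte Carlo estimate of z(t, x0) = E^0[exp(-int_t^T q dtau - Q(x(T)))].

    Paths follow the passive dynamics dx = a(x) dt + b(x) sigma dw (u = 0),
    discretized with Euler-Maruyama; the state-cost integral uses a
    left-point Riemann sum.
    """
    rng = random.Random(seed)
    dt = (T - t) / n_steps
    sqrt_dt = math.sqrt(dt)
    total = 0.0
    for _ in range(n_paths):
        x, running_cost = x0, 0.0
        for _ in range(n_steps):
            running_cost += q(x) * dt
            x += a(x) * dt + b(x) * sigma * sqrt_dt * rng.gauss(0.0, 1.0)
        total += math.exp(-running_cost - Q(x))
    return total / n_paths


# Hypothetical test case: a = 0, b = 1, q = 0, Q(x) = x^2 / 2.
x0, t, T, sigma = 0.5, 0.0, 1.0, 1.0
z_mc = passive_z(x0, t, T, a=lambda x: 0.0, b=lambda x: 1.0,
                 q=lambda x: 0.0, Q=lambda x: 0.5 * x * x, sigma=sigma)
s2 = sigma**2 * (T - t)
# Closed form of E[exp(-X^2 / 2)] for X ~ N(x0, s2); about 0.664 here.
z_exact = (1.0 + s2) ** -0.5 * math.exp(-x0**2 / (2.0 * (1.0 + s2)))
v_mc = -math.log(z_mc)  # value-function estimate v = -log z
print(f"z_mc={z_mc:.4f}  z_exact={z_exact:.4f}  v_mc={v_mc:.4f}")
```

With $a = 0$ and $b = 1$ the Euler-Maruyama endpoint is exactly Gaussian, so the only error in this test is Monte Carlo noise; for general drifts the step size also matters.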

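Note the zero attached to $\L$ and $E$; it stands for $u = 0$, i.e., uncontrolled (or passive) dynamics, under which the control cost vanishes and the total cost reduces to $\int_t^T q(x(\tau)) d\tau + Q(x(T))$. Since $v = -\log z$ is the expected total cost under the optimal controller, the relation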

\begin{equation} \newcommand{\opt}{\text{optimal}} \newcommand{\pas}{\text{passive}} \newcommand{\tc}{\text{total cost}} E_{\opt} [ -\tc ] = \log E_{\pas} [ \exp(-\tc) ] \label{add} \end{equation}

allows us to find the value function $v(t, x) = -\log z(t, x)$ by simulating the passive dynamics; the optimal controller is then $u = -\sigma^2 b v_x = \sigma^2 b z_x / z$.
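A minimal sketch of that last step, assuming $z$ is available as a function of $x$ at a fixed time: take $v = -\log z$, approximate $v_x$ by a central difference, and return $u = -\sigma^2 b v_x$. The helper `control_from_z`, the step `h`, and the closed-form `z_lq` of a hypothetical linear-quadratic problem ($a = 0$, $b = 1$, $q = 0$, $Q(x) = x^2/2$) are illustrative assumptions. For that problem the result can be checked against the feedback $u = -\sigma^2 x / (1 + \sigma^2 (T - t))$ obtained from the Riccati equation.

```python
import math
from typing import Callable


def control_from_z(z: Callable[[float], float], x: float, b: float,
                   sigma: float, h: float = 1e-4) -> float:
    """Return u = -sigma^2 * b * v_x, with v = -log z and v_x a central difference."""
    v_x = (-math.log(z(x + h)) + math.log(z(x - h))) / (2.0 * h)
    return -sigma**2 * b * v_x


# Hypothetical linear-quadratic example: a = 0, b = 1, q = 0, Q(x) = x^2 / 2.
sigma, t, T = 1.0, 0.0, 1.0
s2 = sigma**2 * (T - t)


def z_lq(x: float) -> float:
    """Closed-form z(t, x) = E^0[exp(-Q(x(T)))] with x(T) ~ N(x, s2)."""
    return (1.0 + s2) ** -0.5 * math.exp(-x**2 / (2.0 * (1.0 + s2)))


x = 0.5
u = control_from_z(z_lq, x, b=1.0, sigma=sigma)
u_riccati = -sigma**2 * x / (1.0 + s2)  # feedback from the Riccati equation
print(u, u_riccati)
```

In practice $z$ would come from a Monte Carlo estimate; evaluating $z(x \pm h)$ with the same random numbers keeps the finite difference from being swamped by sampling noise.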

Multiplicative objective

Trajectory cost

\begin{equation*} C^{u(\cdot)}(t, x) = \int_t^T c(x(\tau), u(\tau)) d\tau + Q(x(T)) \end{equation*}

Exponential cost

\begin{equation*} \J_\beta^{u(\cdot)}(t, x) = \exp \left( \beta C^{u(\cdot)}(t, x) \right) \end{equation*}

Expected exponential cost

\begin{equation*} J_\beta^{u(\cdot)}(t, x) = E \left[ \J_\beta^{u(\cdot)}(t, x) \;\bigg|\; x(t) = x \right] \end{equation*}

Exponential value function

\begin{equation*} J_\beta(t, x) = \min_{u(\cdot)} \left\{ J_\beta^{u(\cdot)}(t, x) \right\} \end{equation*}

Value function

\begin{equation*} v^\beta(t, x) = \min_{u(\cdot)} \left\{ \beta^{-1} \log E \left[ \exp \left( \beta C^{u(\cdot)}(t, x) \right) \;\bigg|\; x(t) = x \right] \right\} \end{equation*}
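For $\beta > 0$ the logarithm is increasing, so $v^\beta = \beta^{-1} \log J_\beta$. A quick way to see why $\beta > 0$ encodes risk aversion, assuming the moments of $C$ exist, is the standard small-$\beta$ expansion of the log-moment-generating function:

\begin{equation*} \beta^{-1} \log E\left[ e^{\beta C} \right] = E[C] + \frac{\beta}{2} \mathrm{Var}[C] + O(\beta^2), \end{equation*}

so, compared with the additive objective, trajectories whose cost is widely spread are penalized in proportion to $\beta$.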

HJB

\begin{equation*} -v_t = \min_u \left\{ c + (a+bu)v_x + \frac{1}{2}b^2\sigma^2(v_{xx} + \beta v_x^2) \right\} \end{equation*}
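One way to arrive at this HJB, for $\beta > 0$: under a fixed feedback control, $\Psi(t, x) = E[\exp(\beta C) \,|\, x(t) = x]$ solves the linear equation $-\Psi_t = \beta c \Psi + \mathcal{L}^u \Psi$ with $\Psi(T, x) = e^{\beta Q(x)}$. Writing $\Psi = e^{\beta v}$ gives

\begin{equation*} \Psi_t = \beta v_t \Psi, \quad \Psi_x = \beta v_x \Psi, \quad \Psi_{xx} = \left( \beta v_{xx} + \beta^2 v_x^2 \right) \Psi, \end{equation*}

and dividing by $\beta \Psi$ yields the bracket above; the extra $\beta v_x^2$ term comes from the chain rule in $\Psi_{xx}$. Because $\beta^{-1} \log$ is increasing for $\beta > 0$, minimizing over $u$ commutes with this change of variables.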

Optimal controller

\begin{equation*} u = -\sigma^2 b v_x \end{equation*}

After substituting the optimal controller into the HJB

\begin{equation*} -\p_t v = q + \L^0 v + \frac{\beta-1}{2} b^2 \sigma^2 v_x^2 \end{equation*}
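The risk term $\frac{\beta}{2} b^2 \sigma^2 v_x^2$ does not depend on $u$, so the first-order condition, and hence the controller, are the same as in the additive case. Collecting the $v_x^2$ terms after substitution:

\begin{equation*} \frac{u^2}{2\sigma^2} + b u v_x + \frac{\beta}{2} b^2 \sigma^2 v_x^2 = \left( \frac{1}{2} - 1 + \frac{\beta}{2} \right) b^2 \sigma^2 v_x^2 = \frac{\beta - 1}{2} b^2 \sigma^2 v_x^2, \end{equation*}

and setting $\beta = 0$ recovers the additive HJB.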

Let $\beta = 1$, so that the quadratic term vanishes; then

\begin{equation*} -\p_t v = q + \L^0 v \end{equation*}

By Feynman-Kac, the solution is

\begin{equation*} v(t, x) = E^0 \left[ \int_t^T q(x(\tau)) d\tau + Q(x(T)) \;\bigg|\; x(t) = x \right] \end{equation*}

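Because, with $\beta = 1$, $v$ is by definition the logarithm of the expected exponentiated total cost under the optimal controller, we obtain

\begin{equation} E_{\pas} [ \tc ] = \log E_{\opt} [ \exp(\tc) ] \label{mul} \end{equation}

Comparing Formulas \eqref{add} and \eqref{mul}, we see that the labels ‘optimal’ and ‘passive’ switch roles, and the total cost enters with the opposite sign.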
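The relation can be checked numerically on a hypothetical linear-quadratic example ($a = 0$, $b = 1$, $q = 0$, $Q(x) = x^2/2$, starting at $t = 0$). There $v = E^0[Q(x(T))] = (x^2 + \sigma^2 (T - t))/2$, so $v_x = x$ and the optimal controller is $u = -\sigma^2 x$. The sketch below simulates that closed loop with Euler-Maruyama, accumulates the control cost $u^2/(2\sigma^2)$ plus the terminal cost, and compares $\log E_{\text{optimal}}[\exp(\text{total cost})]$ with the closed-form $E_{\text{passive}}[\text{total cost}]$. The function name, parameters, and seed are assumptions made for illustration.

```python
import math
import random


def log_exp_cost_optimal(x0: float, T: float, sigma: float,
                         n_paths: int = 20_000, n_steps: int = 50,
                         seed: int = 1) -> float:
    """Monte Carlo estimate of log E_opt[exp(total cost)] for beta = 1.

    Hypothetical example: a = 0, b = 1, q = 0, Q(x) = x^2 / 2, t = 0.
    The optimal feedback u = -sigma^2 * b * v_x reduces to u = -sigma^2 * x.
    """
    rng = random.Random(seed)
    dt = T / n_steps
    sqrt_dt = math.sqrt(dt)
    acc = 0.0
    for _ in range(n_paths):
        x, cost = x0, 0.0
        for _ in range(n_steps):
            u = -sigma**2 * x
            cost += u * u / (2.0 * sigma**2) * dt  # control cost; q = 0
            x += u * dt + sigma * sqrt_dt * rng.gauss(0.0, 1.0)  # dx = u dt + sigma dw
        cost += 0.5 * x * x  # terminal cost Q(x(T))
        acc += math.exp(cost)
    return math.log(acc / n_paths)


x0, T, sigma = 0.5, 1.0, 0.5
lhs = (x0**2 + sigma**2 * T) / 2.0        # E_passive[total cost], closed form
rhs = log_exp_cost_optimal(x0, T, sigma)  # log E_optimal[exp(total cost)]
print(f"E_pas[C] = {lhs:.4f}, log E_opt[exp C] = {rhs:.4f}")
```

A small $\sigma$ is used deliberately: $E[\exp(C)]$ involves $\exp(x(T)^2/2)$, whose moments blow up when the variance of $x(T)$ grows too large, which would make the Monte Carlo estimate unreliable.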

Boris Belousov
