Gaussian conditioning vs linear regression

Given a dataset $\mathcal{D} = \{ (x_i, y_i) \}_{i=1}^{N}$ of pairs of inputs $x_i \in \mathbb{R}^n$ and outputs $y_i \in \mathbb{R}^m$, one may wish to predict the output $y^*$ corresponding to a new input $x^*$.

Linear regression

One straightforward way to approach this problem is to postulate a linear dependence between $x$ and $y$, say $y = Ax$, and then estimate the parameter matrix $A \in \mathbb{R}^{m \times n}$ from the dataset $\mathcal{D}$ by linear regression,

\begin{equation} \label{lin_reg} \hat{A} = YX^T(XX^T)^{-1}, \end{equation}

where the matrices $X \in \mathbb{R}^{n \times N}$ and $Y \in \mathbb{R}^{m \times N}$ hold the inputs $x_i$ and the outputs $y_i$ as columns. When a new input $x^*$ arrives, the prediction $y^* = \hat{A}x^*$ is computed by multiplying the input $x^*$ with the estimate $\hat{A}$ of the assumed linear mapping $A \colon x \mapsto y$.
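As a quick illustration, here is a minimal NumPy sketch of these steps; the variable names and the toy data are assumptions made for the example, not part of the text.

```python
import numpy as np

# Minimal sketch: least-squares estimate A_hat = Y X^T (X X^T)^{-1}.
# Shapes follow the text: X is (n, N), Y is (m, N), data points as columns.
rng = np.random.default_rng(0)
n, m, N = 3, 2, 100

A_true = rng.normal(size=(m, n))                  # ground-truth map (demo only)
X = rng.normal(size=(n, N))                       # inputs as columns
Y = A_true @ X + 0.1 * rng.normal(size=(m, N))    # noisy outputs

# Solve A_hat (X X^T) = Y X^T instead of forming the inverse explicitly.
A_hat = np.linalg.solve(X @ X.T, X @ Y.T).T

x_star = rng.normal(size=n)                       # new input
y_star = A_hat @ x_star                           # point prediction
```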

Gaussian conditioning

Another approach is to postulate that inputs and outputs are jointly Gaussian,

\begin{equation} \label{joint_gauss} \begin{bmatrix} x \\ y \end{bmatrix} \sim \mathcal{N}\left( \begin{bmatrix} \mu_x \\ \mu_y \end{bmatrix}, \begin{bmatrix} \Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy} \end{bmatrix} \right), \end{equation}

and take $y^*$ to be the mean of the conditional distribution $p(y|x)$ evaluated at $x = x^*$. If both $x$ and $y$ are zero-mean, then the conditional mean is given by

\begin{equation} \label{cond_mean} \mu_{y|x} = \Sigma_{yx} \Sigma_{xx}^{-1} x. \end{equation}
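For concreteness, here is a small NumPy sketch of this conditioning step, assuming the joint covariance is known; the block names mirror the notation above, and the randomly generated covariance is purely illustrative.

```python
import numpy as np

# Sketch of Gaussian conditioning with a known joint covariance of (x, y).
rng = np.random.default_rng(1)
n, m = 3, 2

# Build a random valid (positive-definite) joint covariance for (x, y).
L = rng.normal(size=(n + m, n + m))
Sigma = L @ L.T + 1e-3 * np.eye(n + m)
Sigma_xx = Sigma[:n, :n]
Sigma_yx = Sigma[n:, :n]

x = rng.normal(size=n)                                   # observed input
mu_y_given_x = Sigma_yx @ np.linalg.solve(Sigma_xx, x)   # conditional mean
```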

Substitution of the maximum likelihood estimates of the covariances,

\begin{equation} \label{cov_est} \hat{\Sigma}_{yx} = \frac{1}{N} YX^T, \qquad \hat{\Sigma}_{xx} = \frac{1}{N} XX^T, \end{equation}

into Formula \eqref{cond_mean} for the conditional mean reveals a deep relation between Gaussian conditioning \eqref{cond_mean} and linear regression \eqref{lin_reg}.

Conditioning $\approx$ regression

Namely, Gaussian conditioning performs linear regression under the hood,

\begin{equation} \label{cond_lin_reg} \hat{\mu}_{y|x} = \hat{\Sigma}_{yx} \hat{\Sigma}_{xx}^{-1} x = \tfrac{1}{N} YX^T \left( \tfrac{1}{N} XX^T \right)^{-1} x = YX^T(XX^T)^{-1} x = \hat{A}x. \end{equation}

As a consequence, the output $y^*$ of the linear regression is the maximum likelihood estimate of the conditional mean $\mu_{y|x^*}$, provided inputs and outputs are assumed to be jointly Gaussian. In practice, this means that if you are only interested in a point prediction and do not care about quantifying uncertainty, then linear regression is equivalent to Gaussian conditioning.
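The equivalence is easy to check numerically; the following sketch (on made-up toy data) verifies that plugging the empirical covariances into the conditional-mean formula reproduces the regression prediction.

```python
import numpy as np

# Check: Sigma_hat_yx Sigma_hat_xx^{-1} x equals Y X^T (X X^T)^{-1} x,
# since the 1/N factors in the covariance estimates cancel.
rng = np.random.default_rng(2)
n, m, N = 3, 2, 200

X = rng.normal(size=(n, N))
Y = rng.normal(size=(m, n)) @ X + 0.05 * rng.normal(size=(m, N))

Sigma_hat_xx = X @ X.T / N
Sigma_hat_yx = Y @ X.T / N

A_hat = np.linalg.solve(X @ X.T, X @ Y.T).T        # regression estimate
x_star = rng.normal(size=n)

y_reg = A_hat @ x_star                                          # regression
y_cond = Sigma_hat_yx @ np.linalg.solve(Sigma_hat_xx, x_star)   # conditioning

assert np.allclose(y_reg, y_cond)                  # identical up to round-off
```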

Linear basis function models

The relation between Gaussian conditioning and linear regression straightforwardly extends to linear basis function models of the form $y = A\phi(x)$ where $\phi \colon \mathbb{R}^n \to \mathbb{R}^k$ is a (nonlinear) feature mapping. Indeed, replacing the old dataset $\mathcal{D}$ with a new dataset $\mathcal{D}_{\phi} = \{ (\phi_i, y_i) \}_{i=1}^{N}$ where $\phi_i = \phi(x_i)$, we see that all the same steps can be performed as before if $x_i$ is replaced by $\phi_i$. The underlying assumption, however, slightly changes in this case: the feature vectors $\phi$—and not the inputs $x$ themselves—are assumed to be jointly Gaussian with the outputs $y$.
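As an illustration of this extension, the sketch below runs the same regression on features instead of raw inputs; the particular feature map $\phi$ (stacking $x$ and its elementwise square) is a hypothetical choice made only for the example.

```python
import numpy as np

# Sketch of a linear basis function model y = A phi(x): the regression is
# performed on feature vectors phi(x_i) rather than on the inputs x_i.
def phi(x):
    # illustrative feature map: concatenate x with its elementwise square
    return np.concatenate([x, x**2])

rng = np.random.default_rng(3)
n, m, N = 2, 1, 150
k = 2 * n                                          # feature dimension

X = rng.normal(size=(n, N))
Phi = np.stack([phi(X[:, i]) for i in range(N)], axis=1)   # (k, N) features
Y = rng.normal(size=(m, k)) @ Phi + 0.05 * rng.normal(size=(m, N))

A_hat = np.linalg.solve(Phi @ Phi.T, Phi @ Y.T).T  # regression in feature space
x_star = rng.normal(size=n)
y_star = A_hat @ phi(x_star)                       # prediction via features
```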