Linear Model with Structure in the Observations

We have a linear (or logistic) regression, but the difference from standard regression is that the noise in the observations is correlated. If we have n observations (datapoints) in the training set, we have $n^2$ constraints represented by a matrix $K$.

What are useful examples of such a structure?

One example is when your datapoints represent people located in close geographical areas. Such a constraint might also be obtained by precomputing some pairwise similarity and will be useful when you have too few observations.

How is this model solved?

Let $K = USU’$ be an $n \times n$ matrix giving the correlations between observations. If $X$ is an $n \times m$ a feature matrix, and $y$ is an $n \times 1$ vector of labels, the solution is first to transform $X$ and $y$ into $U’X$ and $U’y$ and apply standard regression.

Reference:

FaST Linear Mixed Models for Genome-Wide Association Studies by Christoph Lippert, Jennifer Listgarten, Ying Liu, Carl M. Kadie1,Robert I. Davidson, and David Heckerman

Comments