\chapter{Linear Models}

Linear models represent the mainstay of structural brain
imaging. Their essence is quite simple: the data at every voxel is
modelled by a set of terms corresponding to extra information about
each scan. One can then perform hypothesis tests on each linear model
to calculate the significance of either the entire model or even the
marginal significance of each term in the model. This can be used to
ask the question of, for example, in which voxels the gender of the
subject predicts the values at that voxel.

This approach is also known as {\em massively univariate statistics} -
i.e. a separate linear model is calculated at every voxel, resulting
in thousands or even millions of separate models for every statistical
test applied to the images. The last step in analysing such data is
thus often to account for these thousands of comparisons so that the
results do not occur just by random chance.

\section{First linear model}

Let's start with a simple linear model - lets see where the Jacobian
determinants contained in the files used in the male-female dataset
are best modelled by the gender of the mouse.

<<>>=
vs <- mincLm(Filename ~ Gender, gf)
vs
mincWriteVolume(vs, "simple-lm.mnc", "GenderMale")
@ 

mincLm is the command to run linear models in RMINC. Its basic use is
to provide a formula (same syntax as the R \texttt{lm} command) with
the left side containing the filenames, the right side the variables
to be regressed. The output of mincLm depends on the formula. There
will always be a column of F-statistics, representing the significance
of the entire model. Then there is one column for each of the terms in
the mode. The above linear model, relating the Jacobian determinant to
gender, will thus have three columns:

\begin{description}
\item[F-statistic] representing the significance of the entire model.
\item[(Intercept)] the intercept term - this term is rarely
  interesting, as it tests for whether the intercept is 0. There's no
  reason to believe it should be in most cases, so this value will be
  highly significant but meaningless.
\item[GenderMale] the term testing whether the ``Male'' level of the
  Gender factor is significant. In this case this term is the most
  interesting and therefore the one written to file.
\end{description}

The output is placed into a variable that can be written to file in
the same way as described in the descriptive statistics section. 


\section{Plotting voxels}

<<>>=
options(show.signif.stars=FALSE)
voxel <- mincGetVoxel(gf$Filename, 44, 20, 52)
summary(lm(voxel ~ Gender, gf))
vs[635093,]
@ 

The code above does the following: it gets the voxel from coordinates
44, 20, 52 for all subjects, then computes a linear model relating
that voxel to Genotype using standard R functions. Lastly it prints
the results from that same voxel as computed by
\texttt{mincLm}\footnote{The actual number indexed here - 635093 -
  might appear odd. RMINC treats all MINC volumes as 1-dimensional
  arrays, so the actual index has to be computed by the following
  formula: $\left(index_1 * size_2 + index_2\right) + size_3 +
  index_3$}. This helps illustrate what the output of \texttt{mincLm}
stores: the F-statistic is the same as can be found in the last line
of the summary command, and the t-statistics for the Intercept and
Genotype column can be found under "t-value" when using standard R
functions.

\texttt{mincGetVoxel} needs three coordinates, given in voxel space in
the same order as stored in the file. Just printing the voxel will
show the corresponding world coordinates:

<<>>=
voxel
@ 

If the coordinates are specified in world coordinates then
\texttt{mincGetWorldVoxel} is what you want - it also takes three
coordinates, this time in world space in xspace,yspace,zspace order:

<<>>=
world.voxel <- mincGetWorldVoxel(gf$Filename, -6.27, -8.19, -4.2)
world.voxel
@ 

\section{Creating images}

RMINC has the ability to call ray_trace to create images of individual
slices corresponding to a specific voxel. For this to work
ray_trace\footnote{\url{http://packages.bic.mni.mcgill.ca}} has to be
installed and present in the path, as does
MICe-minc-tools\footnote{\url{http://wiki.phenogenomics.ca:8080/display/MICePub/MICe-minc-tools}}. 

An example is given below - this will create an image of the slice
corresponding to the voxel location along with a cross-hair over that
voxel. The output can be seen in figure \ref{ray-trace-img}, along
with a box-and-whiskers plot of the data at that voxel.

<<>>=
mincRayTraceStats(voxel, "volumes/anatomy.mnc",
                  vs, "GenderMale", image.min=350000, image.max=1.0e+06,
                  display=F)
@ 

The \texttt{mincRayTraceStats} function needs the following arguments:
a voxel, obtained by mincGetVoxel or mincGetWorldVoxel, the path
towards a MINC image containing some background anatomy, the output of
mincLm (vs in this case), and minimum and maximum values of the
background anatomy.

\begin{figure}
\begin{minipage}[b]{0.5\linewidth}
\centering
\includegraphics[width=2.5in]{ray_trace_crosshair.png}
\caption{Ray-traced slice}
\label{ray-trace-img}
\end{minipage}
\begin{minipage}[b]{0.5\linewidth}
\centering
<<fig=T, echo=F>>=
plot(voxel ~ Gender, gf)
@ 
\caption{Box and whiskers plot from cursor location}
\end{minipage}
\end{figure}

\section{Using subsets}

It is quite common to want to run a linear model on only a subset of
the data. This can be quite easily accomplished in \texttt{mincLm}
using an extra subsetting specification:

<<>>=
vs <- mincLm(Filename ~ Gender, gf, coil==1)
vs
@ 

This is the same linear model command as executed above, but this time
using only use mice scanned on RF coil number 1. The subset command
works exactly the same way as for the standard \texttt{lm} command
from R.

\section{Multiple Comparisons}

The example below illustrates the entire process involved in going
running a linear model and correcting for multiple comparisons using
the False Discovery Rate:

<<>>=
vs <- mincLm(Filename ~ Gender, gf)
qvals <- mincFDR(vs, mask="volumes/mask.mnc", method="pFDR")
qvals
mincWriteVolume(qvals, "Gender-FDR.mnc", "GenderMale")
@ 

The first command computes a linear model using \texttt{mincLm}. The
results are then passed on to \texttt{mincFDR}, which computes the
False Discovery Rate threshold separately for each of the terms in the
linear model. Only results from within the mask specified as an
optional argument to \texttt{mincFDR} are considered. The thresholds
detected at different levels (0.01, 0.05, 0.10, 0.15, and 0.20) are
then printed out. In this example seen above there is no data at a FDR
level of 0.01, 0.05, or 0.10, but any t-statistic greater than 2.96
(or less than -2.96) would be significant at a 15\% false positive
level - i.e. 15\% of the voxels above that threshold would be, on
average, false positives. The ``GenderMale'' column is then written to
file. Note that we use the positive false discovery rate above - the
more standard false discovery rate is the default (i.e. if no method
argument is specified for \texttt{mincFDR})