\chapter{Linear Models} Linear models represent the mainstay of structural brain imaging. Their essence is quite simple: the data at every voxel is modelled by a set of terms corresponding to extra information about each scan. One can then perform hypothesis tests on each linear model to calculate the significance of either the entire model or even the marginal significance of each term in the model. This can be used to ask the question of, for example, in which voxels the gender of the subject predicts the values at that voxel. This approach is also known as {\em massively univariate statistics} - i.e. a separate linear model is calculated at every voxel, resulting in thousands or even millions of separate models for every statistical test applied to the images. The last step in analysing such data is thus often to account for these thousands of comparisons so that the results do not occur just by random chance. \section{First linear model} Let's start with a simple linear model - lets see where the Jacobian determinants contained in the files used in the male-female dataset are best modelled by the gender of the mouse. <<>>= vs <- mincLm(Filename ~ Gender, gf) vs mincWriteVolume(vs, "simple-lm.mnc", "GenderMale") @ mincLm is the command to run linear models in RMINC. Its basic use is to provide a formula (same syntax as the R \texttt{lm} command) with the left side containing the filenames, the right side the variables to be regressed. The output of mincLm depends on the formula. There will always be a column of F-statistics, representing the significance of the entire model. Then there is one column for each of the terms in the mode. The above linear model, relating the Jacobian determinant to gender, will thus have three columns: \begin{description} \item[F-statistic] representing the significance of the entire model. \item[(Intercept)] the intercept term - this term is rarely interesting, as it tests for whether the intercept is 0. There's no reason to believe it should be in most cases, so this value will be highly significant but meaningless. \item[GenderMale] the term testing whether the ``Male'' level of the Gender factor is significant. In this case this term is the most interesting and therefore the one written to file. \end{description} The output is placed into a variable that can be written to file in the same way as described in the descriptive statistics section. \section{Plotting voxels} <<>>= options(show.signif.stars=FALSE) voxel <- mincGetVoxel(gf$Filename, 44, 20, 52) summary(lm(voxel ~ Gender, gf)) vs[635093,] @ The code above does the following: it gets the voxel from coordinates 44, 20, 52 for all subjects, then computes a linear model relating that voxel to Genotype using standard R functions. Lastly it prints the results from that same voxel as computed by \texttt{mincLm}\footnote{The actual number indexed here - 635093 - might appear odd. RMINC treats all MINC volumes as 1-dimensional arrays, so the actual index has to be computed by the following formula: $\left(index_1 * size_2 + index_2\right) + size_3 + index_3$}. This helps illustrate what the output of \texttt{mincLm} stores: the F-statistic is the same as can be found in the last line of the summary command, and the t-statistics for the Intercept and Genotype column can be found under "t-value" when using standard R functions. \texttt{mincGetVoxel} needs three coordinates, given in voxel space in the same order as stored in the file. Just printing the voxel will show the corresponding world coordinates: <<>>= voxel @ If the coordinates are specified in world coordinates then \texttt{mincGetWorldVoxel} is what you want - it also takes three coordinates, this time in world space in xspace,yspace,zspace order: <<>>= world.voxel <- mincGetWorldVoxel(gf$Filename, -6.27, -8.19, -4.2) world.voxel @ \section{Creating images} RMINC has the ability to call ray_trace to create images of individual slices corresponding to a specific voxel. For this to work ray_trace\footnote{\url{http://packages.bic.mni.mcgill.ca}} has to be installed and present in the path, as does MICe-minc-tools\footnote{\url{http://wiki.phenogenomics.ca:8080/display/MICePub/MICe-minc-tools}}. An example is given below - this will create an image of the slice corresponding to the voxel location along with a cross-hair over that voxel. The output can be seen in figure \ref{ray-trace-img}, along with a box-and-whiskers plot of the data at that voxel. <<>>= mincRayTraceStats(voxel, "volumes/anatomy.mnc", vs, "GenderMale", image.min=350000, image.max=1.0e+06, display=F) @ The \texttt{mincRayTraceStats} function needs the following arguments: a voxel, obtained by mincGetVoxel or mincGetWorldVoxel, the path towards a MINC image containing some background anatomy, the output of mincLm (vs in this case), and minimum and maximum values of the background anatomy. \begin{figure} \begin{minipage}[b]{0.5\linewidth} \centering \includegraphics[width=2.5in]{ray_trace_crosshair.png} \caption{Ray-traced slice} \label{ray-trace-img} \end{minipage} \begin{minipage}[b]{0.5\linewidth} \centering <>= plot(voxel ~ Gender, gf) @ \caption{Box and whiskers plot from cursor location} \end{minipage} \end{figure} \section{Using subsets} It is quite common to want to run a linear model on only a subset of the data. This can be quite easily accomplished in \texttt{mincLm} using an extra subsetting specification: <<>>= vs <- mincLm(Filename ~ Gender, gf, coil==1) vs @ This is the same linear model command as executed above, but this time using only use mice scanned on RF coil number 1. The subset command works exactly the same way as for the standard \texttt{lm} command from R. \section{Multiple Comparisons} The example below illustrates the entire process involved in going running a linear model and correcting for multiple comparisons using the False Discovery Rate: <<>>= vs <- mincLm(Filename ~ Gender, gf) qvals <- mincFDR(vs, mask="volumes/mask.mnc", method="pFDR") qvals mincWriteVolume(qvals, "Gender-FDR.mnc", "GenderMale") @ The first command computes a linear model using \texttt{mincLm}. The results are then passed on to \texttt{mincFDR}, which computes the False Discovery Rate threshold separately for each of the terms in the linear model. Only results from within the mask specified as an optional argument to \texttt{mincFDR} are considered. The thresholds detected at different levels (0.01, 0.05, 0.10, 0.15, and 0.20) are then printed out. In this example seen above there is no data at a FDR level of 0.01, 0.05, or 0.10, but any t-statistic greater than 2.96 (or less than -2.96) would be significant at a 15\% false positive level - i.e. 15\% of the voxels above that threshold would be, on average, false positives. The ``GenderMale'' column is then written to file. Note that we use the positive false discovery rate above - the more standard false discovery rate is the default (i.e. if no method argument is specified for \texttt{mincFDR})