\documentclass[article,shortnames,nojss]{jss}
\usepackage{thumbpdf}
%% need no \usepackage{Sweave.sty}

%% Packages.
\usepackage{amssymb}
\usepackage{amsmath}
\usepackage{bm}
\usepackage{xspace}

%\VignetteIndexEntry{The VGAM Package for Capture--Recapture Data Using the Conditional Likelihood}
%\VignetteDepends{VGAM}
%\VignetteKeywords{closed population size estimation, conditional likelihood,mark--capture--recapture, vector generalized additive model, VGAM}
%\VignettePackage{VGAM}

%% new commands
%% Shortcut commands.
\newcommand{\logit}{\mbox{\rm logit}}
\newcommand{\bone}{{\bf 1}}
\newcommand{\bzero}{{\bf 0}}
\newcommand{\bid}{\mbox{$\bm{\mathcal{D}}$}}
\newcommand{\bib}{\mbox{$\bm{b}$}}
\newcommand{\bif}{\mbox{$\bm{f}$}}
\newcommand{\bix}{\mbox{$\bm{x}$}}
\newcommand{\biy}{\mbox{$\bm{y}$}}
\newcommand{\biz}{\mbox{$\bm{z}$}}
\newcommand{\bB}{\mbox{\rm \bf B}}
\newcommand{\bX}{\mbox{\rm \bf X}}
\newcommand{\bH}{\mbox{\rm \bf H}}
\newcommand{\bI}{\mbox{\rm \bf I}}
\newcommand{\bOO}{\mbox{\rm \bf O}}
\newcommand{\bW}{\mbox{\rm \bf W}}
\newcommand{\bY}{\mbox{\rm \bf Y}}
\newcommand{\bbeta}{\mbox{$\bm{\beta}$}}
\newcommand{\boldeta}{\mbox{$\bm{\eta}$}}
\newcommand{\btheta}{\mbox{$\bm{\theta}$}}
\newcommand{\calM}{\mbox{$\mathcal{M}$}}
\newcommand{\mytilde}{\mbox{\lower.80ex\hbox{\char`\~}\xspace}}

\author{Thomas W. Yee\\The University of Auckland \And
        Jakub Stoklosa\\The University of New South Wales \AND
        Richard M. Huggins\\The University of Melbourne}
\title{The \pkg{VGAM} Package for Capture--Recapture Data Using the Conditional Likelihood}

%% for pretty printing and a nice hypersummary also set:
\Plainauthor{Thomas W. Yee, Jakub Stoklosa, Richard M. Huggins} %% comma-separated
\Plaintitle{The VGAM Package for Capture--Recapture Data Using the Conditional Likelihood} %% without formatting
\Shorttitle{The VGAM Package for Capture--Recapture Data} %% a short title (if necessary)

%% an abstract and keywords
\Abstract{
  It is well known that using individual covariate information (such as body weight or gender) to model heterogeneity in capture--recapture (CR) experiments can greatly enhance inferences on the size of a closed population. Since individual covariates are only observable for captured individuals, complex conditional likelihood methods are usually required and these do not constitute a standard generalized linear model (GLM) family. Modern statistical techniques such as generalized additive models (GAMs), which allow a relaxing of the linearity assumptions on the covariates, are readily available for many standard GLM families. Fortunately, a natural statistical framework for maximizing conditional likelihoods is available in the Vector GLM and Vector GAM classes of models. We present several new \proglang{R}-functions (implemented within the \pkg{VGAM} package) specifically developed to allow the incorporation of individual covariates in the analysis of closed population CR data using a GLM/GAM-like approach and the conditional likelihood. As a result, a wide variety of practical tools are now readily available in the \pkg{VGAM} object oriented framework. We discuss and demonstrate their advantages, features and flexibility using the new \pkg{VGAM} CR functions on several examples.
}
\Keywords{closed population size estimation, conditional likelihood, mark--capture--recapture, vector generalized additive model, \pkg{VGAM}}
\Plainkeywords{closed population, conditional likelihood, mark--capture--recapture, vector generalized additive model, VGAM R package}

\Address{
  Thomas W.
Yee \\ Department of Statistics \\ University of Auckland, Private Bag 92019 \\ Auckland Mail Centre \\ Auckland 1142, New Zealand \\ E-mail: \email{t.yee@auckland.ac.nz}\\ URL: \url{http://www.stat.auckland.ac.nz/~yee/}
}

\begin{document}

<<echo = FALSE>>=
library("VGAM")
library("VGAMdata")
ps.options(pointsize = 12)
options(width = 72, digits = 4)
options(SweaveHooks = list(fig = function() par(las = 1)))
options(prompt = "R> ", continue = "+")
@

%*********************************************************************
\section[Introduction]{Introduction}
%% Note: If there is markup in \(sub)section, then it has to be escaped as above.
\label{sec:intro}

Note: this vignette is essentially \cite{yee:stok:hugg:2015}.

\bigskip

Capture--recapture (CR) surveys are widely used in ecology and epidemiology to estimate population sizes. In essence, they are sampling schemes that allow the estimation of both $n$ and $p$ in a Binomial($n$, $p$) experiment \citep{hugg:hwan:2011}. The simplest CR sampling design consists of units or individuals in some population that are captured or tagged across several sampling occasions, e.g., trapping a nocturnal mammal species on seven consecutive nights. In these experiments, when an individual is captured for the first time it is marked or tagged so that it can be identified upon subsequent recapture. On each occasion, recaptures of individuals which have been previously marked are also noted. Thus each observed individual has a capture history: a vector of 1s and 0s denoting capture/recapture and noncapture respectively. The unknown population size is then estimated using the observed capture histories and any other additional information collected on captured individuals, such as weight or sex, along with environmental information such as rainfall or temperature.

We consider closed populations, where there are no births, deaths, emigration or immigration throughout the sampling period \citep{amst:mcdo:manl:2005}. Such an assumption is often reasonable when the overall time period is relatively short. \citet{otis:etal:1978} provided eight specific closed population CR models \citep[see also][]{pollock:1991}, which permit the individual capture probabilities to depend on time and behavioural response, and to be heterogeneous between individuals.

The use of covariate information (or explanatory variables) to explain heterogeneous capture probabilities in CR experiments has received considerable attention over the last 30 years \citep{pollock:2002}. Population size estimates that ignore this heterogeneity are typically biased \citep{amst:mcdo:manl:2005}. A recent book on CR experiments as a whole is \cite{mccr:morg:2014}. Since individual covariate information (such as gender or body weight) can only be collected on observed individuals, conditional likelihood models are employed \citep{pollock:1984,hugg:1989,alho:1990,lebreton:1992}. That is, one conditions on the individuals seen at least once throughout the experiment, which allows individual covariates to be included in the analysis. The capture probabilities are typically modelled as logistic functions of the covariates, and parameters are estimated using maximum likelihood. Importantly, these CR models are generalized linear models \citep[GLMs;][]{mccull:1989,hugg:hwan:2011}. Here, we maximize the conditional likelihood (or, more formally, the positive-Bernoulli distribution) models of \citet{hugg:1989}.
This approach has become standard practice for carrying out inference when individual covariates are considered, and several software packages currently use this methodology, including \proglang{MARK} \citep{cooch:white:2012}, \proglang{CARE-2} \citep{hwang:chao:2003}, and the \proglang{R} \citep{R:2014} packages \pkg{mra} \citep{mcdonald:2010}, \pkg{RMark} \citep{laake:2013} and \pkg{Rcapture} \citep{rcapturepackage:2012,Baillargeon:Rivest:2007}; the latter package uses a log-linear approach, which can be shown to be equivalent to the conditional likelihood \citep{cormak:1989,hugg:hwan:2011}. These programs are quite user friendly and, in particular, allow capture probabilities to be modelled as linear functions of the covariates. An obvious question, then, is why develop yet another implementation for closed population CR modelling?

Firstly, nonlinearity arises quite naturally in many ecological applications \citep{schluter1988,yee:mitc:1991,craw:1993,gimenez:2006,bolk:2008}. In the CR context, capture probabilities may depend nonlinearly on individual covariates, e.g., mountain pygmy possums with lighter or heavier body weights may have lower capture probabilities compared with those having mid-ranged body weights \citep[e.g.,][]{hugg:hwan:2007,stok:hugg:2012}. However, in our experience, the vast majority of CR software does not handle nonlinearity well, with regard to both estimation and the plotting of the smooth functions. Since GAMs \citep{hastie:1990,wood:2006} were developed in the mid-1980s they have become a standard tool for data analysis in regression. The nonlinear relationship between the response and covariate is flexibly modelled, and few assumptions are made on the functional relationship. The drawback in applying these models to CR data has been the difficult programming required to implement the approach.

Secondly, we have found several implementations of the conditional likelihood approach slow, and in some instances unreliable and difficult to use. We believe our implementation has superior capabilities, good speed and reliability. The results of Section \ref{sec:poz:posbernoulli.eg.timingtests} contrast our software with some others. Moreover, the incorporation of these methods in a general, maintained statistical package will result in them being updated as the package is updated.

Standard GLM and GAM methodologies are unable to cope with the CR models considered in this article because they are largely restricted to one linear/additive predictor $\eta$. Fortunately, however, a natural extension in the form of the vector generalized linear and additive model (VGLM/VGAM) classes does allow for multiple $\eta$s. VGLMs and VGAMs are described in \citet{yee:wild:1996} and \citet{yee:hast:2003}. Their implementation in the \pkg{VGAM} package \citep{yee:2008,yee:2010,yee:VGAM:2013-093} has become increasingly popular and practical over the last few years, due to the large number of exponential families available for discrete/multinomial response data.
In addition to the flexible modelling offered by both VGLMs and VGAMs, a wide range of useful features is also available:
\begin{itemize}

\item smoothing capabilities;

\item model selection using, e.g., AIC or BIC \citep{burnham:anderson:1999};

\item regression diagnostics and goodness--of--fit tools;

\item reduced-rank regression \citep{yee:hast:2003} for dimension reduction;

\item computational speed and robustness;

\item choice of link functions;

\item offsets and prior weights; and

\item (specifically) when using \proglang{R}: generic functions based on object oriented programming, e.g., \code{fitted()}, \code{coef()}, \code{vcov()}, \code{summary()}, \code{predict()}, \code{AIC()}, etc.

\end{itemize}

Our goal is to provide users with an easy-to-use object-oriented \pkg{VGAM} structure, where four \code{family}-type functions based on the conditional likelihood are available to fit the eight models of \citet{otis:etal:1978}. We aim to give the user additional tools and features, such as those listed above, to carry out a more informative and broader analysis of CR data, particularly when considering more than one covariate. Finally, this article primarily focuses on the technical aspects of the proposed package, and less so on the biological interpretation for CR experiments. The latter will be presented elsewhere.

An outline of this article is as follows. In Section \ref{sec:cr} we present the conditional likelihood for CR models and a description of the eight \citet{otis:etal:1978} models. Section \ref{sec:vgam} summarizes pertinent details of VGLMs and VGAMs. Their connection to the CR models is made in Section \ref{sec:meth}. Software details are given in Section \ref{sec:software}, and examples on real and simulated data using the new software are demonstrated in Section \ref{sec:body:exam}. Some final remarks are given in Section \ref{sec:discussion}. The two appendices give some technical details relating to the first and second derivatives of the conditional log-likelihood, and the means.

\begin{table}[tt]
\centering
\begin{tabular}{cl}
\hline
\ \ \ Symbol \ \ \ & Explanation \\
\hline
% --------------------------------------
$N$ & (Closed) population size to be estimated \\
% --------------------------------------
$n$ & Total number of distinct individuals caught in the trapping experiment \\
% --------------------------------------
$\tau$ & Number of sampling occasions, where $\tau \geq 2$ \\
% --------------------------------------
$\biy_i$ & Vector of capture histories for individual $i$ $(i=1,\ldots,n)$ with observed values\\
 & 1 (captured) and 0 (noncaptured).
Each $\biy_i$ has at least one observed 1 \\
% --------------------------------------
``$h$'' & Model $\calM$ subscript, for heterogeneity \\
% --------------------------------------
``$b$'' & Model $\calM$ subscript, for behavioural effects \\
% --------------------------------------
``$t$'' & Model $\calM$ subscript, for temporal effects \\
% --------------------------------------
$p_{ij}$ & Probability that individual $i$ is captured at sampling occasion $j$ $(j=1,\ldots,\tau)$ \\
% --------------------------------------
$z_{ij}$ & $= 1$ if individual $i$ has been captured before occasion $j$, else $= 0$ \\
% --------------------------------------
$\btheta^{}$ & Vector of regression coefficients to be estimated related to $p_{ij}$ \\
% --------------------------------------
$\boldeta$ & Vector of linear predictors (see Table \ref{tab2} for further details) \\
% --------------------------------------
$g$ & Link function applied to, e.g., $p_{ij}$. Logit by default \\
% --------------------------------------
\hline
\end{tabular}
\caption{
Short summary of the notation used for the positive-Bernoulli distribution for capture--recapture (CR) experiments. Additional details are in the text.
\label{tab0}
}
\end{table}

%*********************************************************************
\section[Capture--recapture models]{Capture--recapture models}
\label{sec:cr}

In this section we give an outline of closed population CR models under the conditional likelihood/GLM approach. For further details we recommend \citet{hugg:1991} and \citet{hugg:hwan:2011}. The notation of Table \ref{tab0} is used throughout this article.

% ---------------------------------------------------------------
\subsection{Conditional likelihood}
\label{sec:condlik}

Suppose we have a closed population of $N$ individuals, labelled $i=1,\ldots,N$, and $\tau$ capture occasions labelled $j=1,\ldots,\tau$. We make the usual assumptions that individuals in the population behave independently of each other, individuals do not lose their tags, and tags are recorded correctly. Let $y_{ij}=1$ if the $i$th individual was caught on the $j$th occasion and $y_{ij}=0$ otherwise, and let $n$ be the number of distinct individuals captured. Let $p_{ij}$ denote the probability of capturing individual $i$ on occasion $j$. As noted in Section \ref{sec:intro}, \citet{otis:etal:1978} describe eight models for the capture probabilities; see Section \ref{sec:8models} for further details.

Label the individuals captured in the experiment by $i=1,\ldots,n$ and those never captured by $i=n+1,\ldots,N$. The full likelihood is given by
\begin{eqnarray}
L_{f} & = & K \prod_{i=1}^{N}\prod_{j=1}^{\tau} p_{ij}^{y_{ij}} (1-p_{ij})^{1- y_{ij}} \nonumber \\
& = & K \left\{\prod_{i=1}^{n}\prod_{j=1}^{\tau}p_{ij}^{y_{ij}} (1-p_{ij})^{1 - y_{ij}}\right\}\cdot \left\{\prod_{i=n+1}^{N} \prod_{j=1}^{\tau} (1-p_{ij})\right\}
\label{eq:posbern.uncondlikelihood}
\end{eqnarray}
where $K$ is independent of the $p_{ij}$ but may depend on $N$. The RHS of (\ref{eq:posbern.uncondlikelihood}) requires knowledge of the uncaptured individuals and in general cannot be computed. Consequently, no MLE of $N$ will be available unless some homogeneity assumption is made about the noncaptured individuals.
Instead, a conditional likelihood function based only on the $n$ individuals observed at least once,
\begin{eqnarray}
\label{eq:posbern.condlikelihood}
L_{c} & \propto & \prod_{i=1}^{n} \frac{\prod_{j=1}^{\tau} p_{ij}^{y_{ij}} (1-p_{ij})^{1 - y_{ij}}} {1-\prod_{s=1}^{\tau}(1-p_{is}^{\dagger})},
\end{eqnarray}
is used. Here the $p_{is}^{\dagger}$ are the $p_{is}$ computed as if the individual had not been captured prior to occasion $s$, so that the denominator is the probability that individual $i$ is captured at least once. This conditional likelihood (\ref{eq:posbern.condlikelihood}) is a modified version of the likelihood corresponding to a positive-Bernoulli distribution \citep{patil:1962}.

\renewcommand{\arraystretch}{1.2}
\begin{table}[tt]
\begin{center}
\begin{tabular}{|c||c|c|c|c|}
\hline
Capture & \multicolumn{4}{c|}{Joint probability}\\
\cline{2-5}
history & \multicolumn{1}{c|}{$\calM_0$/$\calM_h$} & \multicolumn{1}{c|}{$\calM_b$/$\calM_{bh}$} & \multicolumn{1}{c|}{$\calM_t$/$\calM_{th}$} & \multicolumn{1}{c|}{$\calM_{tb}$/$\calM_{tbh}$} \\
\hline
01 & $(1-p) p$ & $(1-p_{c}) \, p_{c}$ & $(1-p_1) p_2$ & $(1-p_{c1}) \, p_{c2}$ \\
10 & $p(1-p)$ & $p_{c} (1-p_{r})$ & $p_1 (1-p_2)$ & $p_{c1} (1-p_{r2})$ \\
11 & $p^2$ & $p_{c} \, p_{r}$ & $p_1 \, p_2$ & $p_{c1} \, p_{r2}$ \\
\hline
00 & $(1-p)^2$ & $(1-p_{c})^2$ & $(1-p_1)(1-p_2)$ & $(1-p_{c1})(1-p_{c2})$ \\
\hline
\hline
$ M \equiv \dim(\boldeta)$ & 1 & 2 & 2 $(=\tau)$ & 3 $(=2 \tau - 1)$ \\
\hline
\end{tabular}
\end{center}
\caption{Capture history sample space and corresponding probabilities for the eight models of \citet{otis:etal:1978}, with $\tau=2$ capture occasions in a closed population CR experiment. Here, $p_{cj}=$ capture probability for unmarked individuals at sampling period $j$, $p_{rj}=$ recapture probability for marked individuals at sampling period $j$, and $p=$ constant capture probability across $\tau=2$. Note that the ``00'' row is never realized in sample data.}
\label{tab1}
\end{table}
\renewcommand{\arraystretch}{1.0}

% ---------------------------------------------------------------
\subsection{The eight models}
\label{sec:8models}

Models which allow capture probabilities to depend on one or a combination of time, heterogeneity or behavioural effects are defined using appropriate subscripts, e.g., $\calM_{th}$ depends on time and heterogeneity. These eight models have a nested structure of which $\calM_{tbh}$ is the most general. The homogeneous model $\calM_0$ is the simplest (but most unrealistic) and has equal capture probabilities for each individual, $H_0: p_{ij}=p$, regardless of the sampling occasion. All eight models are GLMs, since the conditional likelihood (\ref{eq:posbern.condlikelihood}) belongs to the exponential family \citep{hugg:hwan:2011}.

To illustrate the approach, we use the following toy example throughout: consider a CR experiment with two occasions---morning and evening (i.e., $\tau=2$), with capture probabilities varying between the two occasions. Furthermore, suppose we have collected some individual covariates---weight and gender. The joint probabilities of all the eight models are listed in Table \ref{tab1}. It can be seen that all but the positive-Binomial model ($\calM_{0}/\calM_{h}$) require more than one probability and hence more than one linear predictor, so that the original \cite{neld:wedd:1972} GLM framework is inadequate.
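To make the form of (\ref{eq:posbern.condlikelihood}) concrete, the following minimal sketch evaluates this conditional log-likelihood for the simplest model $\calM_0$ in plain \proglang{R}. It is not \pkg{VGAM} code, and the objects \code{Y} and \code{theta} are hypothetical: \code{Y} is an $n\times\tau$ matrix of capture histories and \code{theta} is the single capture probability on the logit scale.

<<eval = FALSE>>=
## Minimal sketch (model M_0 only; not VGAM code): `Y` is an n x tau matrix of
## 0s and 1s in which every row contains at least one 1, and `theta` is the
## logit of the common capture probability p.
condloglik.M0 <- function(theta, Y) {
  p   <- plogis(theta)                   # constant capture probability
  tau <- ncol(Y)
  num <- sum(Y) * log(p) + sum(1 - Y) * log(1 - p)   # Bernoulli terms
  den <- nrow(Y) * log(1 - (1 - p)^tau)  # each individual is seen at least once
  num - den                              # conditional log-likelihood
}
## e.g., maximize over `theta` with
## optimize(condloglik.M0, c(-10, 10), Y = Y, maximum = TRUE)
@

For the other models of Table \ref{tab1} the numerator and denominator involve several distinct probabilities $p_{ij}$ and $p_{is}^{\dagger}$, which is precisely where the multiple linear predictors of Section \ref{sec:vgam} are needed.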
Further, there are two noteworthy points from Table \ref{tab1} which apply for any $\tau\ge 2$:
\begin{itemize}

\item firstly, for $\calM_{t}$-type models, as $\tau$ increases so will the number of linear predictors and hence the potential number of parameters;

\item secondly, it is evident that there are four main categories consisting of non-heterogeneity models ($\calM_{0}$, $\calM_{b}$, $\calM_{t}$ and $\calM_{tb}$), which are paired with a heterogeneity sub-model (respectively $\calM_{h}$, $\calM_{bh}$, $\calM_{th}$ and $\calM_{tbh}$).

\end{itemize}
The four heterogeneity models allow each individual to have their own probability of capture/recapture. In our toy example, the capture probabilities are dependent on an individual's weight and gender. We discuss these models further in Section \ref{sec:vgam.basics}.

It is natural to consider individual covariates such as weight and gender as linear/additive predictors. Let $x_{i}$ denote a covariate (either continuous or discrete) for the $i$th individual, which is constant across the capture occasions $j=1,\ldots,\tau$, e.g., for continuous covariates one could use the first observed value or the mean across all $j$. If there are $d-1$ covariates, we write $\bix_i=(x_{i1},\ldots,x_{id})^{\top}$ with $x_{i1}=1$ if there is an intercept. Also, let $g^{-1}(\eta)={\exp(\eta)}/\{{1+\exp(\eta)}\}$ be the inverse \logit{} function. For model $\mathcal{M}_{tbh}$, the capture/recapture probabilities are given by [notation follows Section \ref{sec:VGAMs.basics}]
\begin{eqnarray*}
p_{ij}^{\dagger} & = & g^{-1} \! \left(\qquad \quad \, \beta^*_{(j+1)1} + \bix_{i[-1]}^{\top} \, \bbeta_{1[-1]}^{} \right), \qquad j=1,\ldots,\tau, \\
p_{ij} & = & g^{-1} \!\left(\beta^*_{(1)1} + \beta^*_{(j+1)1} + \bix_{i[-1]}^{\top} \,\bbeta_{1[-1]}^{} \right),\qquad j=2,\ldots,\tau,
\end{eqnarray*}
where $\beta^*_{(1)1}$ is the behavioural effect of prior capture, $\beta^*_{(j+1)1}$ for $j=1,\ldots,\tau$ are time effects, and $\bbeta_{1[-1]}$ are the remaining regression parameters associated with the covariates. Computationally, the conditional likelihood (\ref{eq:posbern.condlikelihood}) is maximized with respect to all the parameters (denoted by $\btheta$) by the Fisher scoring algorithm using the derivatives given in Appendix A.

% ---------------------------------------------------------------
\subsection[Estimation of N]{Estimation of $N$}
\label{sec:Nhat}

In the above linear models, to estimate $N$ let $\pi_{i}(\btheta)=1-\prod_{s=1}^{\tau}(1-p_{is}^{\dagger})$ be the probability that individual $i$ is captured at least once in the course of the study. Then, if $\btheta$ is known, the Horvitz--Thompson \citep[HT;][]{horv:thom:1952} estimator
\begin{eqnarray}
\label{eq:HT}
\widehat{N}(\btheta) &=& \sum_{i=1}^{n} \; {\pi}_{i}(\btheta)^{-1}
\end{eqnarray}
is unbiased for the population size $N$, and an associated estimate of the variance of $\widehat{N}(\btheta)$ is $s^2(\btheta) = \sum_{i=1}^{n} \; {\pi}_{i}(\btheta)^{-2} \, \left[1-{\pi}_{i}(\btheta)\right]$.
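Before describing the adjustment needed when $\btheta$ must itself be estimated, we note that (\ref{eq:HT}) and $s^2(\btheta)$ are straightforward to compute once the $p_{is}^{\dagger}$ are available. The following minimal sketch does this in plain \proglang{R}; it is not \pkg{VGAM} code, and the matrix \code{pdagger} is hypothetical.

<<eval = FALSE>>=
## Minimal sketch (not VGAM code): `pdagger` is an n x tau matrix whose (i, s)
## element is the fitted p_{is}^dagger for captured individual i, occasion s.
HT.estimate <- function(pdagger) {
  pii  <- 1 - apply(1 - pdagger, 1, prod)  # pi_i(theta): caught at least once
  Nhat <- sum(1 / pii)                     # Horvitz--Thompson estimator of N
  s2   <- sum((1 - pii) / pii^2)           # variance estimate for known theta
  c(Nhat = Nhat, s2 = s2)
}
@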
If $\btheta$ is estimated by $\widehat{\btheta}$ then one can use
\begin{eqnarray}
\label{eq:est.varNhat2}
\VAR\left(\widehat{N}(\widehat{\btheta}) \right) & \approx & s^2(\widehat{\btheta}) + \widehat{\bid}^{\top} \widehat{\VAR}(\widehat{\btheta}) \,\widehat{\bid}
\end{eqnarray}
where, following from a Taylor series expansion of $\widehat{N}(\widehat{\btheta})$ about $\widehat{N}(\btheta)$,
\begin{eqnarray*}
\bid\, = \, \frac{d \widehat{N}(\btheta)}{d \btheta} & = & - \sum_{i=1}^n \; {\pi}_{i}(\btheta)^{-2} \; \, \frac{d {\pi}_{i}(\btheta)}{d \btheta} \\
& = & \sum_{i=1}^n \; \frac{-1}{{\pi}_{i}(\btheta)^{2}} \; \sum_{s=1}^{\tau} \; \left[\prod_{t=1,\ t \neq s}^{\tau} \left( 1 - p_{it}^{\dagger}\right)\right] \frac{\partial p_{is}^{\dagger}}{\partial \btheta}.
\end{eqnarray*}

%*********************************************************************
\section[Vector generalized linear and additive models]{Vector generalized linear and additive models}
\label{sec:vgam}

To extend the above linear models, we use VGLMs and VGAMs, which we briefly describe in this section. These models fit within a large statistical regression framework which will be described in \citet{yee:2015}. The details here are purposely terse; readers are directed to \citet{yee:2008,yee:2010} for accessible overviews and examples, and to \citet{yee:wild:1996} and \citet{yee:hast:2003} for technical details.

% ---------------------------------------------------------------
\subsection[Basics]{Basics}
\label{sec:vgam.basics}

Consider observations on independent pairs $(\bix_i,\biy_i)$, $i=1,\ldots,n$. We use ``$[-1]$'' to delete the first element, e.g., $\bix_{i[-1]} =(x_{i2},\ldots,x_{id})^{\top}$. For simplicity, we will occasionally drop the subscript $i$ and simply write $\bix =(x_{1},\ldots,x_{d})^{\top}$. Consider a single observation where \biy{} is a $Q$-dimensional vector. For the CR models of this paper, $Q=\tau$ when the response is entered as a matrix of 0s and 1s. The only exception is the $\calM_0$/$\calM_h$ models, where aggregated counts may be inputted; see Section \ref{sec:M0Mh}.

VGLMs are defined through the model for the conditional density
\[
f(\biy | \bix ; \bB) = f(\biy,\eta_1,\ldots,\eta_M)
\]
for some known function $f(\cdot)$, where $\bB =(\bbeta_1 \,\bbeta_2 \,\cdots \,\bbeta_M)$ is a $d\times M$ matrix of regression coefficients to be estimated. We may also write $\bB^{\top} = (\bbeta_{(1)} \,\bbeta_{(2)}\,\cdots\, \bbeta_{(d)})$ so that $\bbeta_j$ is the $j$th column of $\bB$ and $\bbeta_{(k)}$ is the $k$th row. The $j$th linear predictor is then
\begin{equation}
\eta_j = \bbeta_j^{\top} \bix = \sum_{k=1}^d \beta_{(j)k} \, x_{k}, \qquad j=1, \ldots, M,
\label{gammod2}
\end{equation}
where $\beta_{(j)k}$ is the $k$th component of $\bbeta_j$. In the CR context, we remind the reader that, as in Table \ref{tab1}, we have $M=2$ for $\calM_{bh}$, $M=\tau$ for $\calM_{th}$ and $M=2\tau-1$ for $\calM_{tbh}$.

In GLMs the linear predictors are used to model the means. The $\eta_j$ of VGLMs model the parameters of a model. In general, for a parameter $\theta_j$ we take
\[
\eta_j = g_j(\theta_j), \qquad j=1,\ldots,M,
\]
and we say $g_j$ is a parameter link function. Write
\begin{equation}
\boldeta_i =
\left(\begin{array}{c}\eta_1(\bix_{i})\\ \vdots \\ \eta_M(\bix_{i})\end{array}\right) =
\bB^{\top} \bix_{i} =
\left(\begin{array}{c}\bbeta_1^{\top} \bix_{i} \\ \vdots \\ \bbeta_M^{\top} \bix_{i}\end{array} \right).
\label{eq:lin.pred}
\end{equation}

In practice we may wish to constrain the effect of a covariate to be the same for some of the $\eta_j$ and to have no effect for others. In our toy example, model $\calM_{th}$ with $\tau=M=2$ and $d=3$, we have
\begin{eqnarray*}
\eta_1(\bix_i) & = & \beta_{(1)1} + \beta_{(1)2} \, x_{i2} + \beta_{(1)3} \, x_{i3}, \\
\eta_2(\bix_i) & = & \beta_{(2)1} + \beta_{(2)2} \, x_{i2} + \beta_{(2)3} \, x_{i3},
\end{eqnarray*}
where $x_{i2}$ is the individual's weight and $x_{i3}$ is an indicator of gender, say. Constraining the effect of each covariate to be the same on both occasions gives the constraints $\beta_{(1)2}\equiv\beta_{(2)2}$ and $\beta_{(1)3}\equiv\beta_{(2)3}$. Then, with ``${}^*$'' denoting the parameters that are estimated,
\begin{eqnarray*}
\eta_1(\bix_i) & = & \beta^*_{(1)1} + \beta^*_{(1)2} \, x_{i2} + \beta^*_{(1)3} \, x_{i3}, \\
\eta_2(\bix_i) & = & \beta^*_{(2)1} + \beta^*_{(1)2} \, x_{i2} + \beta^*_{(1)3} \, x_{i3},
\end{eqnarray*}
and we may write
\begin{eqnarray*}
\boldeta(\bix_i) = \begin{pmatrix}\eta_1(\bix_i)\\ \eta_2(\bix_i)\end{pmatrix} & = & \sum_{k=1}^3 \, \bbeta_{(k)} \, x_{ik}\\
& = & \begin{pmatrix}\beta_{(1)1} & \beta_{(1)2} & \beta_{(1)3}\\ \beta_{(2)1} & \beta_{(2)2} & \beta_{(2)3} \end{pmatrix} \begin{pmatrix}x_{i1}\\ x_{i2}\\ x_{i3} \end{pmatrix}\\
& = & \begin{pmatrix} \beta^*_{(1)1} & \beta^*_{(1)2} & \beta^*_{(1)3}\\ \beta^*_{(2)1} & \beta^*_{(1)2} & \beta^*_{(1)3} \end{pmatrix} \begin{pmatrix}x_{i1}\\ x_{i2}\\ x_{i3}\end{pmatrix}\\
& = & \begin{pmatrix}1 & 0\\ 0 & 1\end{pmatrix} \begin{pmatrix} \beta^*_{(1)1}\\ \beta^*_{(2)1} \end{pmatrix} x_{i1}+ \begin{pmatrix}1\\1\end{pmatrix} \beta^*_{(1)2} \, x_{i2}+ \begin{pmatrix} 1\\ 1\end{pmatrix} \beta^*_{(1)3} \, x_{i3}\\
& = & \sum_{k=1}^3 \, \bH_k \, \bbeta^*_{(k)} \, x_{ik}.
\end{eqnarray*}
We can also write this as (noting that $x_{i1}=1$)
\begin{eqnarray*}
\boldeta(\bix_i) & = & \begin{pmatrix}x_{i1} & 0 \\ 0 & x_{i1} \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} \beta^*_{(1)1}\\ \beta^*_{(2)1} \end{pmatrix} + \begin{pmatrix} x_{i2} & 0 \\ 0 & x_{i2} \end{pmatrix} \begin{pmatrix} 1 \\ 1\end{pmatrix} \beta^*_{(1)2} + \begin{pmatrix} x_{i3} & 0 \\ 0 & x_{i3} \end{pmatrix} \begin{pmatrix} 1 \\ 1 \end{pmatrix} \beta^*_{(1)3}\\
& = & \sum_{k=1}^3 \, \mathrm{diag}(x_{ik},x_{ik}) \, \bH_k \,\bbeta_{(k)}^{*}.
\end{eqnarray*}

In general, for VGLMs, we represent the models as
\begin{eqnarray}
\boldeta(\bix_i) & = & \sum_{k=1}^d \, \bbeta_{(k)} \, x_{ik} \nonumber \\
& = & \sum_{k=1}^d \, \bH_k \, \bbeta_{(k)}^{*} \, x_{ik} \label{eq:constraints.VGLM}\\
& = & \sum_{k=1}^d \, \mathrm{diag}(x_{ik},\ldots,x_{ik}) \, \bH_k \, \bbeta_{(k)}^{*} \nonumber
\end{eqnarray}
where $\bH_1,\bH_2,\ldots,\bH_d$ are known constraint matrices of full column-rank (i.e., rank \code{ncol}($\bH_k$)), and $\bbeta_{(k)}^*$ is a vector containing a possibly reduced set of regression coefficients. Then we may write
\begin{equation}
\label{eq:lin.coefs4}
{\bB}^{\top} = \left( \bH_1 \bbeta_{(1)}^* \; \; \; \bH_2 \bbeta_{(2)}^* \;\;\; \cdots \;\;\; \bH_d \bbeta_{(d)}^* \right)
\end{equation}
as an expression of (\ref{eq:lin.pred}) concentrating on columns rather than rows. Note that with no constraints at all, all $\bH_k = \bI_M$ and $\bbeta_{(k)}^*=\bbeta_{(k)}$. We need both (\ref{eq:lin.pred}) and (\ref{eq:lin.coefs4}) since at times we focus on the $\eta_j$ and at other times on the variables $x_{k}$.
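As a small numerical illustration of (\ref{eq:constraints.VGLM}) and (\ref{eq:lin.coefs4}), the following base \proglang{R} sketch builds the constraint matrices of the toy $\calM_{th}$ example above and recovers $\bB^{\top}$ and $\boldeta(\bix_i)$. The coefficient and covariate values are made up purely for illustration.

<<eval = FALSE>>=
## Toy M_th example with M = 2 and d = 3: unconstrained intercepts, and weight
## and gender effects shared by both linear predictors.  All numerical values
## are hypothetical and chosen only to illustrate the matrix algebra.
H1 <- diag(2)               # intercepts: one coefficient per eta_j
H2 <- matrix(1, 2, 1)       # weight: common effect for eta_1 and eta_2
H3 <- matrix(1, 2, 1)       # gender: common effect for eta_1 and eta_2
beta1.star <- c(-0.5, 0.3)  # (beta*_(1)1, beta*_(2)1)
beta2.star <- 0.02          # beta*_(1)2
beta3.star <- 0.70          # beta*_(1)3
Bt  <- cbind(H1 %*% beta1.star, H2 %*% beta2.star, H3 %*% beta3.star)  # B^T
xi  <- c(1, 60, 1)          # (x_i1, x_i2, x_i3): intercept, weight, gender
eta <- Bt %*% xi            # eta_1(x_i) and eta_2(x_i)
@

Within \pkg{VGAM} itself such matrices are rarely constructed by hand, as described next.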
The constraint matrices for common models are pre-programmed in \pkg{VGAM} and can be set up by using arguments such as \code{parallel} and \code{zero} found in \pkg{VGAM} family functions. Alternatively, there is the argument \code{constraints} where they may be explicitly inputted. Using \code{constraints} is less convenient but provides full generality.

% ---------------------------------------------------------------
\subsection[Handling time-varying covariates]{Handling time-varying covariates}
\label{sec:xij}

Often the covariates may be time-varying; e.g., when temperature is used as a covariate, a different value is observed and measured on each occasion $j=1,\dots,\tau$. Again, using our toy example with $M=2$, $d=3$, and $\tau=2$, suppose we have time-dependent covariates $\bix_{ij}$, $j=1,2$. We may have the model
\begin{eqnarray*}
\eta_1(\bix_{i1}) & = & \beta^*_{(1)1} + \beta^*_{(1)2} \, x_{i21} + \beta^*_{(1)3}\, x_{i31},\\
\eta_2(\bix_{i2}) & = & \beta^*_{(2)1} + \beta^*_{(1)2} \, x_{i22} + \beta^*_{(1)3}\, x_{i32},
\end{eqnarray*}
for the linear predictors on the two occasions. Here, $x_{ikt}$ denotes the value of the $k$th explanatory variable for the $i$th animal at the $t$th time. We write this model as
\begin{eqnarray*}
\boldeta(\bix_{ij}) & = & \begin{pmatrix} x_{i11} & 0\\ 0 & x_{i12} \end{pmatrix} \begin{pmatrix} 1 & 0\\ 0 & 1\end{pmatrix} \begin{pmatrix} \beta^*_{(1)1}\\ \beta^*_{(2)1} \end{pmatrix} + \begin{pmatrix} x_{i21} & 0\\ 0 & x_{i22}\end{pmatrix} \begin{pmatrix} 1\\ 1\end{pmatrix} \beta^*_{(1)2} + \begin{pmatrix} x_{i31} & 0\\ 0 & x_{i32}\end{pmatrix} \begin{pmatrix} 1 \\ 1\end{pmatrix} \beta^*_{(1)3}\\
& = & \sum_{k=1}^3 \, \mathrm{diag}(x_{ik1},x_{ik2}) \, \bH_k\,\bbeta_{(k)}^{*}.
\end{eqnarray*}
Thus to handle time-varying covariates one needs the \code{xij} facility of \pkg{VGAM} (e.g., see Section \ref{sec:poz:posbernoulli.eg.hugg:1991}), which allows a covariate to have different values for different $\eta_{j}$ through the general formula
\begin{eqnarray}
\boldeta(\bix_{ij}) & = & \sum_{k=1}^{d}\, \mathrm{diag}(x_{ik1},\ldots,x_{ikM})\, \bH_k \,\bbeta_{(k)}^{*}=\sum_{k=1}^d \, \bX^{\#}_{(ik)}\bH_k \,\bbeta_{(k)}^{*}
\label{eq:vglimo:xij.vector.diag}
\end{eqnarray}
where $x_{ikj}$ is the value of variable $x_{k}$ for unit $i$ for $\eta_{j}$. The derivation of (\ref{eq:vglimo:xij.vector.diag}), followed by some examples, is given in \cite{yee:2010}. Implementing this model requires specification of the diagonal elements of the matrices $\bX^{\#}_{(ik)}$, and we see its use in Section \ref{sec:poz:posbernoulli.eg.hugg:1991}. Clearly, a model may include a mix of time-dependent and time-independent covariates. The model is then specified through the constraint matrices $\bH_k$ and the covariate matrices $\bX^{\#}_{(ik)}$. Typically in CR experiments, the time-varying covariates will be environmental effects. Fitting time-varying individual covariates requires some interpolation when an individual is not captured and is beyond the scope of the present work.

% ---------------------------------------------------------------
\subsection[VGAMs]{VGAMs}
\label{sec:VGAMs.basics}

VGAMs replace the linear functions in (\ref{eq:constraints.VGLM}) by smoothers such as splines.
Hence, the central formula is
\begin{equation}
\boldeta_i = \sum_{k=1}^d \; \bH_k \, \bif_k^*(x_{ik})
\label{eq:vgam}
\end{equation}
where $\bif_k^*(x_k) = (f_{k(1)}^*(x_k),\ldots,f_{k(M_k)}^*(x_k))^{\top}$ is a vector of $M_k$ smooth functions of $x_k$, and $M_k=\mathtt{ncol}(\bH_k)$ is the rank of the constraint matrix for $x_k$. Note that standard error bands are available upon plotting the estimated component functions (see \cite{yee:wild:1996} for details), e.g., see Figure \ref{fig:poz:deermice}.

%*********************************************************************
\section[VGLMs and VGAMs applied to CR data]{VGLMs and VGAMs applied to CR data}
\label{sec:meth}

In this section we merge the results of Sections \ref{sec:cr} and \ref{sec:vgam} to show how the eight models of \citet{otis:etal:1978} can be fitted naturally within the VGLM/VGAM framework.

% ---------------------------------------------------------------
\subsection[Linear predictors and constraint matrices]{Linear predictors and constraint matrices}
\label{sec:constraints}

As in Section \ref{sec:vgam.basics}, we now write $\biy_i$ as the capture history vector for individual $i$. Written technically, $\biy_i \in (\{0,1\})^{\tau} \backslash\{\bzero_\tau\}$ so that there is at least one 1 (capture). For simplicity let $p_c$ and $p_r$ be the capture and recapture probabilities. Recall that the value of $M$ depends on the CR model type and the number of capture occasions considered in the experiment; for example, for model $\calM_b$ in Table \ref{tab1}, $(\eta_1,\eta_2)=(g(p_c),g(p_r))$ for some link function $g$, so that $M=2$. The upper half of Table \ref{tab2} gives these for the eight \citet{otis:etal:1978} models.

The lower half of Table \ref{tab2} gives the names of the \pkg{VGAM} family functions that fit those models. They work very similarly to the \code{family} argument of \code{glm()}, e.g., <