\documentclass{article}[11pt] \usepackage{Sweave} \usepackage{amsmath} \addtolength{\textwidth}{1in} \addtolength{\oddsidemargin}{-.5in} \setlength{\evensidemargin}{\oddsidemargin} \SweaveOpts{keep.source=TRUE, fig=FALSE} % Ross Ihaka suggestions \DefineVerbatimEnvironment{Sinput}{Verbatim} {xleftmargin=2em} \DefineVerbatimEnvironment{Soutput}{Verbatim}{xleftmargin=2em} \DefineVerbatimEnvironment{Scode}{Verbatim}{xleftmargin=2em} \fvset{listparameters={\setlength{\topsep}{0pt}}} \renewenvironment{Schunk}{\vspace{\topsep}}{\vspace{\topsep}} \SweaveOpts{prefix.string=adjcurve,width=6,height=4} \setkeys{Gin}{width=\textwidth} %\VignetteIndexEntry{Validation} <>= options(continue=" ", width=60) options(SweaveHooks=list(fig=function() par(mar=c(4.1, 4.1, .3, 1.1)))) pdf.options(pointsize=8) #text in graph about the same as regular text require(survival, quietly=TRUE) @ \newcommand{\imat}{H} % use H for the hessian rather than script I \newcommand{\Cvar}{H^{-1}} % use H for the hessian rather than script I \newcommand{\splus}{R} % ``the survival package'' would be an alternate \newcommand{\xbar}{\overline x} \newcommand{\lhat}{\hat \Lambda} \def\bhat{\hat\beta} %define "bhat" to mean "beta hat" \def\Mhat{\widehat M} %define "Mhat" to mean M-hat \newcommand{\code}[1]{\texttt{#1}} \title{Validation} \author{Terry M Therneau} \date{Dec 2015} \begin{document} \maketitle \section{Introduction} \begin{quotation} `When I use a word,' Humpty Dumpty said, in rather a scornful tone, `it means just what I choose it to mean - neither more nor less.' `The question is,' said Alice, `whether you can make words mean so many different things.' `The question is,' said Humpty Dumpty, `which is to be master - that's all.' -Lewis Caroll, \emph{Through the Looking Glass} \end{quotation} ``Validatation'' is a label which is used for many different things in scientific research, so much so that the word is essentially meaningless without further clarification. One of the more common meanings assigned to it in the software realm is ``repeatability'', i.e. that a new release of a given package or routine will give the same results as it did the week before. Users of the software often assume the word implies a more rigorous criterion, namely that the routine gives \emph{correct} answers. Validation of the latter type is rare, however; and I surmise that a primary reason for this is that working out correct answers is boring, tedious work. This note contains a set of examples, often very simple, that have evolved over multiple decades. They have proven extremely useful in debugging the methods, not least because all the intermediate steps of each calculation are transparent, and have been incorporated into the formal test suite for the survival package as the files \code{book1.R}, \code{book2.R}, etc. They also continue to be a resource for package's defence: I have been told multiple times that some person or group cannot use R in their work because ``SAS is validated'' while R is not. The survival package passes all of the tests below and SAS passes many but not all of them. It is my hope that the formal test cases will be a resource for developers on multiple platforms. Portions of this work were included as an appendix in the textbook of Therneau and Grambsch \cite{Therneau2000} precisely for this reason. \section{Basic formulas} All these examples have a single covariate. Let $x_i$ be the covariate for each subject, $r_i= exp(x_i \beta)$ the risk score for the subject, and $w_i$ the case weight, if any. Let $Y_i(t)$ be 1 if subject $i$ is at risk at time $t$ and 0 otherwise, and $\delta_i(t)$ be the death indicator which is 1 if subject $i$ has an event at time $t$. At each death time we have the following quantities: \begin{align} d(t) &= \sum_i Y_i(t) w_i r_i \\ LPL(t) &= \left( \sum_i \delta_i(t) \log(r_i) \right) / \log(d(t)) \\ \xbar(t) &= \left( \sum_i Y_i(t) w_i r_i x_i \right) /d(t) \\ U(t) &= \sum_i \delta_i(t) (x_i - \xbar(t)) \label{U1} \\ H(t) &= \sum_i Y_i(t) (x_i - \xbar(t))^2 / d(t) \\ &= \left(\sum_i Y_i(t) x_i^2/d(t) \right)- \xbar(t)^2 \label{H2} \\ \lambda(t) &= \sum_i \delta_i(t)/ d(t) \end{align} The denominator $d$ is the weighted number of subjects at the time, $LPL$ is the contribution to the log partial likelihood at time $t$, and $\xbar$ and $H$ are the weighted mean and variance of the covariate $x$ at each time. The sum of $H(t)$ over the death times is the second derivative of the LPL, also known as the Hessian matrix. $U$ is the contribution to the first derivative of the LPL at time $t$ and $\lambda$ is the increment in the baseline hazard function. \section{Test data 1} This data set of $n=6$ subjects has a single 0/1 covariate $x$. There is one tied death time, one time with both a death and a censored observation, one with only a death, and one with only censoring. (This is as small as a data set can be and still cover these four important cases.) Let $r = \exp(\beta)$ be the risk score for a subject with $x=1$; the risk score is $\exp(0) =1$ for those with $x=0$. Table \ref{tab:val1} shows the data set along with the mean and increment to the hazard at each time point. \begin{table}[b] \centering \begin{tabular}{ccc|cc|cc} &&& \multicolumn{2}{c|}{$\xbar(t)$} & \multicolumn{2}{c}{$d\lhat_0(t)$} \\ Time& Status& $x$& Breslow& Efron& Breslow& Efron \\ \hline 1&1&1&$r/(r+1)$ & $r/(r+1)$& $1/(3r+3)$ & $1/(3r+3)$ \\ 1&0&1&&&&\\ 6&1&1& $r/(r+3)$ & $r/(r+3)$ & $1/(r+3)$& $1/(r+3)$ \\ 6&1&0& $r/(r+3)$ & $r/(r+5)$ & $1/(r+3)$& $1/(r+5)$ \\ 8&0&0&&&& \\ 9&1&0&0&0& 1& 1\\ \end{tabular} \caption{Test data 1} \label{tab:val1} \end{table} \subsection{Breslow estimates} \label{sect:valbreslow} The log partial likelihood (LPL) has a term for each event; each term is the log of the ratio of the score for the subject who had an event over the sum of scores for those who did not. The LPL, first derivative $U$ of the LPL and second derivative (or Hession) $\imat$ are: \begin{eqnarray*} LPL &=& \{\beta- \log(3r+3)\} + \{\beta - \log(r+3)\} + \{0-\log(r+3)\} + \{0-0\} \\ &=& 2\beta - \log(3r+3) - 2\log(r+3). \\ \\ U &=& \left(1-\frac{r}{r+1}\right) + \left(1-\frac{r}{r+3}\right) + \left(0-\frac{r}{r+3} \right) + (0-0) \\ &=& \frac{-r^2 + 3r + 6}{(r+1)(r+3)} .\\ \\ -\imat&=& \left\{\frac{r}{r+1} - \left( \frac{r}{r+1} \right )^2\right\} +2 \left\{\frac{r}{r+3} - \left( \frac{r}{r+3} \right )^2\right\} + (0-0) \\ &=& \frac{r}{(r+1)^2} + \frac{6r}{(r+3)^2}. \end{eqnarray*} (For a 0/1 covariate the variance formula \eqref{H2} simplifies to $\xbar - \xbar^2$, but only in that case. We used this fact above.) The following function computes these quantities. <>= breslow1 <- function(beta) { # first test data set, Breslow approximation r = exp(beta) lpl = 2*beta - (log(3*r +3) + 2*log(r+3)) U = (6+ 3*r - r^2)/((r+1)*(r+3)) H = r/(r+1)^2 + 6*r/(r+3)^2 c(beta=beta, loglik=lpl, U=U, H=H) } beta <- log((3 + sqrt(33))/2) temp <- rbind(breslow1(0), breslow1(beta)) dimnames(temp)[[1]] <- c("beta=0", "beta=solution") temp @ The maximum partial likelihood occurs when $U(\beta)=0$, namely $r^2 -3r -6 =0$. Using the usual formula for a quadratic equation gives $r=(1/2)(3 + \sqrt{33})$ and $\bhat = \log(r) \approx 1.475285$. The above call to \code{breslow1} verifies that the first derivative is zero at this point. Newton--Raphson iteration has increments of $-\Cvar U$. Starting with the usual initial estimate of $\beta=0$, the first iteration is $\beta=8/5$ and further ones are shown below. <<>>= iter <- matrix(0, nrow=6, ncol=4, dimnames=list(paste("iter", 0:5), c("beta", "loglik", "U", "H"))) # Exact Newton-Raphson beta <- 0 for (i in 1:6) { iter[i,] <- breslow1(beta) beta <- beta + iter[i,"U"]/iter[i,"H"] } iter # coxph fits test1 <- data.frame(time= c(1, 1, 6, 6, 8, 9), status=c(1, 0, 1, 1, 0, 1), x= c(1, 1, 1, 0, 0, 0)) temp <- matrix(0, nrow=6, ncol=4, dimnames=list(1:6, c("iter", "beta", "loglik", "H"))) for (i in 0:5) { tfit <- coxph(Surv(time, status) ~ x, data=test1, ties="breslow", iter.max=i) temp[i+1,] <- c(tfit$iter, coef(tfit), tfit$loglik[2], 1/vcov(tfit)) } temp @ The \code{coxph} routine declares convergence after 4 iterations for this data set, so the last two calls with \code{iter.max} of 4 and 5 give identical results. The martingale residuals are defined as $O-E$ = observed - expected, where the observed is the number of events for the subject (0 or 1) and $E$ is the expected number assuming that the model is completely correct. For the first death all 6 subjects are at risk, and the martingale formulation views the outcome as a lottery in which the subjects hold $r$, $r$, $r$, 1, 1 and 1 tickets, respectively. The contribution to $E$ for subject 1 at time 1 is thus $r/(r+3)$. Carrying this forward the residuals can be written as simple function of the cumulative baseline hazard $\Lambda_0(t)$, the Nelson cumulative hazard estimator with case weights of $w_ir_i$; this is shown in the `Breslow' column of table \ref{tab:val1}. (Also known as the Aalen estimate, Breslow estimate, and all possible combinations of the three names.) Then the residual can be written as \begin{equation} M_i = \delta_i - \exp(x_i\beta)\lhat(t_i) \label{mart} \end{equation} Each of the two subjects who die at time 6 are credited with the full hazard increment at time 6. Residuals at $\beta=0$ and $\bhat$ are shown in the table below. <>= mresid1 <- function(r) { status <- c(1,0,1,1,0,1) xbeta <- c(r,r,r,1,1,1) temp1 <- 1/(3*r +3) temp2 <- 2/(r+3) + temp1 status - xbeta*c(temp1, temp1, temp2, temp2, temp2, 1+ temp2) } r0 <- mresid1(1) r1 <- round(mresid1((3 + sqrt(33))/2), 6) @ \begin{center} \begin{tabular}{c|lrr} Subject &\multicolumn{1}{c}{$\Lambda_0$} & $\Mhat(0)$ & $\Mhat(\bhat)$ \\ \hline 1& $1/(3r+3)$ & $5/6$ & \Sexpr{r1[1]} \\ 2& $1/(3r+3)$ & $-1/6$ & \Sexpr{r1[2]} \\ 3& $1/(3r+3) + 2/(r+3)$ &$1/3$ & \Sexpr{r1[3]} \\ 4& $1/(3r+3) + 2/(r+3)$ &$1/3$ & \Sexpr{r1[4]} \\ 5& $1/(3r+3) + 2/(r+3)$ & $-2/3$ & \Sexpr{r1[5]} \\ 6 & $1/(3r+3) + 2/(r+3) +1$ & $-2/3$ & \Sexpr{r1[6]} \end{tabular} \end{center} The score statistic $U$ can be written as a two way sum involving the covariate(s) and the martingale residuals \begin{equation} U = \sum_{i=1}^n \int [x_i - \xbar(t)] dM_i(t) \label{score} \end{equation} The martingale residual $M$ has jumps at the observed deaths, leading to the table below with 6 rows and 3 columns. The score residuals $L_i$ are defined as the per-patient contributions to this total, i.e., the row sums, and the Schoenfeld residuals are the per-time point contributions, i.e., the column sums. \begin{center} \begin{tabular}{cccc} &\multicolumn{3}{c}{Time} \\ Subject & 1 & 6 & 9 \\ \hline 1&$ \left(1- \frac{r}{r+1} \right) \left(1- \frac{r}{3r+3} \right)$ &0&0 \\ 2& $\left(1- \frac{r}{r+1} \right) \left(0- \frac{r}{3r+3} \right)$ &0&0 \\ 3& $\left(1- \frac{r}{r+1} \right) \left(0- \frac{r}{3r+3} \right)$ & $\left(1- \frac{r}{r+3} \right) \left(1- \frac{2r}{r+3} \right)$ & 0\\ 4& $\left(0- \frac{r}{r+1} \right) \left(0- \frac{1}{3r+3} \right)$ & $\left(0- \frac{r}{r+3} \right) \left(1- \frac{2}{r+3} \right)$ & 0\\ 5& $\left(0- \frac{r}{r+1} \right) \left(0- \frac{1}{3r+3} \right)$ & $\left(0- \frac{r}{r+3} \right) \left(0- \frac{2}{r+3} \right)$ & 0\\ 6& $\left(0- \frac{r}{r+1} \right) \left(0- \frac{1}{3r+3} \right)$ & $\left(0- \frac{r}{r+3} \right) \left(0- \frac{2}{r+3} \right)$ & (0 - 0) (1-1) \end{tabular} \end{center} At $\beta=0$ the score residuals are 5/12, -1/12, 7/24, -1/24, 5/24 and 5/24. Showing that the three column sums are identical to the three terms of equation \eqref{U1} is left as an exercise for the reader, namely $1- \xbar(1)$, $(1- \xbar(6)) + (0- \xbar(6))$ and $1 - \xbar(9)$. The computer program returns 4 residuals, one per event, rather than one per death time as this has proven to be more useful for plots and other downstream computations. In the multivariate case there will be a matrix like the above for each covariate. Let $L$ be the $n$ by $p$ matrix made up of the collection of row sums where $n$ is the number of subjects and $p$ is the number of covariates, this is the matrix of score residuals. The dfbeta residuals are the $n$ by $p$ matrix $D = L \Cvar$; $\imat$ has been defined above for this data set. $D$ is an approximate measure of the influence of each observation on the solution vector. Similarly, the scaled Schoenfeld residuals are the (number of events) by $p$ matrix obtained by multiplying the Schoenfeld residuals by $\Cvar$. As stated above there is a close connection between the Nelson--Aalen estimate estimate of cumulative hazard and the Breslow approximation for ties. The baseline hazard is shown as the column $\Lambda_0$ in table \ref{tab:val1}. The estimated hazard for a subject with covariate $x_i$ is $\Lambda_i(t) = \exp(x_i \beta) \Lambda_0(t)$ and the survival estimate for the subject is $S_i(t)= \exp(-\Lambda_i(t))$. The variance of the cumulative hazard is the sum of two terms. Term 1 is a natural extension of the Nelson--Aalen estimator to the case where there are weights. It is a running sum, with an increment at each death of $1/(\sum Y_i(t)r_i(t))^2$. For a subject with covariate $x_i$ this term is multiplied by $[\exp(x_i \beta)]^2$. The second term is $c\Cvar c'$, where $\imat$ is the information matrix of the Cox model and $c$ is a vector. The second term accounts for the fact that the weights themselves have a variance; $c$ is the derivative of $S(t)$ with respect to $\beta$ and can be formally written as $$ \exp(x\beta) \int_0^t (\bar x(s) - x_i) d\hat\Lambda_0 (s)\,. $$ This can be recognized as $-1$ times the score residual process for a subject with $x_i$ as covariates and no events; it measures leverage of a particular observation on the estimate of $\beta$. It is intuitive that a small score residual --- an observation whose covariates has little influence on $\beta$ --- results in a small added variance; that is, $\beta$ has little influence on the estimated survival. \begin{center} \begin{tabular}{c|l} Time & Term 1 \\ \hline 1& $1/(3r+3)^2$ \\ 6& $1/(3r+3)^2 + 2/(r+3)^2$ \\ 9& $1/(3r+3)^2 + 2/(r+3)^2 + 1/1^2$ \\ \multicolumn{2}{c}{} \\ Time & $c$ \\ \hline 1& $(r/(r+1))* 1/(3r+3)$ \\ 6& $(r/(r+1))* 1/(3r+3) + (r/(r+3))* 2/(r+3)$ \\ 9& $(r/(r+1))* 1/(3r+3) + (r/(r+3))* 2/(r+3) + 0*1$\\ \end{tabular} \end{center} For $\beta=0$, $x=0$: \begin{center} \begin{tabular}{c|lll} Time & \multicolumn{2}{c}{Variance}&\\ \hline 1 & 1/36 &+ $1.6*(1/12)^2 $ &= 7/180\\ 6 & (1/36 + 2/16) &+ $1.6*(1/12 + 2/16)^2 $ &= 2/9\\ 9 & (1/36 + 2/16 + 1)&+ $1.6*(1/12 + 2/16 + 0)^2$&= 11/9\\ \end{tabular} \end{center} For $\beta=1.4752849$, $ x=0$ \begin{center} \begin{tabular}{c|lll} Time & \multicolumn{2}{c}{Variance}&\\ \hline 1 & 0.0038498 &+ .004021 &= 0.007871\\ 2 & 0.040648 &+ .0704631 &= 0.111111\\ 4 & 1.040648 &+ .0704631 &= 1.111111\\ \end{tabular} \end{center} \subsection{Efron approximation} The Efron approximation \cite{Efron77} differs from the Breslow only at day 6, where two deaths occur. A useful way to view the approximation is to recast the problem as a lottery model. On day 1 there were 6 subjects in the lottery and 1 ticket was drawn, at which time the winner became inelegible for further drawings and withdrew. On day 6 there were 4 subjects in the drawing (at risk) and two tickets (deaths) were drawn. The Breslow approximation considers all four subjects to be eligible for both drawings, which implies that one of them could in theory have won both, that is, died twice. This is of clearly impossible. The Efron approximation treats the two drawings on day 6 as sequential. All four living subjects are at risk for the first of them, then the winner is withdrawn. Three subjects are elibible for the second drawing, either subjects 3, 5, and 6 or subjects 2, 5, and 6, but we do not know which. In some sense then, subjects 3 and 4 each have ``.5 probability" of being at risk for the second event at time 6. In the computation, we treat the two deaths at time 6 as two separate times (two terms in the loglik), with subjects 3 and 4 each having a case weight of 1/2 for the second one. The mean covariate for the second event is then $$ \frac{1*r/2 + 0*1/2 + 0*1 + 0*1 } {r/2 + 1/2 + 1+1} = \frac{r}{r+5} $$ and the main quantities are \begin{eqnarray*} LL &=& \{\beta- \log(3r+3)\} + \{\beta - \log(r+3)\} + \{0-\log(r/2 +5/2)\} + \{0-0\} \\ &=& 2\beta - \log(3r+3) - \log(r+3) - \log(r/2 +5/2)\\ \\ U &=& \left(1-\frac{r}{r+1}\right) + \left(1-\frac{r}{r+3}\right) + \left(0-\frac{r}{r+5}\right) + (0-0) \\ &=& \frac{-r^3 + 23r + 30}{(r+1)(r+3)(r+5)} \\ \\ I&=& \left\{\frac{r}{r+1} - \left( \frac{r}{r+1} \right )^2\right\} + \left\{\frac{r}{r+3} - \left( \frac{r}{r+3} \right )^2\right\} \\ && + \left\{\frac{r}{r+5} - \left( \frac{r}{r+5} \right )^2\right\}. \end{eqnarray*} The solution corresponds to the one positive root of $U(\beta)=0$, which is $r=2\sqrt{23/3}\cos(\phi/3)$ where $\phi=\arccos\{(45/23)\sqrt{3/23}\}$ via the standard formula for the roots of a cubic equation. This yields $r \approx 5.348721$ or $\bhat = \log(r) \approx 1.676857$. Plugging this value into the formulas above yields $$ \begin{array}{ll} LL(0)=-4.276666 & LL(\bhat)=-3.358979 \\ U(0) = 52/48 & U(\bhat) = 0 \\ \imat(0) = 83/144 & \imat(\bhat) = 0.652077. \end{array} $$ The martingale residuals are again $O-E$, but the expected part of the calculation changes. For the first drawing at time 6 the total number of ``tickets'' in the drawing is $r+1+1+1$; subject 4 has an increment of $r/(r+3)$ and the others $1/(r+3)$ to their expected value. For the second event at time 6 subjects 3 and 4 have a weight of 1/2, the total number of tickets is $(r+5)/2$ and the consequent increment in the cumulative hazard is $2/(r+5)$. This $\beta=0$ this calculation is equivalent to the Fleming-Harrington \cite{Fleming84} estimate of cumulative hazard. Subjects 3 and 4 recieve 1/2 of this second increment to $E$ and subjects 5 and 6 the full increment. Efron \cite{Efron77} did not discuss residuals so did not investigate this aspect of the approximation, we neverthess sometime refer to this using combinations of Fleming, Harrington, Efron in the same way as the Nelson-Aalen-Breslow estimate. The martingale residuals are \begin{center} \begin{tabular}{cl} Subject & $M_i$ \\ \hline 1 & $1 - r/(3r+3)$ \\ 2 & $0 - r/(3r+3)$ \\ 3 & $1 - r/(3r+3) - r/(r+3) - r/(r+5)$ \\ 4 & $1 - 1/(3r+3) - 1/(r+3) - 1/(r+5)$ \\ 5 & $0 - 1/(3r+3) - 1/(r+3) - 2/(r+5)$ \\ 6 & $0 - 1/(3r+3) - 1/(r+3) - 2/(r+5) - 1$ \\ \end{tabular} \end{center} giving residuals at $\beta=0$ of 5/6, -1/6, 5/12, 5/12, -3/4 and -3/4. The matrix defining the score and Schoenfeld residuals has the same first column (time 1) and last column as before, with the following contributions at time 6. \begin{center} \begin{tabular}{ccc} &\multicolumn{2}{c}{Time} \\ Subject & 6 (first) & 6 (second)\\ \hline 1&0 & 0 \\ 2&0 & 0 \\ 3& $\left(1- \frac{r}{r+3} \right) \left(1- \frac{r}{r+3} \right)$ & $\left(1- \frac{r}{r+5} \right) \left(1- \frac{2r}{r+5} \right)/2$ \\ 4& $\left(0- \frac{r}{r+3} \right) \left(0- \frac{1}{r+3} \right)$ & $\left(0- \frac{r}{r+5} \right) \left(1- \frac{2}{r+5} \right)/2$ \\ 5& $\left(0- \frac{r}{r+3} \right) \left(0- \frac{1}{r+3} \right)$ & $\left(0- \frac{r}{r+5} \right) \left(0- \frac{2}{r+5} \right)$ \\ 6& $\left(0- \frac{r}{r+3} \right) \left(0- \frac{1}{r+3} \right)$ & $\left(0- \frac{r}{r+5} \right) \left(0- \frac{2}{r+5} \right)$ \end{tabular} \end{center} The score residuals at $\beta=0$ are 5/12, -1/12, 55/144, -5/144, 29/144 and 29/144. It is an error to generate residuals for the Efron method by using formula \eqref{mart}, which was derived from the Breslow approximation. It is clear that some packages do exactly this, however, which can be verified using formulas from above. (Statistical forensics is another use for our results.) What are the conseqences of this? On a formal level the resulting ``martingal residuals'' no longer have an expected value of 0 and thus are not martingales, so one loses theoretical backing for derived plots or statistics. The score, Schoenfeld, dfbeta and scaled Schoenfeld residuals are based on the martingale residual so suffer the same loss. On a practical level, when the fraction of ties is small it is quite often the case that $\bhat$ is nearly the same when using the Breslow and Efron approach. We have normally found the correct and ad hoc residuals to be similar as well in that case, sufficiently so that explorations of functional form (martingale residuals), leverage and robust variance (dfbeta) and proportional hazards (scaled Schoenfeld) led to the same conclusions. This will not hold when there are a moderate to large number of ties. The variance formula for the baseline hazard function in the Efron case is evaluated the same way as before, as the sum of $(\mbox{hazard increment})^2$, treating a tied death as multiple separate hazard increments. In term 1 of the variance, the variance increment at time 6 is now $1/(r+3)^2 + 4/(r+5)^2$ rather than $2/(r+3)^2$. The increment to $d$ at time 6 is $(r/(r+3))* 1/(r+3) + (r/(r+5))* 2/(r+5)$. (Numerically, the result of this computation is intermediate between the Nelson--Aalen variance and the Greenwood variance used in the Kaplan--Meier.) For $\beta=0$, $x=0$, let $v= \Cvar = 144/83$. \begin{center} \begin{tabular}{c|ll} Time & \multicolumn{2}{c}{Variance}\\ \hline 1 & 1/36 \\ & \quad + $v(1/12)^2 $ &= \phantom{0}119/2988\\ 6 & (1/36 + 1/16 + 4/25)& \\ &\quad + $v(1/12 + 1/16+ 1/18)^2$ &= 1996/6225\\ 9 & (1/36 + 1/16 + 4/25 + 1) \\& \quad + $v(1/12 + 1/16 + 1/18 +0)^2$&= 8221/6225\\ \end{tabular} \end{center} For $\beta=1.676857$, $ x=0$. \begin{center} \begin{tabular}{c|ll} Time & \multicolumn{2}{c}{Variance}\\ \hline 1 & 0.00275667 + .00319386 & = 0.0059505\\ 2 & 0.05445330 + .0796212 & = 0.134075\\ 4 & 1.05445330 + .0796212 &= 1.134075\\ \end{tabular} \end{center} \subsection{Exact partial likelihood} Returning to the lottery analogy, for the two deaths at time 6 the exact partial likelihood computes the direct probability that those two subjects would be selected given that a pair will be chosen. The numerator is $r_3 r_4$, the product of the risk scores of the subjects with an event, and the denominator is the sum over all 6 pairs who could have been chosen: $r_3r_4 + r_3r_5 + r_3r_6 + r_4 r_5 + r_4r_6 + r_5r_6$. (If there were 10 tied deaths from a pool of 60 available the sum will have over 75 billion terms, each a product of 10 values; a truly formidable computation!) In our case, three of the four subjects at risk at time 6 have a risk score of $\exp(0x)=1$ and one a risk score of $r$, and the denominator is $r +r+ r+1 +1 +1$. \begin{eqnarray*} LL &=& \{\beta- \log(3r+3)\} + \{\beta - \log(3r+3)\} + \{0-0\} \\ &=& 2\{\beta - \log(3r+3)\}. \\ \\ U &=& \left(1-\frac{r}{r+1}\right) + \left(1-\frac{r}{r+1}\right) + (0-0)\\ &=& \frac{2}{r+1}. \\ \\ -\imat&=& \frac{2r}{(r+1)^2}. \\ \end{eqnarray*} The solution $U(\beta)=0$ corresponds to $r=\infty$, with a loglikelihood that asymptotes to $-2\log(3)$ = 2.1972. The Newton--Raphson iteration has increments of $(r+1)/r$ leading to the following iteration for $\bhat$: <>= temp <- matrix(0, 8, 3) dimnames(temp) <- list(paste0("iteration ", 0:7, ':'), c("beta", "loglik", "H")) bhat <- 0 for (i in 1:8) { r <- exp(bhat) temp[i,] <- c(bhat, 2*(bhat - log(3*r +3)), 2*r/(r+1)^2) bhat <- bhat + (r+1)/r } round(temp,3) @ The Newton-Raphson iteration quickly settles down to addition of a constant increment to $\bhat$ at each step while the partial likelihood approaches an asymptote: this is a fairly common case when the Cox MLE is infinite. A solution at $\bhat=10$ or 15 is hardly different in likelihood from the true maximum, and most programs will stop iterating around this point. The information matrix, which measures the curvature of the likelihood function at $\beta$, rapidly goes to zero as $\beta$ grows. It is difficult to describe a satisfactory definition of the expected number of events for each subject and thus a definition of the proper martingale residual for the exact calculation. Among other things it should lead to a consistent score residual, i.e., ones that sum to the total score statistic $U$ \begin{align*} L_i &= \int (x_i - \xbar(t)) dM_i(t) \\ \sum L_i &= U \end{align*} The residuals defined above for the Breslow and Efron approximations have this property, for instance. The exact partial likelihood contribution to $U$ for a set of set of $k$ tied deaths, however, is a sum of all subsets of size $k$; how would one partition this term as a simple sum over subjects? The exact partial likelihood is infrequently used and examination of post fit residuals is even rarer. The survival package (and all others that I know of) takes the easy road in this case and uses equation \eqref{mart} along with the Nelson-Aalen-Breslow hazard to form residuals. They are certainly not correct, but the viable options were to use this, the Efron residuals, or print an error message. At $\bhat=\infty$ the Breslow residuals are still well defined. Subjects 1 to 3, those with a covariate of 1, experience a hazard of $r/(3r+3)=1/3$ at time 1. Subject 3 accumulates a hazard of 1/3 at time 1 and a further hazard of 2 at time 6. The remaining subjects are at an infinitely lower risk during days 1 to 6 and accumulate no hazard then, with subject 6 being credited with 1 unit of hazard at the last event. The residuals are thus $1-1/3=2/3$, $0-1/3$, $1-7/3= -4/3$, $1-0$, 0, and 0, respectively, for the six subjects. \section{Test data 2} This data set also has a single covariate, but in this case a (start, stop] style of input is employed. Table \ref{tab:val2} shows the data sorted by the end time of the risk intervals. The columns for $\xbar$ and hazard are the values at the event times; events occur at the end of each interval for which status = 1. \begin{table}\centering \begin{tabular}{ccc|ccc} &&&Number& \\ Time&Status&$x$&at Risk& $\xbar$& $d\lhat$ \\ \hline (1,2] &1 &1 &2 &$r/(r+1)$ & $1/(r+1)$\\ (2,3] &1 &0 &3 & $r/(r+2)$ & $1/(r+2)$ \\ (5,6] &1 &0 &5 & $3r/(3r+2)$ & $1/(3r+2)$ \\ (2,7] &1 &1 &4 & $3r/(3r+1)$ & $1/(3r+1)$ \\ (1,8] &1 &0 &4 & $3r/(3r+1)$ & $1/(3r+1)$ \\ (7,9] &1 &1 &5 & $3r/(3r+2)$ & $2/(3r+2)$ \\ (3,9] &1 &1 && \\ (4,9] &0 &1 && \\ (8,14]&0 &0 &2& 0&0 \\ (8,17]&0 &0 &1& 0&0 \\ \end{tabular} \caption{Test data 2} \label{tab:val2} \end{table} \subsection{Breslow approximation} For the Breslow approximation we have \begin{eqnarray*} LL &=& \log\left(\frac{r}{r+1}\right) +\log\left(\frac{1}{r+2}\right) +\log\left(\frac{1}{3r+2}\right) +\\ && \log\left(\frac{r}{3r+1}\right) +\log\left(\frac{1}{3r+1}\right) +2\log\left(\frac{r}{3r+2}\right) \\ &=& 4\beta - \log(r+1) - \log(r+3)- 3\log(3r+2) -2\log(3r+1). \\ \\ U &=& \left(1-\frac{r}{r+1}\right) + \left(0-\frac{r}{r+2}\right) + \left(0-\frac{3r}{3r+2}\right) + \\ && \left(1-\frac{3r}{3r+1}\right) + \left(0-\frac{3r}{3r+1}\right) + 2\left(1-\frac{3r}{3r+2}\right) \\ \\ \\ \imat &=& \frac{r}{(r+1)^2} + \frac{2r}{(r+2)^2} + \frac{6r}{(3r+2)^2} + \frac{3r}{(3r+1)^2} \\ && \frac{3r}{(3r+1)^2} + \frac{12r}{(3r+2)^2} . \\ \end{eqnarray*} In this case $U$ is a quartic equation and we find the solution numerically. <>= ufun <- function(r) { 4 - (r/(r+1) + r/(r+2) + 3*r/(3*r+2) + 6*r/(3*r+1) + 6*r/(3*r+2)) } rhat <- uniroot(ufun, c(.5, 1.5), tol=1e-8)$root bhat <- log(rhat) c(rhat=rhat, bhat=bhat) @ The solution is at $U(\bhat)=0$ or $r \approx .9189477$; $\bhat = \log(r) \approx -.084526$. Then <>= true2 <- function(beta, newx=0) { r <- exp(beta) loglik <- 4*beta - log(r+1) - log(r+2) - 3*log(3*r+2) - 2*log(3*r+1) u <- 1/(r+1) + 1/(3*r+1) + 4/(3*r+2) - ( r/(r+2) +3*r/(3*r+2) + 3*r/(3*r+1)) imat <- r/(r+1)^2 + 2*r/(r+2)^2 + 6*r/(3*r+2)^2 + 3*r/(3*r+1)^2 + 3*r/(3*r+1)^2 + 12*r/(3*r+2)^2 hazard <-c( 1/(r+1), 1/(r+2), 1/(3*r+2), 1/(3*r+1), 1/(3*r+1), 2/(3*r+2) ) xbar <- c(r/(r+1), r/(r+2), 3*r/(3*r+2), 3*r/(3*r+1), 3*r/(3*r+1), 3*r/(3*r+2)) # The matrix of weights, one row per obs, one col per time # deaths at 2,3,6,7,8,9 wtmat <- matrix(c(1,0,0,0,1,0,0,0,0,0, 0,1,0,1,1,0,0,0,0,0, 0,0,1,1,1,0,1,1,0,0, 0,0,0,1,1,0,1,1,0,0, 0,0,0,0,1,1,1,1,0,0, 0,0,0,0,0,1,1,1,1,1), ncol=6) wtmat <- diag(c(r,1,1,r,1,r,r,r,1,1)) %*% wtmat x <- c(1,0,0,1,0,1,1,1,0,0) status <- c(1,1,1,1,1,1,1,0,0,0) xbar <- colSums(wtmat*x)/ colSums(wtmat) n <- length(x) # Table of sums for score and Schoenfeld resids hazmat <- wtmat %*% diag(hazard) #each subject's hazard over time dM <- -hazmat #Expected part for (i in 1:6) dM[i,i] <- dM[i,i] +1 #observed dM[7,6] <- dM[7,6] +1 # observed mart <- rowSums(dM) # Table of sums for score and Schoenfeld resids # Looks like the last table of appendix E.2.1 of the book resid <- dM * outer(x, xbar, '-') score <- rowSums(resid) scho <- colSums(resid) # We need to split the two tied times up, to match coxph scho <- c(scho[1:5], scho[6]/2, scho[6]/2) var.g <- cumsum(hazard*hazard /c(1,1,1,1,1,2)) var.d <- cumsum( (xbar-newx)*hazard) surv <- exp(-cumsum(hazard) * exp(beta*newx)) varhaz <- (var.g + var.d^2/imat)* exp(2*beta*newx) list(loglik=loglik, u=u, imat=imat, xbar=xbar, haz=hazard, mart=mart, score=score, rmat=resid, scho=scho, surv=surv, var=varhaz) } val2 <- true2(bhat) rtemp <- round(val2$mart, 6) @ $$ \begin{array}{ll} LL(0)= \Sexpr{round(true2(0)$loglik, 6)} & LL(\bhat)= \Sexpr{round(val2$loglik,6)} \\ U(0) = -2/15 & U(\bhat) = 0 \\ \imat(0) = 2821/1800 & \imat(\bhat) = \Sexpr{round(val2$imat,6)} %$ \end{array} $$ \def\haz{\hat \lambda} The martingale residuals are (status--cumulative hazard) or $O-E = \delta_i - \int Y_i(s) r_i d\lhat(s)$. Let $\haz_1, \ldots, \haz_6$ be the six increments to the cumulative hazard listed in Table \ref{tab:val2}. Then the cumulative hazards and martingale residuals for the subjects are as follows. \begin{center} \begin{tabular}{c|lrr} Subject &$\Lambda_i$ & $\Mhat(0)$ & $\Mhat(\bhat)$ \\ \hline 1& $r\haz_1$ & 1--30/60 & \Sexpr{rtemp[1]} \\ 2& $\haz_2 $ & 1--20/60 & \Sexpr{rtemp[2]}\\ 3& $\haz_3 $ & 1--12/60 & \Sexpr{rtemp[3]} \\ 4& $r(\haz_2 + \haz_3 + \haz_4)$& 1--47/60 & \Sexpr{rtemp[4]}\\ 5& $\haz_1+\haz_2+\haz_3+\haz_4+\haz_5$& 1--92/60 & \Sexpr{rtemp[5]}\\ 6 &$r*(\haz_5 + \haz_6) $& 1--39/60 & \Sexpr{rtemp[6]} \\ 7& $r*(\haz_3+\haz_4+\haz_5+ \haz_6)$& 1--66/60& \Sexpr{rtemp[7]}\\ 8& $r*(\haz_3+\haz_4+\haz_5+ \haz_6)$& 0--66/60&\Sexpr{rtemp[8]} \\ 9& $\haz_6$ & 0--24/60 & \Sexpr{rtemp[9]} \\ 10& $\haz_6$ & 0--24/60 & \Sexpr{rtemp[10]} \end{tabular} \end{center} The score and Schoenfeld residuals can be laid out in a tabular fashion. Each entry in the table is the value of $\{x_i - \xbar(t_j)\} d\Mhat_i(t_j)$ for subject $i$ and event time $t_j$. The row sums of the table are the score residuals for the subject; the column sums are the Schoenfeld residuals at each event time. Below is the table for $\beta= \log(2)$ ($r=2$). This is a slightly more stringent test than the table for $\beta=0$, since in this latter case a program could be missing a factor of $r = \exp(\beta)=1$ and give the correct answer. However, the results are much more compact than those for $\bhat$, since the solutions are exact fractions. %\newcommand{\pf}[2]{$\left(\frac{#1}{#2}\right)$} %positive fraction %\newcommand{\nf}[2]{$\left(\frac{-#1}{#2}\right)$} %negative fraction \newcommand{\pf}[2]{$\phantom{-}\frac{#1}{#2}$} %positive fraction \newcommand{\nf}[2]{$-\frac{#1}{#2}$} %negative fraction \renewcommand{\arraystretch}{1.5} \begin{center} \begin{tabular}{c|cccccc|c} & \multicolumn{6}{c|}{Event Time} & Score\\ Id&2&3&6&7&8&9& Resid \\ \hline 1&\pf{1}{9} &&&&&& $\phantom{-}\frac{1}{9}$ \\ 2&&\nf{3}{8} &&&&& $-\frac{3}{8}$ \\ 3&&&\nf{21}{32}&&&& $-\frac{21}{32}$ \\ 4&&\nf{1}{4} & \nf{1}{16} & \pf{5}{49} &&& $-\frac{165}{784} $\\ 5&\pf{2}{9}& \pf{1}{8} & \pf{3}{32} & \pf{6}{49} & \nf{36}{49}&& $-\frac{2417}{14112} $\\ 6&&&&& \nf{2}{49} & \pf{1}{8} & $\phantom{-}\frac{33}{392}$ \\ 7&&& \nf{1}{16} & \nf{2}{49}& \nf{2}{49} & \pf{1}{8} & $-\frac{15}{784}$ \\ 8&&& \nf{1}{16} & \nf{2}{49}& \nf{2}{49} & \nf{1}{8} & $-\frac{211}{784}$ \\ 9&&&&&& \pf{3}{16} &$ \phantom{-}\frac{3}{16}$ \\ 10&&&&&& \pf{3}{16} & $\phantom{-}\frac{3}{16}$ \\ \hline & \pf{1}{3} &\nf{1}{2} & \nf{3}{4} & \pf{1}{7} & \nf{6}{7} & \pf{1}{2} & \nf{95}{84} \\ &&&&&&& \\ & $\frac{1}{r+1}$ & $\frac{-r}{r+2}$ & $\frac{-3r}{r+2}$ & $\frac{1}{3r+1}$ & $\frac{3r}{3r+1}$ & $\frac{4}{3r+2}$ \end{tabular} \end{center} Both the Schoenfeld and score residuals sum to the score statistic $U(\beta)$. As discussed further above, programs will return two Schoenfeld residuals at time 7, one for each subject who had an event at that time. \subsection{Efron approximation} This example has only one tied death time, so only the term(s) for the event at time 9 change. The main quantities at that time point are as follows. \begin{center} \begin{tabular}{r|cc} &Breslow & Efron \\ \hline $LL$ & $2\log\left(\frac{r}{3r+2}\right)$ & $\log\left(\frac{r}{3r+2}\right) + \log\left(\frac{r}{2r+2}\right)$ \\ $U$ & $\frac{2}{3r+2}$& $\frac{1}{3r+2} + \frac{1}{2r+2}$ \\ $\imat$& $2\frac{6r}{(3r+2)^2} $ &$\frac{6r}{(3r+2)^2} + \frac{4r}{(2r+2)^2}$\\ $d\lhat$ & $\frac{2}{3r+2} $ & $\frac{1}{3r+2} + \frac{1}{2r+2}$ \end{tabular} \end{center} \renewcommand{\arraystretch}{1} \section{Test data 3} This is very similar to test data 1, but with the addition of case weights. There are 9 observations, $x$ is a 0/1/2 covariate, and weights range from 1 to 4. As before, let $r = \exp(\beta)$ be the risk score for a subject with $x=1$. Table \ref{tab:val3} shows the data set along with the mean and increment to the hazard at each point. \begin{table} \centering \begin{tabular}{cccc|cc} Time&Status&$X$ & Wt& $\xbar(t)$ & $d\lhat_0(t)$ \\ \hline 1& 1& 2 & 1& $(2r^2+11r) d\lhat_0 =\xbar_1$ & $1/(r^2 + 11r +7)$ \\ 1& 0& 0 & 2&& \\ 2& 1& 1 & 3& $11r/(11r+5) = \xbar_2$ & $10/(11r+5)$ \\ 2& 1& 1 & 4&& \\ 2& 1& 0 & 3&& \\ 2& 0& 1 & 2&& \\ 3& 0& 0 & 1&& \\ 4& 1& 1 & 2& $2r/(2r+1)= \xbar_3$ & $ 2/(2r+1)$ \\ 5& 0& 0 & 1 & \\ \end{tabular} \caption{Test data 3} \label{tab:val3} \end{table} \subsection{Breslow estimates} The likelihood is a product of terms, one for each death, of the form $$ \left( \frac{e^{X_i \beta}}{\sum_j Y_j(t_i) w_j e^{X_j \beta}} \right) ^{w_i} $$ For integer weights, this gives the same results as would be obtained by replicating each observation the specified number of times, which is in fact one motivation for the definition. The definitions for the score vector $U$ and information matrix $\imat$ simply replace the mean and variance with weighted versions of the same. Let $PL(\beta,w)$ be the log partial liklihood when all the observations are given a common case weight of $w$; it is easy to prove that $PL(\beta,w) = wPL(\beta,1) - d\log(w)$ where $d$ is the number of events. One consequence of this is that $PL$ can be positive for weights that are less than 1, a case which sometimes occurs in survey sampling applications. (This can be a big surprise the first time one encounters it.) \begin{eqnarray*} LL &=& \{2\beta - \log(r^2 + 11r +7)\} + 3\{\beta - \log(11r+5)\} \\ && + 4\{\beta - \log(11r+5)\} +3\{0 - \log(11r+5)\} \\ && + 2\{\beta - \log(2r+1) \} \\ &=& 11\beta - \log(r^2 + 11r +7) -10\log(11r+5) - 2\log(2r+1) \\\\ U &=& (2- \xbar_1) + 3(0-\xbar_2) + 4(1-\xbar_2) + 3(1-\xbar_2) + 2(1-\xbar_3) \\ &=& 11 - [(2r^2+11r)/(r^2+11r+7) + 10(11r/(11r+5)) + 2(2r/(2r+1))] \\ I &=& [(4r^2 + 11r)/(r^2 + 11r +7) - \xbar_1^2] + 10(\xbar_2 - \xbar_2^2) + 2(\xbar_3 - \xbar_3^2) \\ \end{eqnarray*} The solution corresponds to $U(\beta)=0$ and can be computed using a simple search for the zero of the equation. <>= ufun <- function(r) { xbar <- c( (2*r^2 + 11*r)/(r^2 + 11*r +7), 11*r/(11*r + 5), 2*r/(2*r +1)) 11- (xbar[1] + 10* xbar[2] + 2* xbar[3]) } rhat <- uniroot(ufun, c(1,3), tol= 1e-9)$root bhat <- log(rhat) c(rhat=rhat, bhat=bhat) @ From this we have <>= wfun <- function(r) { beta <- log(r) pl <- 11*beta - (log(r^2 + 11*r + 7) + 10*log(11*r +5) + 2*log(2*r +1)) xbar <- c((2*r^2 + 11*r)/(r^2 + 11*r +7), 11*r/(11*r +5), 2*r/(2*r +1)) U <- 11 - (xbar[1] + 10*xbar[2] + 2*xbar[3]) H <- ((4*r^2 + 11*r)/(r^2 + 11*r +7)- xbar[1]^2) + 10*(xbar[2] - xbar[2]^2) + 2*(xbar[3]- xbar[3]^2) c(loglik=pl, U=U, H=H) } temp <- matrix(c(wfun(1), wfun(rhat)), ncol=2, dimnames=list(c("loglik", "U", "H"), c("beta=0", "beta-hat"))) round(temp, 6) @ When $\beta=0$, the three unique values for $\xbar$ at $t=1$, 2, and 4 are 13/19, 11/16 and 2/3, respectively, and the increments to the cumulative hazard are 1/19, 10/16 = 5/8, and 2/3, see table \ref{tab:val3}. The martingale and score residuals at $\beta=0$ and $\bhat$ are \begin{center} \begin{tabular}{cc|lr} Id &Time& \multicolumn{1}{c}{$M(0)$} &\multicolumn{1}{c}{$M(\bhat)$} \\ \hline A&1 & $1-1/19 = 18/19 $&0.85531\\ B&1 & $0-1/19 = -1/19 $&-0.02593\\ C&2 & $1-(1/19 + 5/8)= 49/152 $&0.17636 \\ D&2 & $1-(1/19 + 5/8)= 49/152 $&0.17636\\ E&2 & $1-(1/19 + 5/8)= 49/152 $&0.65131\\ F&2 & $0-(1/19 + 5/8)= -103/152 $&-0.82364\\ G&3 & $0-(1/19 + 5/8)= -103/152 $&-0.34869\\ H&4 & $1-(1/19 + 5/8 +2/3)= -157/456 $&-0.64894\\ I&5 & $0-(1/19 + 5/8 +2/3)= -613/456 $&-0.69808\\ \end{tabular} \end{center} Score residuals at $\beta=0$ are \begin{center} \begin{tabular}{cc|r} Id &Time& Score \\ \hline A&1 &$(2-13/19)(1-1/19)$\\ B&1 &$(0-13/19)(0-1/19)$\\ C&2 &$ (1-13/19)(0-1/19) + (1-11/16)(1-5/8) $ \\ D&2 &$(1-13/19)(0-1/19) + (1-11/16)(1-5/8)$ \\ E&2 &$(0-13/19)(0-1/19) + (0-11/16)(1-5/8)$ \\ F&2 &$(1-13/19)(0-1/19) + (1-11/16)(0-5/8)$ \\ G&3 &$(1-13/19)(0-1/19) + (0-11/16)(0-5/8)$ \\ H&4 &$(1-13/19)(0-1/19) + (1-11/16)(0-5/8) $ \\ && $ + (1-2/3)(1-2/3)$ \\ I&5 &$(1-13/19)(0-1/19) + (1-11/16)(0-5/8)$ \\ & &$+ (0-2/3)(0-2/3) $ \\ \end{tabular} \end{center} {\splus} also returns unweighted residuals by default, with an option to return the weighted version; it is the weighted sum of residuals that totals zero, $\sum w_i \Mhat_i=0$. Whether the weighted or the unweighted form is more useful depends on the intended application, neither is more ``correct'' than the other. {\splus} does differ for the dfbeta residuals, for which the default is to return weighted values. For the third observation in this data set, for instance, the unweighted dfbeta is an approximation to the change in $\bhat$ that will occur if the case weight is changed from 2 to 3, corresponding to deletion of one of the three ``subjects'' that this observation represents, and the weighted form approximates a change in the case weight from 0 to 3, i.e., deletion of the entire observation. The increments of the Nelson-Aalen estimate of the hazard are shown in the rightmost column of table \ref{tab:val3}. The hazard estimate for a hypothetical subject with covariate $X^\dagger$ is $\Lambda_i(t) = \exp(X^\dagger \beta) \Lambda_0(t)$ and the survival estimate is $S_i(t)= \exp(-\Lambda_i(t))$. The two term of the variance, for $X^\dagger=0$, are Term1 + $d'Vd$: \begin{center} \begin{tabular}{c|l} Time & Term 1 \\ \hline 1& $1/(r^2 + 11r+7)^2$ \\ 2& $1/(r^2 + 11r+7)^2 + 10/(11r+5)^2$ \\ 4& $1/(r^2 + 11r+7)^2 + 10/(11r+5)^2 + 2/(2r+1)^2$ \\ \multicolumn{2}{c}{} \\ Time & $d$ \\ \hline 1& $(2r^2+11r)/(r^2+11r+7)^2$ \\ 2& $(2r^2+11r)/(r^2+11r+7)^2 + 110r/(11r+5)^2$ \\ 4& $(2r^2+11r)/(r^2+11r+7)^2 + 110r/(11r+5)^2 + 4r/(2r+1)^2$ \end{tabular} \end{center} For $\beta=\log(2)$ and $X^\dagger =0$, where $k\equiv$ the variance of $\bhat$ = 1/2.153895 this reduces to \begin{center} \begin{tabular}{c|ll} Time & \multicolumn{2}{c}{Variance}\\ \hline 1 & 1/1089 &+ $k(30/1089)^2$ \\ 2 & (1/1089+ 10/729) &+ $k(30/1089+ 220/729)^2 $ \\ 4 & (1/1089+ 10/729 + 2/25)&+ $k(30/1089+ 220/729 + 8/25)^2$\\ \end{tabular} \end{center} giving numeric values of 0.0012706, 0.0649885, and 0.2903805, respectively. \subsection{Efron approximation} For the Efron approximation the combination of tied times and case weights can be approached in at least two ways. One is to treat the case weights as replication counts. There are then 10 tied deaths at time 2 in the data above, and the Efron approximation involves 10 different denominator terms. Let $a= 7r+3$, the sum of risk scores for the 3 observations with an event at time 2 and $b=4r+2$, the sum of risk scores for the other subjects at risk at time 2. For the replication approach, the loglikelihood is \begin{eqnarray*} LL &=& \{2\beta - \log(r^2 + 11r +7)\} + \\ && \{7\beta - \log(a+b) - \log(.9a+b) - \ldots - \log(.1a+b) \} + \\ && \{2\beta - \log(2r+1) - \log(r+1)\}. \end{eqnarray*} A test program can be created by comparing results from the weighted data set (9 observations) to the unweighted replicated data set (19 observations). This is the approach taken by SAS \code{phreg} using the \code{freq} statement. It's advantage is that the appropriate result for all of the weighted computations is perfectly clear the disadvantage is that the only integer case weights are supported. (A secondary advantage is that I did not need to create another algebraic derivation for this appendix.) A second approach, used in {\splus}, allows for non-integer weights. The data is considered to be 3 tied observations, and the log-likelihood at time 2 is the sum of 3 weighted terms. The first term of the three is one of \begin{eqnarray*} && 3 [\beta - \log(a+b)] \\ && 4 [\beta - \log(a+b)] \\ &{\rm or}&3 [0 - \log(a+b)], \end{eqnarray*} depending on whether the event for observation C, D or E actually happened first (had we observed the time scale more exactly); the leading multiplier of 3, 4 or 3 is the case weight. The second term is one of \begin{eqnarray*} && 4 [\beta - \log(4s+3+b)] \\ && 3 [0 - \log(4s+3+b)] \\ && 3 [\beta - \log(3s+3+b)] \\ && 3 [\beta - \log(3s+3+b)] \\ && 3 [0 - \log(4s+3+b)] \\ &{\rm or}&4 [\beta - \log(4s+3+b)]. \end{eqnarray*} The first choice corresponds to an event order of observation C then D (subject D has the event, with D and E still at risk), the second to $C \rightarrow E$, then $D\rightarrow C$, $D\rightarrow E$, $E \rightarrow C$ and $E \rightarrow D$, respectively. For a weighted Efron approximation first replace the argument to the $\log$ function by its average argument, just as in the unweighted case. Once this is done the average term in the above corresponds to using an average weight of 10/3. The final log-likelihood and score statistic are \begin{eqnarray*} LL &=& \{2\beta - \log(r^2 + 11r +7)\} \\ && + \{7\beta - (10/3)[\log(a+b) + \log(2a/3 +b) + \log(a/3+b)] \} \\&& + 2\{\beta - \log(2r+1) \} \\ \\ U &=& (2- \xbar_1) + 2(1-\xbar_3) \\ && + 7 - (10/3)[\xbar_2 + 26r/(26r+12) + 19r/(19r+9)] \\ &=& 11 -(\xbar_1 + (10/3)(\xbar_2 + \xbar_{2b} +\xbar_{2c}) + 2\xbar_3)\\ \\ I &=& [(4s^2+11s)/(s^2+11s+7)- \xbar_1^2] \\ &&+ (10/3)[ (\xbar_2- \xbar_2^2) + (\xbar_{2b}- \xbar_{2b}^2) + (\xbar_{2b}- \xbar_{2b}^2) \\ &&+2(\xbar_3 - \xbar_3^2) \\ \end{eqnarray*} The solution is at $\beta=.87260425$, and $$ \begin{array}{ll} LL(0)=-30.29218 & LL(\bhat)=-29.41678 \\ U(0) = 2.148183 & U(\bhat) = 0 \\ \imat(0) = 2.929182 & \imat(\bhat) = 1.969447 \,. \end{array} $$ The hazard increment and mean at times 1 and 4 are identical to those for the Breslow approximation, as shown in table \ref{tab:val3}. At time 2, the number at risk for the first, second and third portions of the hazard increment are $n_1= 11r+5$, $n_2= (2/3)(7r+3) + 4r+2 = (26r+12)/3$, and $n_3=(1/3)(7r+3) + 4r+2 = (19r+9)/3$. Subjects F--I experience the full hazard at time 2 of $(10/3)(1/n_1 + 1/n_2 + 1/n_3)$, subjects B--D experience $(10/3)(1/n_1 + 2/3n_2 + 1/3n_3)$. Thus, at $\beta=0$ the martingale residuals are \begin{center} \begin{tabular}{cc|ll} Id& Time & \multicolumn{1}{c}{$\Mhat(0)$} \\ \hline A & 1 & 1 - 1/19 & = 18/19 \\ B & 1 & 0 - 1/19 & = -1/19 \\ C & 2 & 1 - (1/19 + 10/48 + 20/114 + 10/84)& =473/1064 \\ D & 2 & 1 - (1/19 + 10/48 + 20/114 + 10/84)& =473/1064 \\ E & 2 & 1 - (1/19 + 10/48 + 20/114 + 10/84) &=473/1064 \\ F & 2 & 0 - (1/19 + 10/48 + 10/38\phantom{4} + 10/28)& =-2813/3192 \\ G & 3 & 0 - (1/19 + 10/48 + 10/38\phantom{4} + 10/28) &=-2813/3192 \\ H & 4 & 1 - (1/19 + 10/48 + 10/38\phantom{4} + 10/28 + 2/3) &=-1749/3192 \\ I & 5 & 0 - (1/19 + 10/48 + 10/38\phantom{4} + 10/28 + 2/3) &=-4941/3192 \end{tabular} \end{center} The hazard estimate for a hypothetical subject with covariate $X^\dagger$ is $\Lambda_i(t) = \exp(X^\dagger \beta) \Lambda_0(t)$, $\Lambda_0$ has increments of $1/(r^2 + 11r +7$, $(10/3)(1/n_1 + 1/n_2 + 1/n_3)$ and $2/(2r+1)$. This increment at time 2 is a little larger than the Breslow jump of $10/d1$. The first term of the variance will have an increment of $[\exp((X^\dagger \beta)(]^2 (10/3)(1/n_1^2 + 1/n_2^2 + 1/n_3^2)$ at time 2. The increment to the cumulative distance from the center $d$ will be \begin{eqnarray*} && [X^\dagger - \frac{11r}{11r+5}] \frac{10}{3 n_1} \\ &+& [X^\dagger -\frac{(2/3)7r + 4r}{n2} ] (10/3)(1/n_2) \\ &+& [X^\dagger -\frac{(1/3)7r + 4r}{n2} ] (10/3)(1/n_3) \end{eqnarray*} For $X^\dagger = 1$ and $\beta=\pi/3$ we get cumulative hazard and variance below. We have $r\equiv e^\pi/3$, $V$= %\begin{tabular}{l$=$l|l$+$l$=$r} %\multicolumn{2}{c}{$\Lambda$} & \multicolumn{3}{c}{Variance} \\ \hline %e/(r^2+ 11r+7) & 0.03272832 & e^2/(r^2+ 11r+7)^2 & \end{document}