C.1 Tumorgenicity Data Analysis of Chapter 3 C.1.1 Poisson Analysis with Weibull Baseline Rate The data for the first five rats in the treated group are displayed below in the data frame 'rats', in the 'counting process' format. > rats <- read.table("rats.dat", header=F) > dimnames(rats)[[2]] <- c("id","start","stop","status","enum","trt") > rats$rtrunc <- rep(NA, nrow(rats)) > rats$tstatus <- ifelse(rats$start == 0, 1, 2) > rats[1:12, c("id","start","stop","status","rtrunc","tstatus","enum","trt")] id start stop status rtrunc tstatus enum trt 1 1 0 122 1 NA 1 1 1 2 2 0 122 0 NA 1 1 1 3 3 0 3 1 NA 1 1 1 4 3 3 88 1 NA 2 2 1 5 3 88 122 0 NA 2 3 1 6 4 0 92 1 NA 1 1 1 7 4 92 122 0 NA 2 2 1 8 5 0 70 1 NA 1 1 1 9 5 70 74 1 NA 2 2 1 10 5 74 85 1 NA 2 3 1 11 5 85 92 1 NA 2 4 1 12 5 92 122 0 NA 2 5 1 Fit 'Weibull' model (3.56) to the control rats. Note that the 'truncation' option enables one to indicate that the left truncation times are contained in the 'start' variable. > wfitC <- censorReg(censor(stop, status) ~ 1, data=rats, subset=(trt==0), truncation=censor(start,rtrunc,tstatus), distribution="weibull") > summary(wfitC) Call: censorReg(formula = censor(stop, status) ~ 1, data = rats, truncation = censor(start, rtrunc, tstatus), subset = (trt == 0), distribution = "weibull") Distribution: Weibull Standardized Residuals: Min Max Uncensored 0.1046 6.0405 Censored 6.0400 6.0400 Coefficients: Est. Std.Err. 95% LCL 95% UCL z-value p-value (Intercept) 3.161 0.153 2.861 3.461 20.66 7.551e-95 Extreme value distribution: Dispersion (scale) = 0.9135831 Observations: 173 Total; 22 Censored -2*Log-Likelihood: 1209 The regression model rhoi(t) = rho0(t) * exp(xi * beta) is fit by omitting the 'subset' option and specifying the 'trt' covariate in the model. > wfit <- censorReg(censor(stop, status) ~ trt, data=rats, truncation=censor(start,rtrunc,tstatus), distribution="weibull") > summary(wfit) Call: censorReg(formula = censor(stop, status) ~ trt, data = rats, truncation = censor(start, rtrunc, tstatus), distribution = "weibull") Distribution: Weibull Standardized Residuals: Min Max Uncensored 0.0519 6.0405 Censored 2.6522 6.0400 Coefficients: Est. Std.Err. 95% LCL 95% UCL z-value p-value (Intercept) 3.1097 0.1394 2.8365 3.383 22.314 2.673e-110 trt 0.7754 0.1525 0.4764 1.074 5.084 3.704e-07 Extreme value distribution: Dispersion (scale) = 0.94214648 Observations: 254 Total; 42 Censored -2*Log-Likelihood: 1798 Note that because of the way the model is parameterized in censorReg, the parameters alpha1, alpha2, beta in Section 3.8.1 are related to those here by > alpha1 <- as.vector( exp(-wfit$coef[1]) ) [1] 0.0446 > alpha2 <- as.vector( 1/wfit$scale ) [1] 1.0614 > beta <- as.vector( (-1)*wfit$coef[2]/wfit$scale ) [1] -0.8230 C.1.2 Poisson Analysis with Piecewise Constant Rate In Section 3.8.1, a piecewise constant rate function (Section 3.3) with cutpoints at 30, 60, 90 days is considered, resulting in a four parameters in the rate funciton. Th entries in the data frame 'rats.pw' are given below for the first five rats in the treated group. gd.pw.f <- function(indata) { pid <- sort(unique(indata$id)) data <- matrix(0, nrow=(length(pid)*4), ncol=5) for (i in 1:length(pid)) { tmp <- indata[indata$id == pid[i], ] etime <- floor( tmp$stop[tmp$status == 1] ) startpos <- 4*(i-1) + 1 stoppos <- 4*i data[startpos:stoppos,1] <- rep(pid[i],4) data[startpos:stoppos,2] <- c(1,2,3,4) data[startpos:stoppos,3] <- c(sum((etime > 0) & (etime <= 30)), sum((etime > 30) & (etime <= 60)), sum((etime > 60) & (etime <= 90)), sum((etime > 90) & (etime <= 122))) data[startpos:stoppos,4] <- c(30,30,30,32) data[startpos:stoppos,5] <- rep(unique(tmp$trt),4) } data <- data.frame(data) dimnames(data)[[2]] <- c("id","interval","count","len","trt") return(data) } > rats.pw <- gd.pw.f(rats) > rats.pw[1:20,] id interval count len trt 1 1 1 0 30 1 2 1 2 0 30 1 3 1 3 0 30 1 4 1 4 1 32 1 5 2 1 0 30 1 6 2 2 0 30 1 7 2 3 0 30 1 8 2 4 0 32 1 9 3 1 1 30 1 10 3 2 0 30 1 11 3 3 1 30 1 12 3 4 0 32 1 13 4 1 0 30 1 14 4 2 0 30 1 15 4 3 0 30 1 16 4 4 1 32 1 17 5 1 0 30 1 18 5 2 0 30 1 19 5 3 3 30 1 20 5 4 1 32 1 Fit the piecewise constant model for control rats. > options(contrasts = c("contr.treatment", "contr.poly")) > pfitC <- glm(count ~ offset(log(len)) + factor(interval), family=poisson(link=log), data=rats.pw, subset=(trt==0), epsilon = 1e-004) > summary(pfitC, corr=F) Call: glm(formula = count ~ offset(log(len)) + factor(interval), family = poisson(link = log), data = rats.pw, subset = (trt == 0), epsilon = 0.0001) Deviance Residuals: Min 1Q Median 3Q Max -1.918 -1.575 -0.2736 0.6262 2.896 Coefficients: Value Std. Error t value (Intercept) -3.0937 0.1714 -18.0502 factor(interval)2 0.1625 0.2330 0.6977 factor(interval)3 0.3023 0.2260 1.3377 factor(interval)4 -0.1569 0.2481 -0.6325 (Dispersion Parameter for Poisson family taken to be 1 ) Null Deviance: 167.8 on 99 degrees of freedom Residual Deviance: 163.3 on 96 degrees of freedom Number of Fisher Scoring Iterations: 4 An estimate of the treatment effect based on a multiplicative model is obtained by introducing the 'trt' covariate into the regression model. > pfit <- glm(count ~ offset(log(len)) + factor(interval) + trt, family=poisson(link=log), data=rats.pw, epsilon = 1e-004) > summary(pfit, corr=F) Call: glm(formula = count ~ offset(log(len)) + factor(interval) + trt, family = poisson(link = log), data = rats.pw, epsilon = 0.0001) Deviance Residuals: Min 1Q Median 3Q Max -1.834 -1.199 -0.3302 0.4701 3.055 Coefficients: Value Std. Error t value (Intercept) -3.08818 0.1506 -20.5007 factor(interval)2 0.17185 0.1956 0.8785 factor(interval)3 0.20634 0.1941 1.0631 factor(interval)4 -0.06454 0.2039 -0.3166 trt -0.82302 0.1514 -5.4354 (Dispersion Parameter for Poisson family taken to be 1 ) Null Deviance: 301.4 on 191 degrees of freedom Residual Deviance: 266.3 on 187 degrees of freedom Number of Fisher Scoring Iterations: 4 C.1.3 Nonparametric and Semiparametric Poisson Analysis The nonparametric Nelson-Aalen estimate of the mean function for control individuals given in Figures 3.2 and 3.3, is obtained by fitting a null Cox model using the 'rats' data frame. > NPfit <- coxph(Surv(start,stop,status) ~ 1, data=rats, subset=(trt==0), method="breslow") > KM <- survfit(NPfit, type="aalen") > NA.MF <- data.frame(time=c(0,KM$time), na=-log(c(1,KM$surv))) > plot(NA.MF$time, NA.MF$na, type="s", xlab="TIME (DAYS)", ylab="EXPECTED NUMBER OF TUMORS", main="") The semiparametric multiplicative Poisson model with a treatment covariate can be fit with the 'coxph' function by including all subjects and adding the covariate 'trt'. > fit <- coxph(Surv(start,stop,status) ~ trt, data=rats, method="breslow") > summary(fit) Call: coxph(formula = Surv(start, stop, status) ~ trt, data = rats, method = "breslow") n= 254 coef exp(coef) se(coef) z p trt -0.816 0.442 0.152 -5.37 7.8e-08 exp(coef) exp(-coef) lower .95 upper .95 trt 0.442 2.26 0.328 0.596 Rsquare= 0.117 (max possible= 0.998 ) Likelihood ratio test= 31.7 on 1 df, p=1.81e-08 Wald test = 28.9 on 1 df, p=7.76e-08 Score (logrank) test = 30.5 on 1 df, p=3.27e-08 > cox.zph(fit, transform="identity") rho chisq p trt -0.0217 0.0995 0.752 > cox.zph(cfit, transform="log") rho chisq p trt -0.0505 0.541 0.462 C.1.4 Semiparametric Mixed Poisson Analysis The coxph function can be used to fit semiparametric mixed Poisson models with the frailty(id) option. > fitf <- coxph(Surv(start,stop,status) ~ trt + frailty(id), data=rats, method="breslow") > summary(fitf) n= 254 coef se(coef) se2 Chisq DF p trt -0.816 0.211 0.153 15.0 1.0 0.00011 frailty(id) 49.5 24.3 0.00190 exp(coef) exp(-coef) lower .95 upper .95 trt 0.442 2.26 0.292 0.668 Iterations: 6 outer, 18 Newton-Raphson Variance of random effect= 0.27 I-likelihood = -791.9 Degrees of freedom for terms= 0.5 24.3 Rsquare= 0.347 (max possible= 0.998 ) Likelihood ratio test= 108 on 24.87 df, p=2.38e-12 Wald test = 15 on 24.87 df, p=0.939 C.1.5 Robust Semiparametric Analysis Robust variance estimates and standard errors as given in (3.38) are obtained by using the cluster(id) option in coxph. > fitr <- coxph(Surv(start,stop,status) ~ trt + cluster(id), data=rats, method="breslow") Call: coxph(formula = Surv(start, stop, status) ~ trt + cluster(id), data = rats, method = "breslow") n= 254 coef exp(coef) se(coef) robust se z p trt -0.815774 0.442297 0.151836 0.19809 -4.11819 3.8186e-05 exp(coef) exp(-coef) lower .95 upper .95 trt 0.442297 2.26092 0.299985 0.652122 Rsquare= 0.117 (max possible= 0.998 ) Likelihood ratio test= 31.69 on 1 df, p=1.81146e-08 Wald test = 16.96 on 1 df, p=3.8186e-05 Score (logrank) test = 30.54 on 1 df, p=3.26554e-08, Robust = 11.2 p=0.000816617 (Note: the likelihood ratio and score tests assume independence of observations within a cluster, the Wald and robust score tests do not).