Abstract
This paper studies the Cox model with time-varying coefficients for cause-specific hazard functions when the causes of failure are subject to missingness. Inverse probability weighted and augmented inverse probability weighted estimators are investigated. The latter is constructed as a two-stage estimator that directly uses the inverse probability weighted estimator and models the available auxiliary variables to improve efficiency. The asymptotic properties of the two estimators are investigated. Hypothesis testing procedures are developed to test the null hypotheses that the covariate effects are zero and that the covariate effects are constant. We conduct simulation studies to examine the finite sample properties of the proposed estimation and hypothesis testing procedures under various settings of the auxiliary variables and the percentages of the failure causes that are missing. The simulation results demonstrate that the augmented inverse probability weighted estimators are more efficient than the inverse probability weighted estimators and that the proposed testing procedures have satisfactory size and power. The proposed methods are illustrated using data from the Mashi clinical trial to investigate the effect of randomization to formula-feeding versus breastfeeding plus extended infant zidovudine prophylaxis on death due to mother-to-child HIV transmission in Botswana.
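As a rough illustration of the two weighting ideas named in the abstract, the sketch below applies them to the much simpler problem of estimating a mean when some outcomes are missing. It is not the paper's estimator for the time-varying Cox model: the function names `ipw_mean` and `aipw_mean`, the known observation probabilities, and the outcome predictions `m` (standing in for a model built on auxiliary variables) are all assumptions made for this example.

```python
import numpy as np

def ipw_mean(y, observed, prob_observed):
    """Inverse probability weighted mean: observed outcomes are weighted by 1/pi."""
    r = np.asarray(observed, dtype=float)
    pi = np.asarray(prob_observed, dtype=float)
    # Missing outcomes (possibly NaN) are replaced by 0 so they contribute nothing.
    y_filled = np.where(r == 1, np.asarray(y, dtype=float), 0.0)
    return np.mean(r * y_filled / pi)

def aipw_mean(y, observed, prob_observed, m):
    """Augmented IPW mean: the IPW term minus an augmentation term built from
    predictions m of the outcome (hypothetical auxiliary-variable model)."""
    r = np.asarray(observed, dtype=float)
    pi = np.asarray(prob_observed, dtype=float)
    m = np.asarray(m, dtype=float)
    y_filled = np.where(r == 1, np.asarray(y, dtype=float), 0.0)
    return np.mean(r * y_filled / pi - (r - pi) / pi * m)
```

When the predictions `m` are informative, the augmentation term typically reduces variance relative to the plain weighted mean, which mirrors the efficiency gain the abstract reports for the augmented estimator.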
References
Aalen OO, Johansen S (1978) An empirical transition matrix for non-homogeneous Markov chains based on censored observations. Scand J Stat 5:141–150
Cai Z, Sun Y (2003) Local linear estimation for time-dependent coefficients in Cox’s regression models. Scand J Stat 30:93–111
Clemens JD, Sack DA, Harris JR et al (1990) Field trial of oral cholera vaccines in Bangladesh: results from three-year follow-up. Lancet 335:270–273
Efromovich S (2010) Dimension reduction and adaptation in conditional density estimation. J Am Stat Assoc 105:761–774
Fan J, Gijbels I (1996) Local polynomial modelling and its applications: monographs on statistics and applied probability 66, 1st edn. Chapman and Hall/CRC, New York
Gao G, Tsiatis AA (2005) Semiparametric estimators for the regression coefficients in the linear transformation competing risks model with missing cause of failure. Biometrika 92:875–891
Gilbert P, McKeague I, Sun Y (2008) The 2-sample problem for failure rates depending on a continuous mark: an application to vaccine efficacy. Biostatistics 9(2):263–276
Gilbert P, Sun Y (2015) Inferences on relative failure rates in stratified mark-specific proportional hazards models with missing marks, with application to human immunodeficiency virus vaccine efficacy trials. J R Stat Soc Ser C (Appl Stat) 64(1):49–73
Goetghebeur E, Ryan L (1995) Analysis of competing risks survival data when some failure types are missing. Biometrika 82(4):821–833
Hall P, Racine JS, Li Q (2004) Cross-validation and the estimation of conditional probability densities. J Am Stat Assoc 99:1015–1026
Horvitz D, Thompson D (1952) A generalization of sampling without replacement from a finite universe. J Am Stat Assoc 47:663–685
Hyun S, Lee J, Sun Y (2012) Proportional hazards model for competing risks data with missing cause of failure. J Stat Plann Inference 142:1767–1779
Izbicki R, Lee AB (2016) Nonparametric conditional density estimation in a high-dimensional regression setting. J Comput Gr Stat 25(4):1297–1316
Lin DY, Wei LJ, Ying Z (1993) Checking the Cox model with cumulative sums of martingale-based residuals. Biometrika 80:557–572
Liu L, Nevo D, Nishihara R, Cao Y, Song M, Twombly T, Chan A, Giovannucci E, VanderWeele T, Wang M, Ogino S (2018) Utility of inverse probability weighting in molecular pathological epidemiology. Eur J Epidemiol 33(4):381–392
Lu W, Liang Y (2008) Analysis of competing risks data with missing cause of failure under additive hazards model. Stat Sin 19:219–234
Lu K, Tsiatis A (2001) Multiple imputation methods for estimating regression coefficients in the competing risks model with missing cause of failure. Biometrics 57(4):1191–1197
Lu K, Tsiatis A (2005) Comparison between two partial likelihood approaches for the competing risks model with missing cause of failure. Lifetime Data Anal 11:29–40
Martinussen T, Scheike TH, Skovgaard IM (2002) Efficient estimation of fixed and time-varying covariates effects in multiplicative intensity models. Scand J Stat 29:59–77
Martinussen T, Scheike T (2006) Dynamic regression models for survival data. Springer, New York
Murphy SA, Sen PK (1991) Time-dependent coefficients in a Cox-type regression model. Stoch Process Appl 39:153–180
Nevo D, Nishihara R, Ogino S, Wang M (2018) The competing risks Cox model with auxiliary case covariates under weaker missing-at-random cause of failure. Lifetime Data Anal 24:425–442
Rice JA, Silverman BW (1991) Estimating the mean and covariance structure nonparametrically when the data are curves. J R Stat Soc Ser B 53:233–243
Robins J, Rotnitzky A, Zhao L (1994) Estimation of regression coefficients when some regressors are not always observed. J Am Stat Assoc 89:846–866
Rubin D (1976) Inference and missing data. Biometrika 63:581–592
Scharfstein DO, Rotnitzky A, Robins JM (1999) Adjusting for nonignorable drop-out using semiparametric nonresponse models: rejoinder. J Am Stat Assoc 94:1135–1146
Sun Y, Gilbert PB (2012) Estimation of stratified mark-specific proportional hazards models with missing marks. Scand J Stat 39:34–52
Sun Y, Hyun S, Gilbert PB (2008) Testing and estimation of time-varying cause-specific hazard ratios with covariate adjustment. Biometrics 64:1070–1079
Sun Y, Qian X, Shou Q, Gilbert P (2017) Analysis of two-phase sampling data with semiparametric additive hazards models. Lifetime Data Anal 23:377–399
Sun Y, Sundaram R, Zhao Y (2009) Empirical likelihood inference for the Cox model with time-dependent coefficients via local partial likelihood. Scand J Stat 36:444–462
Sun Y, Wang H, Gilbert PB (2012) Quantile regression for competing risks data with missing cause of failure. Stat Sin 22:703–728
Sun Y, Wu H (2005) Semiparametric time-varying coefficients regression model for longitudinal data. Scand J Stat 32:21–47
Thior I, Lockman S, Smeaton LM, Shapiro RL, Wester C, Heymann SJ, Gilbert PB, Stevens L, Peter T, Kim S, van Widenfelt E, Moffat C, Ndase P, Arimi P, Kebaabetswe P, Mazonde P, Makhema J, McIntosh K, Novitsky V, Lee TH, Marlink R, Lagakos S, Essex M, the Mashi Study Team (2006) Breastfeeding plus infant zidovudine prophylaxis for 6 months vs formula feeding plus infant zidovudine for 1 month to reduce mother-to-child HIV transmission in Botswana: a randomized trial: the Mashi study. J Am Med Assoc 296:794–805
Tian L, Zucker D, Wei LJ (2005) On the Cox model with time-varying regression coefficients. J Am Stat Assoc 100:172–183
van der Vaart AW (1998) Asymptotic statistics. Cambridge University Press, Cambridge
Zucker DM, Karr AF (1990) Nonparametric survival analysis with time-dependent covariate effects: a penalized partial likelihood approach. Ann Stat 18:329–353
Acknowledgements
We thank the reviewers for their constructive comments, which have improved the content and presentation of the paper. The authors also thank the Mashi study team (led by Dr. Max Essex) and the Mashi study participants. Research reported in this publication was partially supported by the National Institute of Allergy and Infectious Diseases of the National Institutes of Health under Award Number R37 AI054165. The research of Yanqing Sun was also partially supported by National Science Foundation grants DMS-1513072 and DMS-1915829. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Appendix
This appendix introduces the notation and states the conditions for the asymptotic results in Theorems 1–5.
Let \(\mathcal {F}_t\) be the right continuous filtration generated by the data processes \(\{N_{ik}(s), Y_i(s), Z_i(s); i=1,\dots ,n, k=1,2,\dots ,L, 0 \le s \le t\}\). Assume \(P(dN_{ik}(t)=1|\mathcal {F}_{t^-})=P(dN_{ik}(t)=1|Y_i(t), Z_i(t))=Y_i(t)\lambda _{k}(t|Z_i(t))dt\). It follows that \(M_{ik}(t) = N_{ik}(t) -\int _0^t Y_i(u)\lambda _k(u|Z_i(u))du\), \(i=1,\dots ,n, k=1,2,\dots ,L\), are multivariate orthogonal martingales with respect to \(\mathcal {F}_t\) (Aalen and Johansen 1978). To accommodate the additional information introduced by missing data, we define the augmented filtration \(\mathcal {F}_t^*\) generated by the data processes \(\{N_{ik}(s), Y_i(s), Z_i(s), R_i, \delta _i A_i; i=1,\dots ,n, k=1,2,\dots ,L, 0 \le s \le t\}\). Let \(\lambda _{ik}^*(t)\,dt=P\{T_i\in [t,t+dt),V_i=k|X_i \ge t, Z_i(t), R_i, \delta _i A_i\}\). Then \(Y_i(t)\lambda _{ik}^*(t)\) is the intensity of \(N_{ik}(t)\) with respect to \(\mathcal {F}_t^*\), and \(M_{ik}^*(t) = N_{ik}(t) -\int _0^t Y_i(u)\lambda _{ik}^*(u)du\), \(i=1,\dots ,n, k=1,2,\dots ,L\), are multivariate orthogonal martingales with respect to \(\mathcal {F}_t^*\).
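The martingale decomposition above can be checked numerically in the simplest special case. The sketch below assumes two competing causes with constant cause-specific hazards, no covariates, and independent exponential censoring, so that the compensator \(\int _0^t Y_i(u)\lambda _k\,du\) reduces to \(\lambda _k \min (X_i, t)\). The function name `cause_specific_martingale` and all numerical values are assumptions for illustration only, not settings taken from the paper.

```python
import numpy as np

def cause_specific_martingale(X, delta, V, k, lam_k, t):
    """M_k(t) = N_k(t) - lam_k * min(X, t) under a constant cause-k hazard lam_k."""
    X = np.asarray(X, dtype=float)
    delta = np.asarray(delta)
    V = np.asarray(V)
    # N_k(t): indicator of an observed cause-k failure by time t.
    N_k = ((X <= t) & (delta == 1) & (V == k)).astype(float)
    # Integral of Y(u) * lam_k over [0, t], with Y(u) = 1{X >= u}.
    compensator = lam_k * np.minimum(X, t)
    return N_k - compensator

# Simulated data (all values are illustrative assumptions).
rng = np.random.default_rng(0)
n = 200_000
lam = np.array([0.5, 1.0])                          # cause-specific hazards
T = rng.exponential(1.0 / lam.sum(), size=n)        # failure times
V = rng.choice([1, 2], size=n, p=lam / lam.sum())   # failure causes
C = rng.exponential(2.0, size=n)                    # censoring times
X = np.minimum(T, C)
delta = (T <= C).astype(int)

M1 = cause_specific_martingale(X, delta, V, k=1, lam_k=lam[0], t=1.0)
M2 = cause_specific_martingale(X, delta, V, k=2, lam_k=lam[1], t=1.0)
```

Under these assumptions the sample means of `M1` and `M2` should be close to zero, reflecting the zero-mean property of the martingales.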
Let \(S^{(j)}(t,\beta _k)=n^{-1}\sum _{i=1}^{n} Y_i(t)\exp \left( \beta _k(t)^{{\textsf {T}}}Z_i(t)\right) Z_i(t)^{\otimes j}\) and \(S_I^{*(j)}(t,\beta _k,\psi )=n^{-1}\sum _{i=1}^n q_i Y_i(t)\exp \left( \beta _k(t)^{{\textsf {T}}}Z_i(t)\right) Z_i(t)^{\otimes j}\), for \(k=1,\dots ,L\) and \(j=0,1,2\). Let \(s^{(j)}(t,\beta _k)=ES^{(j)}(t,\beta _k)\) and \(s_I^{*(j)}(t,\beta _k,\psi )=ES_I^{*(j)}(t,\beta _k,\psi )\). If the model \(r(\zeta _i, A_i, \psi )\) is correctly specified, then \(s^{(j)}(t,\beta _k)= s_I^{*(j)}(t,\beta _k,\psi )\). Define \(\Sigma _k(t) = \big [s^{(2)}(t,\beta _k) - \big (s^{(1)}(t,\beta _k)\big )^{\otimes 2}/s^{(0)}(t,\beta _k)\big ] \lambda _{k0}(t)\) and \(\Sigma _k^*(t)=E\big [ \big (Z_i(t)-s^{(1)}(t,\beta _k)/s^{(0)}(t,\beta _k) \big )^{\otimes 2} R_i \pi ^{-2}(Q_i) Y_{i}(t) \lambda _{ik}^*(t) \big ]\).
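The empirical moments \(S^{(j)}\) and their weighted counterparts \(S_I^{*(j)}\) can be sketched at a single time point as follows. The sketch uses the usual convention \(z^{\otimes 0}=1\), \(z^{\otimes 1}=z\), \(z^{\otimes 2}=zz^{{\textsf {T}}}\), and takes the weights \(q_i\) as a given input because their definition is not repeated in this appendix. The function name `s_moments` and its argument layout are assumptions for illustration.

```python
import numpy as np

def s_moments(at_risk, Z, beta_t, q=None):
    """Return (S0, S1, S2) at one time point t.

    at_risk : (n,) array of Y_i(t) in {0, 1}
    Z       : (n, p) array of covariate values Z_i(t)
    beta_t  : (p,) array, the coefficient vector beta_k(t)
    q       : optional (n,) array of weights q_i; None gives the unweighted S^(j)
    """
    at_risk = np.asarray(at_risk, dtype=float)
    Z = np.asarray(Z, dtype=float)
    n = Z.shape[0]
    # Per-subject factor Y_i(t) * exp(beta_k(t)^T Z_i(t)), optionally times q_i.
    w = at_risk * np.exp(Z @ np.asarray(beta_t, dtype=float))
    if q is not None:
        w = w * np.asarray(q, dtype=float)
    S0 = w.sum() / n                              # scalar
    S1 = (w[:, None] * Z).sum(axis=0) / n         # (p,) vector
    S2 = np.einsum("i,ij,ik->jk", w, Z, Z) / n    # (p, p) matrix
    return S0, S1, S2
```

With these pieces, the bracketed matrix in \(\Sigma _k(t)\) has the empirical analogue `S2 - np.outer(S1, S1) / S0`.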
Let \(S_i^{\psi }\) and \(I^{\psi }\) be the score vector and information matrix for \({{\widehat{\psi }}}\) under (4). Then,
and \({{\widehat{\psi }}} - \psi = n^{-1}\sum _{i=1}^n(I^{\psi })^{-1}S_i^{\psi }+o_p(n^{-1/2})\), where \(\psi _0\) is the true value of \(\psi \). We also define the following notations:
The following conditions are assumed in proving the theorems:
- (C.1) For \(k=1,\dots ,L\), \(\beta _k(t)\) has componentwise second derivatives on \([0,\tau ]\). The sample path of the covariate process \(Z_i(t)\) is left continuous and of bounded variation, and satisfies the moment condition \(E[||Z_i(t)||^4\exp (2M||Z_i(t)||)]<\infty \), where \(M\) is a constant such that \((t,\beta _k(t))\in [0,\tau ]\times [-M,M]^p\) for all \(t\) and \(||A||=\max _{k,l}|a_{kl}|\) for a matrix \(A=(a_{kl})\).
- (C.2) The kernel function \(K(\cdot )\) is bounded and symmetric with bounded support \([-1,1]\). The bandwidth \(h\) satisfies \(nh^2\rightarrow \infty \) and \(nh^5\) is bounded as \(n \rightarrow \infty \).
- (C.3) The matrix \(\Sigma _k(t)\) is positive definite for all \(t \in [0, \tau ]\).
- (C.4) For \(k=1,\dots ,L\) and for \(j=0,1,2\), the functions \(s^{(j)}(t,\beta _k)\) and \(s_I^{*(j)}(t,\beta _k,\psi )\) are componentwise continuous on \(t\in [0,\tau ],\beta _k\in [-M,M]^p,\psi \in \varTheta _{\psi }\), where \(\varTheta _{\psi }\) is a compact set. Moreover, \(\sup _{t\in [0,\tau ],\beta _k\in [-M,M]^p}||S^{(j)}(t,\beta _k)-s^{(j)}(t,\beta _k)||=O_p(n^{-1/2})\) and \(\sup _{t\in [0,\tau ],\beta _k\in [-M,M]^p,\psi \in \varTheta _{\psi }}||S_I^{*(j)}(t,\beta _k,\psi )-s_I^{*(j)}(t,\beta _k,\psi )||=O_p(n^{-1/2})\).
- (C.5) The function \(r(\zeta _i,A_i,\psi )\) is twice differentiable with respect to \(\psi \) on a compact set \(\varTheta _{\psi }\), \(r^\prime (\zeta _i,A_i,\psi )=\partial r(\zeta _i,A_i,\psi )/\partial \psi \) is uniformly bounded, and there is an \(\varepsilon > 0\) such that \(r(\zeta _i,A_i,\psi ) \ge \varepsilon \) for all \(i\). The function \(f(A_i |k,T_i,Z_i,\varphi _k)\) is also twice differentiable with respect to \(\varphi _k\) on a compact set \(\varTheta _{\varphi _k}\) for \(k=1,\dots ,L\).
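Condition (C.2) can be made concrete with one possible choice of kernel and bandwidth. The sketch below assumes the Epanechnikov kernel, which is bounded, symmetric, and supported on \([-1,1]\), and a bandwidth of the form \(h = c\,n^{-1/5}\), for which \(nh^2 = c^2 n^{3/5}\rightarrow \infty \) while \(nh^5 = c^5\) stays bounded. Neither choice is stated in this appendix; the function names `epanechnikov` and `bandwidth` and the constant `c` are assumptions for illustration.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 * (1 - u^2) on [-1, 1], and 0 elsewhere."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def bandwidth(n, c=1.0):
    """Bandwidth h = c * n^(-1/5), one rate compatible with condition (C.2)."""
    return c * n ** (-1.0 / 5.0)

def kernel_weights(times, t, h):
    """Scaled kernel weights K_h(s - t) = K((s - t) / h) / h for local smoothing at t."""
    times = np.asarray(times, dtype=float)
    return epanechnikov((times - t) / h) / h
```

Only observations within distance \(h\) of the target time \(t\) receive nonzero weight, which is what the bounded support in (C.2) provides.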
Supplementary materials
The Web-based Supplementary Materials referenced in the manuscript are available with this paper at the journal’s online website.
Cite this article
Heng, F., Sun, Y., Hyun, S. et al. Analysis of the time-varying Cox model for the cause-specific hazard functions with missing causes. Lifetime Data Anal 26, 731–760 (2020). https://doi.org/10.1007/s10985-020-09497-y
DOI: https://doi.org/10.1007/s10985-020-09497-y
Keywords
- Augmented inverse probability weighted estimator
- Auxiliary variables
- Cause-specific hazard function
- Competing risks model
- Hypothesis testing procedures
- Missing causes of failure
- Inverse probability weighted estimator
- Cox model with time-dependent coefficients
- Two-stage augmented inverse probability weighted estimator