Analysis of the time-varying Cox model for the cause-specific hazard functions with missing causes

Heng, Fei; Sun, Yanqing; Hyun, Seunggeun; Gilbert, Peter B.

doi:10.1007/s10985-020-09497-y

Published: 09 April 2020

Volume 26, pages 731–760, (2020)

Lifetime Data Analysis

Fei Heng¹,
Yanqing Sun ORCID: orcid.org/0000-0002-3140-4572²,
Seunggeun Hyun³ &
…
Peter B. Gilbert⁴

Abstract

This paper studies the Cox model with time-varying coefficients for cause-specific hazard functions when the causes of failure are subject to missingness. Inverse probability weighted and augmented inverse probability weighted estimators are investigated. The latter is considered as a two-stage estimator by directly utilizing the inverse probability weighted estimator and through modeling available auxiliary variables to improve efficiency. The asymptotic properties of the two estimators are investigated. Hypothesis testing procedures are developed to test the null hypotheses that the covariate effects are zero and that the covariate effects are constant. We conduct simulation studies to examine the finite sample properties of the proposed estimation and hypothesis testing procedures under various settings of the auxiliary variables and the percentages of the failure causes that are missing. These simulation results demonstrate that the augmented inverse probability weighted estimators are more efficient than the inverse probability weighted estimators and that the proposed testing procedures have the expected satisfactory results in sizes and powers. The proposed methods are illustrated using the Mashi clinical trial data for investigating the effect of randomization to formula-feeding versus breastfeeding plus extended infant zidovudine prophylaxis on death due to mother-to-child HIV transmission in Botswana.

References

Aalen OO, Johansen S (1978) An empirical transition matrix for non-homogeneous Markov chains based on censored observations. Scand J Stat 5:141–150
Cai Z, Sun Y (2003) Local linear estimation for time-dependent coefficients in Cox’s regression models. Scand J Stat 30:93–111
Clemens JD, Sack DA, Harris JR et al (1990) Field trial of oral cholera vaccines in Bangladesh: results from three-year follow-up. Lancet 335:270–273
Efromovich S (2010) Dimension reduction and adaptation in conditional density estimation. J Am Stat Assoc 105:761–774
Fan J, Gijbels I (1996) Local polynomial modelling and its applications: monographs on statistics and applied probability 66, 1st edn. Chapman and Hall/CRC, New York
Gao G, Tsiatis AA (2005) Semiparametric estimators for the regression coefficients in the linear transformation competing risks model with missing cause of failure. Biometrika 92:875–891
Gilbert P, McKeague I, Sun Y (2008) The 2-sample problem for failure rates depending on a continuous mark: an application to vaccine efficacy. Biostatistics 9(2):263–276
Gilbert P, Sun Y (2015) Inferences on relative failure rates in stratified mark-specific proportional hazards models with missing marks, with application to human immunodeficiency virus vaccine efficacy trials. J R Stat Soc Ser C (Appl Stat) 64(1):49–73
Goetghebeur E, Ryan L (1995) Analysis of competing risks survival data when some failure types are missing. Biometrika 82(4):821–833
Hall P, Racine JS, Li Q (2004) Cross-validation and the estimation of conditional probability densities. J Am Stat Assoc 99:1015–1026
Horvitz D, Thompson D (1952) A generalization of sampling without replacement from a finite universe. J Am Stat Assoc 47:663–685
Hyun S, Lee J, Sun Y (2012) Proportional hazards model for competing risks data with missing cause of failure. J Stat Plann Inference 142:1767–1779
Izbicki R, Lee AB (2016) Nonparametric conditional density estimation in a high-dimensional regression setting. J Comput Gr Stat 25(4):1297–1316
Lin DY, Wei LJ, Ying Z (1993) Checking the Cox model with cumulative sums of martingale-based residuals. Biometrika 80:557–572
Liu L, Nevo D, Nishihara R, Cao Y, Song M, Twombly T, Chan A, Giovannucci E, VanderWeele T, Wang M, Ogino S (2018) Utility of inverse probability weighting in molecular pathological epidemiology. Eur J Epidemiol 33(4):381–392
Lu W, Liang Y (2008) Analysis of competing risks data with missing cause of failure under additive hazards model. Stat Sin 19:219–234
Lu K, Tsiatis A (2001) Multiple imputation methods for estimating regression coefficients in the competing risks model with missing cause of failure. Biometrics 57(4):1191–1197
Lu K, Tsiatis A (2005) Comparison between two partial likelihood approaches for the competing risks model with missing cause of failure. Lifetime Data Anal 11:29–40
Martinussen T, Scheike TH, Skovgaard IM (2002) Efficient estimation of fixed and time-varying covariates effects in multiplicative intensity models. Scand J Stat 29:59–77
Martinussen T, Scheike T (2006) Dynamic regression models for survival data. Springer, New York
Murphy SA, Sen PK (1991) Time-dependent coefficients in a Cox-type regression model. Stoch Process Appl 39:153–180
Nevo D, Nishihara R, Ogino S, Wang M (2018) The competing risks Cox model with auxiliary case covariates under weaker missing-at-random cause of failure. Lifetime Data Anal 24:425–442
Rice JA, Silverman BW (1991) Estimating the mean and covariance structure nonparametrically when the data are curves. J R Stat Soc Ser B 53:233–243
Robins J, Rotnitzky A, Zhao L (1994) Estimation of regression coefficients when some regressors are not always observed. J Am Stat Assoc 89:846–866
Rubin D (1976) Inference and missing data. Biometrika 63:581–592
Scharfstein DO, Rotnitzky A, Robins JM (1999) Adjusting for nonignorable drop-out using semiparametric nonresponse models: rejoinder. J Am Stat Assoc 94:1135–1146
Sun Y, Gilbert PB (2012) Estimation of stratified mark-specific proportional hazards models with missing marks. Scand J Stat 39:34–52
Sun Y, Hyun S, Gilbert PB (2008) Testing and estimation of time-varying cause-specific hazard ratios with covariate adjustment. Biometrics 64:1070–1079
Sun Y, Qian X, Shou Q, Gilbert P (2017) Analysis of two-phase sampling data with semiparametric additive hazards models. Lifetime Data Anal 23:377–399
Sun Y, Sundaram R, Zhao Y (2009) Empirical likelihood inference for the Cox model with time-dependent coefficients via local partial likelihood. Scand J Stat 36:444–462
Sun Y, Wang H, Gilbert PB (2012) Quantile regression for competing risks data with missing cause of failure. Stat Sin 22:703–728
Sun Y, Wu H (2005) Semiparametric time-varying coefficients regression model for longitudinal data. Scand J Stat 32:21–47
Thior I, Lockman S, Smeaton LM, Shapiro RL, Wester C, Heymann SJ, Gilbert PB, Stevens L, Peter T, Kim S, van Widenfelt E, Moffat C, Ndase P, Arimi P, Kebaabetswe P, Mazonde P, Makhema J, McIntosh K, Novitsky V, Lee TH, Marlink R, Lagakos S, Essex M, the Mashi Study Team (2006) Breastfeeding plus infant zidovudine prophylaxis for 6 months vs formula feeding plus infant zidovudine for 1 month to reduce mother-to-child HIV transmission in Botswana: a randomized trial: the Mashi study. J Am Med Assoc 296:794–805
Tian L, Zucker D, Wei LJ (2005) On the Cox model with time-varying regression coefficients. J Am Stat Assoc 100:172–183
van der Vaart AW (1998) Asymptotic statistics. Cambridge University Press, Cambridge
Zucker DM, Karr AF (1990) Nonparametric survival analysis with time-dependent covariate effects: a penalized partial likelihood approach. Ann Stat 18:329–353

Acknowledgements

We thank the reviewers for their constructive comments, which have improved the content and presentation of the paper. The authors also thank the Mashi study team (led by Dr. Max Essex) and the Mashi study participants. Research reported in this publication was partially supported by the National Institute of Allergy and Infectious Diseases of the National Institutes of Health under Award Number R37 AI054165. The research of Yanqing Sun was also partially supported by National Science Foundation grants DMS-1513072 and DMS-1915829. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Author information

Authors and Affiliations

Department of Mathematics and Statistics, University of North Florida, Jacksonville, FL, 32224, USA
Fei Heng
Department of Mathematics and Statistics, University of North Carolina at Charlotte, Charlotte, NC, 28223, USA
Yanqing Sun
Division of Mathematics and Computer Science, University of South Carolina Upstate, Spartanburg, SC, 29303, USA
Seunggeun Hyun
University of Washington and Fred Hutchinson Cancer Research Center, Seattle, WA, 98109, USA
Peter B. Gilbert

Corresponding author

Correspondence to Yanqing Sun.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 341 KB)

Appendices

Appendix

This Appendix introduces the notation and states the conditions for the asymptotic results presented in Theorems 1–5.

Let $\mathcal{F}_t$ be the right-continuous filtration generated by the data processes $\{N_{ik}(s), Y_i(s), Z_i(s); i=1,\dots,n, k=1,2,\dots,L, 0 \le s \le t\}$. Assume $P(dN_{ik}(t)=1|\mathcal{F}_{t^-})=P(dN_{ik}(t)=1|Y_i(t), Z_i(t))=Y_i(t)\lambda_k(t|Z_i(t))\,dt$. It follows that $M_{ik}(t) = N_{ik}(t) -\int_0^t Y_i(u)\lambda_k(u|Z_i(u))\,du$, $i=1,\dots,n, k=1,2,\dots,L$, are multivariate orthogonal martingales with respect to $\mathcal{F}_t$ (Aalen and Johansen 1978). To accommodate the additional information introduced by missing data, we define the augmented filtration $\mathcal{F}_t^*$ generated by the data processes $\{N_{ik}(s), Y_i(s), Z_i(s), R_i, \delta_i A_i; i=1,\dots,n, k=1,2,\dots,L, 0 \le s \le t\}$. Let $\lambda_{ik}^*(t)\,dt=P\{T_i\in [t,t+dt),V_i=k|X_i \ge t, Z_i(t), R_i, \delta_i A_i\}$. Then $Y_i(t)\lambda_{ik}^*(t)$ is the intensity of $N_{ik}(t)$ with respect to $\mathcal{F}_t^*$, and $M_{ik}^*(t) = N_{ik}(t) -\int_0^t Y_i(u)\lambda_{ik}^*(u)\,du$, $i=1,\dots,n, k=1,2,\dots,L$, are multivariate orthogonal martingales with respect to $\mathcal{F}_t^*$.
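To make the data processes above concrete, the following minimal Python sketch evaluates the at-risk indicator and the cause-specific counting process at a fixed time. It assumes the usual competing-risks definitions $Y_i(t)=I(X_i \ge t)$ and $N_{ik}(t)=I(X_i \le t, \delta_i=1, V_i=k)$, which are not restated in this Appendix; the function names and the toy data are hypothetical.

```python
import numpy as np

def at_risk(X, t):
    """Y_i(t) = 1 if subject i is still under observation at time t (assumed definition)."""
    return (np.asarray(X) >= t).astype(int)

def counting_process(X, delta, V, k, t):
    """N_ik(t) = 1 if subject i has failed from cause k by time t (assumed definition)."""
    X, delta, V = map(np.asarray, (X, delta, V))
    return ((X <= t) & (delta == 1) & (V == k)).astype(int)

# Hypothetical toy data: observed times, failure indicators, causes.
X = np.array([2.0, 3.5, 1.2, 4.0])
delta = np.array([1, 0, 1, 1])   # 0 marks a censored observation
V = np.array([1, 0, 2, 1])       # cause of failure; the value is ignored when delta == 0

Y = at_risk(X, 2.0)
N1 = counting_process(X, delta, V, k=1, t=2.0)
```

At $t=2$ the first subject has failed from cause 1 and the third subject has already left the risk set after failing from cause 2, so only the first entry of `N1` is nonzero.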

Let $S^{(j)}(t,\beta_k)=n^{-1}\sum_{i=1}^{n} Y_i(t)\exp\left(\beta_k(t)^{\textsf{T}}Z_i(t)\right) Z_i(t)^{\otimes j}$ and $S_I^{*(j)}(t,\beta_k,\psi)=n^{-1}\sum_{i=1}^n q_i Y_i(t)\exp\left(\beta_k(t)^{\textsf{T}}Z_i(t)\right) Z_i(t)^{\otimes j}$, for $k=1,\dots,L$ and $j=0,1,2$. Let $s^{(j)}(t,\beta_k)=ES^{(j)}(t,\beta_k)$ and $s_I^{*(j)}(t,\beta_k,\psi)=ES_I^{*(j)}(t,\beta_k,\psi)$. If the model $r(\zeta_i, A_i, \psi)$ is correctly specified, then $s^{(j)}(t,\beta_k)= s_I^{*(j)}(t,\beta_k,\psi)$. Define

$$\begin{aligned}&\Sigma_k(t) = \Bigg[s^{(2)}(t,\beta_k) - \frac{\big(s^{(1)}(t,\beta_k)\big)^{\otimes 2}}{s^{(0)}(t,\beta_k)}\Bigg] \lambda_{k0}(t),\\&\Sigma_k^*(t)=E\Bigg[ \Bigg(Z_i(t)-\frac{s^{(1)}(t,\beta_k)}{s^{(0)}(t,\beta_k)} \Bigg)^{\otimes 2} R_i\,\pi^{-2}(Q_i)\, Y_i(t)\, \lambda_{ik}^*(t) \Bigg]. \end{aligned}$$
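As an illustration of these risk-set averages, the sketch below computes $S^{(j)}$ for $j=0,1,2$ at a single time point, assuming time-fixed covariates and the usual convention $z^{\otimes 0}=1$, $z^{\otimes 1}=z$, $z^{\otimes 2}=zz^{\textsf{T}}$. The optional `weights` argument is a hypothetical stand-in for the weights $q_i$ in $S_I^{*(j)}$, whose definition is given in the main text; the function name and the simulated data are likewise illustrative only.

```python
import numpy as np

def S(j, t, beta_t, X, Z, weights=None):
    """n^{-1} * sum_i w_i * Y_i(t) * exp(beta(t)^T Z_i) * Z_i^{(tensor j)} for j in {0, 1, 2}."""
    X = np.asarray(X, dtype=float)
    Z = np.asarray(Z, dtype=float)                   # shape (n, p), time-fixed covariates
    n = Z.shape[0]
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    risk = w * (X >= t) * np.exp(Z @ beta_t)         # w_i * Y_i(t) * exp(beta^T Z_i)
    if j == 0:
        return risk.sum() / n
    if j == 1:
        return (risk[:, None] * Z).sum(axis=0) / n
    if j == 2:
        return np.einsum("i,ij,ik->jk", risk, Z, Z) / n
    raise ValueError("j must be 0, 1, or 2")

# Simulated data for illustration only.
rng = np.random.default_rng(0)
n, p = 200, 2
Z = rng.normal(size=(n, p))
X = rng.exponential(size=n)
beta_t = np.array([0.5, -0.3])                       # value of beta_k(t) at t = 0.5

s0, s1, s2 = (S(j, 0.5, beta_t, X, Z) for j in (0, 1, 2))
bracket = s2 - np.outer(s1, s1) / s0                 # empirical analogue of the bracket in Sigma_k(t)
```

The matrix `bracket` is the sample version of the bracketed term in $\Sigma_k(t)$; multiplying it by an estimate of $\lambda_{k0}(t)$ would give a plug-in estimate, and it is positive semidefinite whenever the risk set is nonempty.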

Let $S_i^{\psi }$ and $I^{\psi }$ be the score vector and information matrix for ${{\widehat{\psi }}}$ under (4). Then,

$$\begin{aligned}&S_i^{\psi } = \frac{\delta _i(R_i-r(\zeta _i, A_i,\psi _0))}{r(\zeta _i, A_i,\psi _0)(1-r(\zeta _i, A_i,\psi _0))} \frac{\partial r(\zeta _i, A_i,\psi _0)}{\partial \psi },\\&I^{\psi } = E\Bigg \{ \frac{\delta _i}{r(\zeta _i, A_i,\psi _0)(1-r(\zeta _i, A_i,\psi _0))} \frac{\partial r(\zeta _i, A_i,\psi _0)}{\partial \psi } \Bigg (\frac{\partial r(\zeta _i, A_i,\psi _0)}{\partial \psi } \Bigg )^{{\textsf {T}}} \Bigg \}, \end{aligned}$$

and ${\widehat{\psi}} - \psi_0 = n^{-1}\sum_{i=1}^n(I^{\psi})^{-1}S_i^{\psi}+o_p(n^{-1/2})$, where $\psi_0$ is the true value of $\psi$. We also define the following notation:
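The score and information expressions above can be evaluated directly once a form for $r$ is chosen. The sketch below assumes, purely for illustration, a logistic model $r = \mathrm{expit}(\psi^{\textsf{T}} W_i)$, where $W_i$ is a hypothetical design vector built from $(\zeta_i, A_i)$, and takes $R_i$ to be the indicator that the cause is observed. The expectation in $I^{\psi}$ is replaced by a sample average, and `fit_psi` is a plain Fisher-scoring loop rather than the authors' implementation.

```python
import numpy as np

def expit(x):
    """Logistic function 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def score_and_information(W, R, delta, psi):
    """Per-subject scores S_i^psi and the sample analogue of I^psi (logistic r assumed)."""
    W = np.asarray(W, dtype=float)
    R = np.asarray(R, dtype=float)
    delta = np.asarray(delta, dtype=float)
    r = expit(W @ psi)
    dr = (r * (1.0 - r))[:, None] * W                # dr/dpsi under the logistic assumption
    scores = (delta * (R - r) / (r * (1.0 - r)))[:, None] * dr
    info = np.einsum("i,ij,ik->jk", delta / (r * (1.0 - r)), dr, dr) / len(r)
    return scores, info

def fit_psi(W, R, delta, n_iter=25):
    """Fisher scoring: psi <- psi + I^{-1} * (mean score)."""
    psi = np.zeros(W.shape[1])
    for _ in range(n_iter):
        scores, info = score_and_information(W, R, delta, psi)
        psi = psi + np.linalg.solve(info, scores.mean(axis=0))
    return psi

# Simulated data for illustration only.
rng = np.random.default_rng(1)
n = 500
W = np.column_stack([np.ones(n), rng.normal(size=n)])
delta = (rng.random(n) < 0.7).astype(float)          # failure indicators
R = (rng.random(n) < expit(W @ np.array([1.0, 0.5]))).astype(float)

psi_hat = fit_psi(W, R, delta)
scores_hat, info_hat = score_and_information(W, R, delta, psi_hat)
```

Under the logistic assumption the score reduces to $\delta_i (R_i - r_i) W_i$, and each update in `fit_psi` has the same form as the leading term of the asymptotic expansion of $\widehat{\psi}$.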

$$\begin{aligned}&{\mathcal {A}}_{i}(t,\beta _k)=\int _0^{\tau } K_h(u-t) H^{-1}\Bigg (Z_i(u)-\frac{s^{(1)}(u,\beta _k)}{s^{(0)}(u,\beta _k)} \Bigg ) q_{i0}\ dM_{ik}(u),\\&{\mathcal {B}}_{i}(t,\beta _k)=\int _0^{\tau } K_h(u-t) H^{-1}\Bigg (Z_i(u)-\frac{s^{(1)}(u,\beta _k)}{s^{(0)}(u,\beta _k)} \Bigg )(1-q_{i0})\ E(dM_{ik}(u)|Q_{i}),\\&\mathcal {D}^n(t,\beta _k) = n^{-1}\sum _{i=1}^n \int _0^{\tau }K_h(u-t)\Bigg (Z_i(u)-\frac{s^{(1)}(u,\beta _k)}{s^{(0)}(u,\beta _k)}\Bigg ) \dfrac{-R_i}{(\pi (Q_i,\psi _0))^2}\\&\quad \Bigg (\dfrac{\partial \pi (Q_i,\psi _0)}{\partial \psi }\Bigg )^{\textsf {T}}dM_{ik}(u),\\&\mathcal {O}_i(t,\beta _k) = \mathcal {D}^n(t,\beta _k)(I^{\psi })^{-1}S_i^{\psi }. \end{aligned}$$

The following conditions are assumptions we use to prove the theorems:

(C.1) For $k=1,\dots,L$, $\beta_k(t)$ has componentwise second derivatives on $[0,\tau]$. The sample path of the covariate process $Z_i(t)$ is left continuous and of bounded variation, and satisfies the moment condition $E[||Z_i(t)||^4\exp(2M||Z_i(t)||)]<\infty$, where $M$ is a constant such that $(t,\beta_k(t))\in [0,\tau]\times [-M,M]^p$ for all $t$ and $||A||=\max_{k,l}|a_{kl}|$ for a matrix $A=(a_{kl})$.
(C.2) The kernel function $K(\cdot)$ is bounded and symmetric with bounded support $[-1,1]$. The bandwidth $h$ satisfies $nh^2\rightarrow \infty$ and $nh^5$ is bounded as $n \rightarrow \infty$.
(C.3) The matrix $\Sigma_k(t)$ is positive definite for all $t \in [0, \tau]$.
(C.4) For $k=1,\dots,L$ and for $j=0,1,2$, the functions $s^{(j)}(t,\beta_k)$ and $s_I^{*(j)}(t,\beta_k,\psi)$ are componentwise continuous on $t\in [0,\tau],\beta_k\in [-M,M]^p,\psi \in \varTheta_{\psi}$, where $\varTheta_{\psi}$ is a compact set. $\sup_{t\in [0,\tau],\beta_k\in [-M,M]^p}||S^{(j)}(t,\beta_k)-s^{(j)}(t,\beta_k)||=O_p(n^{-1/2})$, and $\sup_{t\in [0,\tau],\beta_k\in [-M,M]^p,\psi \in \varTheta_{\psi}}||S_I^{*(j)}(t,\beta_k,\psi)-s_I^{*(j)}(t,\beta_k,\psi)||=O_p(n^{-1/2})$.
(C.5) The function $r(\zeta_i,A_i,\psi)$ is twice differentiable with respect to $\psi$ on a compact set $\varTheta_{\psi}$, $r^\prime(\zeta_i,A_i,\psi)=\partial r(\zeta_i,A_i,\psi)/\partial \psi$ is uniformly bounded, and there is an $\varepsilon > 0$ such that $r(\zeta_i,A_i,\psi) \ge \varepsilon$ for all $i$. The function $f(A_i |k,T_i,Z_i,\varphi_k)$ is also twice differentiable with respect to $\varphi_k$ on a compact set $\varTheta_{\varphi_k}$ for $k=1,\dots,L$.
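Condition (C.2) can be illustrated with a concrete kernel and bandwidth sequence. The sketch below uses the Epanechnikov kernel and the rate $h = c\,n^{-1/5}$; both are common choices assumed here for illustration and are not taken from the paper. The scaling $K_h(x) = K(x/h)/h$ is also the usual convention and is an assumption, as is every function name.

```python
import numpy as np

def epanechnikov(x):
    """K(x) = 0.75 * (1 - x^2) on [-1, 1], zero elsewhere: bounded, symmetric, compact support."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) <= 1.0, 0.75 * (1.0 - x**2), 0.0)

def K_h(u, h):
    """Rescaled kernel K_h(u) = K(u / h) / h (assumed convention)."""
    return epanechnikov(np.asarray(u, dtype=float) / h) / h

def bandwidth(n, c=1.0):
    """h = c * n^(-1/5), so n*h^5 = c^5 stays bounded while n*h^2 = c^2 * n^(3/5) grows."""
    return c * n ** (-1.0 / 5.0)

n = 1000
h = bandwidth(n)

# Riemann-sum check that K_h integrates to roughly one over its support [-h, h].
grid = np.linspace(-2 * h, 2 * h, 40001)
mass = K_h(grid, h).sum() * (grid[1] - grid[0])
```

With this rate, $nh^5$ equals $c^5$ for every $n$ and $nh^2$ diverges, so both requirements in (C.2) hold; other rates between $n^{-1/2}$ and $n^{-1/5}$ would also satisfy the condition.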

Supplementary materials

The Web-based Supplementary Materials referenced in the manuscript are available with this paper at the journal’s online website.

About this article

Cite this article

Heng, F., Sun, Y., Hyun, S. et al. Analysis of the time-varying Cox model for the cause-specific hazard functions with missing causes. Lifetime Data Anal 26, 731–760 (2020). https://doi.org/10.1007/s10985-020-09497-y

Received: 22 October 2018
Accepted: 27 March 2020
Published: 09 April 2020
Issue Date: October 2020
DOI: https://doi.org/10.1007/s10985-020-09497-y
