Abstract
Due to the coexistence of ultra-high dimensionality and right censoring, developing feature screening procedures for ultra-high-dimensional survival data is very challenging. In this paper, we propose a joint screening approach for the sparse additive hazards model with ultra-high-dimensional features. The proposed screening is based on a sparsity-restricted pseudo-score estimator, which can be computed efficiently through an iterative hard-thresholding algorithm. We establish the sure screening property of the proposed procedure under rather mild assumptions. Extensive simulation studies verify its improvements over the main existing screening approaches for ultra-high-dimensional survival data. Finally, the proposed screening method is illustrated with a dataset from a breast cancer study.
References
Annest, A., Bumgarner, R., Raftery, A., Yeung, K. (2009). Iterative Bayesian model averaging: A method for the application of survival analysis to high-dimensional microarray data. BMC Bioinformatics, 10, 72.
Bertsekas, D. (2016). Nonlinear programming (3rd ed.). Nashua: Athena Scientific.
Bickel, P., Ritov, Y., Tsybakov, A. (2009). Simultaneous analysis of lasso and Dantzig selector. The Annals of Statistics, 37, 1705–1732.
Bradic, J., Fan, J., Jiang, J. (2011). Regularization for Cox’s proportional hazards model with NP-dimensionality. The Annals of Statistics, 39, 3092–3120.
Cai, J., Fan, J., Li, R., Zhou, H. (2005). Variable selection for multivariate failure time data. Biometrika, 92, 303–316.
Chang, J., Tang, C., Wu, Y. (2013). Marginal empirical likelihood and sure independence feature screening. The Annals of Statistics, 41, 2123–2148.
Chen, X. (2018). Model-free conditional feature screening for ultra-high dimensional right censored data. Journal of Statistical Computation and Simulation. https://doi.org/10.1080/00949655.2018.1466142.
Chen, X., Chen, X., Liu, Y. (2017). A note on quantile feature screening via distance correlation. Statistical Papers. https://doi.org/10.1007/s00362-017-0894-8.
Chen, X., Chen, X., Wang, H. (2018). Robust feature screening for ultra-high dimensional right censored data via distance correlation. Computational Statistics and Data Analysis, 119, 118–138.
Fan, J., Li, R. (2002). Variable selection for Cox’s proportional hazards model and frailty model. The Annals of Statistics, 30, 74–99.
Fan, J., Lv, J. (2008). Sure independence screening for ultrahigh dimensional feature space (with discussion). Journal of the Royal Statistical Society, Series B, 70, 849–911.
Fan, J., Song, R. (2010). Sure independence screening in generalized linear models with NP-dimensionality. The Annals of Statistics, 38, 3567–3604.
Fan, J., Feng, Y., Wu, Y. (2010). Ultrahigh dimensional variable selection for Cox’s proportional hazards model. Institute of Mathematical Statistics Collections, 6, 70–86.
Fan, J., Samworth, R., Wu, Y. (2009). Ultrahigh dimensional variable selection: Beyond the linear model. Journal of Machine Learning Research, 10, 1829–1853.
Fan, J., Ma, Y., Dai, W. (2014). Nonparametric independent screening in sparse ultra-high dimensional varying coefficient models. Journal of the American Statistical Association, 109, 1270–1284.
Gorst-Rasmussen, A., Scheike, T. (2013). Independent screening for single-index hazard rate models with ultrahigh dimensional features. Journal of the Royal Statistical Society, Series B, 75, 217–245.
He, X., Wang, L., Hong, H. (2013). Quantile-adaptive model-free variable screening for high-dimensional heterogeneous data. The Annals of Statistics, 41, 342–369.
Huang, J., Sun, T., Ying, Z., Yu, Y., Zhang, C. (2013). Oracle inequalities for the lasso in the Cox model. The Annals of Statistics, 41, 1142–1165.
Leng, C., Ma, S. (2007). Path consistent model selection in additive risk model via lasso. Statistics in Medicine, 26, 3753–3770.
Li, G., Peng, H., Zhang, J., Zhu, L. (2012a). Robust rank correlation based screening. The Annals of Statistics, 40, 1846–1877.
Li, R., Zhong, W., Zhu, L. (2012b). Feature screening via distance correlation learning. Journal of the American Statistical Association, 107, 1129–1139.
Lin, D., Ying, Z. (1994). Semiparametric analysis of the additive risk model. Biometrika, 81, 61–71.
Lin, W., Lv, J. (2013). High-dimensional sparse additive hazards regression. Journal of the American Statistical Association, 108, 247–264.
Liu, Y., Chen, X. (2018). Quantile screening for ultra-high-dimensional heterogeneous data conditional on some variables. Journal of Statistical Computation and Simulation, 88, 329–342.
Martinussen, T., Scheike, T. (2009). The additive hazards model with high-dimensional regressors. Lifetime Data Analysis, 15, 330–342.
Song, R., Lu, W., Ma, S., Jessie Jeng, X. (2014). Censored rank independence screening for high-dimensional survival data. Biometrika, 101, 799–814.
Tibshirani, R. (1997). The lasso method for variable selection in the Cox model. Statistics in Medicine, 16, 385–395.
Troyanskaya, O., Cantor, M., Sherlock, G., Brown, P., Hastie, T., Tibshirani, R., et al. (2001). Missing value estimation methods for DNA microarrays. Bioinformatics, 17, 520–525.
van’t Veer, L., Dai, H., van de Vijver, M., He, Y., Hart, A., Mao, M., et al. (2002). Gene expression profiling predicts clinical outcome of breast cancer. Nature, 415, 530–536.
Wu, Y., Yin, G. (2015). Conditional quantile screening in ultrahigh-dimensional heterogeneous data. Biometrika, 102, 65–76.
Xu, C., Chen, J. (2014). The sparse MLE for ultra-high-dimensional feature screening. Journal of the American Statistical Association, 109, 1257–1269.
Yang, G., Yu, Y., Li, R., Buu, A. (2016). Feature screening in ultrahigh dimensional Cox’s model. Statistica Sinica, 26, 881–901.
Yang, G., Hou, S., Wang, L., Sun, Y. (2018). Feature screening in ultrahigh-dimensional additive Cox model. Journal of Statistical Computation and Simulation, 88, 1117–1133.
Zhang, C., Zhang, T. (2012). A general theory of concave regularization for high-dimensional sparse estimation problems. Statistical Science, 27, 576–593.
Zhang, J., Liu, Y., Wu, Y. (2017). Correlation rank screening for ultrahigh-dimensional survival data. Computational Statistics & Data Analysis, 108, 121–132.
Zhao, S., Li, Y. (2012). Principled sure independence screening for Cox model with ultra-high-dimensional covariates. Journal of Multivariate Analysis, 105, 397–411.
Zhao, S., Li, Y. (2014). Score test variable screening. Biometrics, 70, 862–871.
Zhou, T., Zhu, L. (2017). Model-free features screening for ultrahigh dimensional censored regression. Statistics and Computing, 27, 947–961.
Zhu, L., Li, L., Li, R., Zhu, L. (2011). Model-free feature screening for ultrahigh-dimensional data. Journal of the American Statistical Association, 106, 1464–1475.
Acknowledgements
Chen’s research was supported by the National Natural Science Foundation of China (11501573, 11326184 and 11771250) and the National Social Science Foundation of China (17BTJ019). Liu’s research was supported by the Fundamental Research Funds for the Central Universities (17CX02035A). Wang’s research was supported by the National Natural Science Foundation of China (General program 11171331, Key program 11331011 and program for Creative Research Group in China 61621003), a grant from the Key Lab of Random Complex Structure and Data Science, CAS, and a grant from Zhejiang Gongshang University.
Appendix: Proofs of the theorems
To prove Theorem 1, we first state a large deviation result for martingales under the additive hazards model. This result can be proved along the same lines as Theorem 3.1 in Bradic et al. (2011), so we omit the proof for simplicity.
Lemma 1
Under Assumptions 1–3, for any positive sequence \(\{u_{n}\}\) bounded away from zero, if \(\max \limits _{1 \le j\le p}\sigma _{j}^{2}=O(u_{n})\), there exist positive constants \(c_{7}\) and \(c_{8}\) such that
uniformly over j, where \(U_{n,j}(\varvec{\beta }^{*})\) is the jth component of \(\varvec{U}_{n}(\varvec{\beta }^{*})\).
This large deviation result is a uniform, nonasymptotic exponential inequality for martingales under the additive hazards model, and the bound does not depend on the dimensionality p. It is therefore particularly useful for the high-dimensional additive hazards model.
Proof of Theorem 1
Let \(\hat{\varvec{\beta }}_{M}\) denote the (unrestricted) pseudo-score estimator of \(\varvec{\beta }\) based on model M. To establish the sure screening property, we only need to prove
as n goes to \(\infty \). It suffices to show
as n goes to \(\infty \).
For any \(M\in {\mathbf {M}}_{-}^{k}\), let \(M^{\prime }=M\cup M_{0}\in {\mathbf {M}}_{+}^{2k}\).
First, consider \(\varvec{\beta }_{M^{\prime }}\) close to \(\varvec{\beta }_{M^{\prime }}^{*}\) in the sense that \(\Vert \varvec{\beta }_{M^{\prime }}-\varvec{\beta }_{M^{\prime }}^{*}\Vert _{2}=c_{2}n^{-\tau _{1}}\). After some algebraic manipulations, we have
Then, by the Cauchy–Schwarz inequality and Assumption 5, we can conclude that
Thus, we have
where the second inequality follows from the Bonferroni inequality.
Because \(M_{0}\subset M'\), we can get that \(U_{n,j}(\varvec{\beta }_{M^{\prime }}^{*})=U_{n,j}(\varvec{\beta }^{*})\). Then under the conditions in Theorem 1 and by Lemma 1, we have
Then
Then, by the Bonferroni inequality and assumptions in Theorem 1, we can arrive at
where \(c_{9}\) is a positive constant.
By the concavity of \(L_{n}(\varvec{\beta }_{M^{\prime }})\), we can conclude that the above result holds for any \(\varvec{\beta }_{M^{\prime }}\) with \(\Vert \varvec{\beta }_{M^{\prime }}-\varvec{\beta }_{M^{\prime }}^{*}\Vert \ge c_{2}n^{-\tau _{1}}\).
For any \(M\in {\mathbf {M}}_{-}^{k}\), let \(\check{\varvec{\beta }}_{M^{\prime }}\) be \(\hat{\varvec{\beta }}_{M}\) augmented with zeros corresponding to the elements in \(M^{\prime }/M\). It is easy to see that \(\Vert \check{\varvec{\beta }}_{M^{\prime }}-\varvec{\beta }_{M^{\prime }}^{*}\Vert \ge \Vert \varvec{\beta }_{M_{0}/M}^{*}\Vert \ge c_{2}n^{-\tau _{1}}\). So we have
Then the proof is finished. \(\square \)
Proof of Theorem 2
Denote \(Q_{n}(\varvec{\beta }\mid \varvec{{\hat{\beta }}}^{(t)})=L_{n}(\varvec{{\hat{\beta }}}^{(t)})+ (\varvec{\beta }-\varvec{{\hat{\beta }}}^{(t)})^{T}\varvec{U}_{n}(\varvec{{\hat{\beta }}}^{(t)}) -\frac{u}{2}\Vert \varvec{\beta }-\varvec{{\hat{\beta }}}^{(t)}\Vert _{2}^{2}\). Then \( \varvec{{\hat{\beta }}}^{(t+1)}=\mathop {\mathrm {argmin}}_{\varvec{\beta }\in {\mathcal {B}}(k)} \{-Q_{n}(\varvec{\beta }\mid \varvec{{\hat{\beta }}}^{(t)})\}. \)
After some algebraic manipulations, it is easy to see that
It is easy to see that
So under the assumptions in Theorem 2, we have
This completes the proof. \(\square \)
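For intuition, maximizing the surrogate \(Q_{n}(\varvec{\beta }\mid \varvec{{\hat{\beta }}}^{(t)})\) over the sparsity set \({\mathcal {B}}(k)\) reduces to a single hard-thresholding step, \(\varvec{{\hat{\beta }}}^{(t+1)}={\mathbf {H}}(\varvec{{\hat{\beta }}}^{(t)}+u^{-1}\varvec{U}_{n}(\varvec{{\hat{\beta }}}^{(t)});k)\). The following Python sketch is our own illustration, not the authors' code: it assumes the pseudo-score takes the linear form \(\varvec{U}_{n}(\varvec{\beta })=\varvec{b}_{n}-\varvec{V}_{n}\varvec{\beta }\) (with \(\varvec{b}_{n}\), \(\varvec{V}_{n}\) standing in for the data-dependent quantities of the additive hazards model), and the names `iht` and `hard_threshold` are hypothetical.

```python
import numpy as np

def hard_threshold(beta, k):
    """H(beta; k): keep the k largest entries in absolute value, zero out the rest."""
    out = np.zeros_like(beta)
    idx = np.argsort(np.abs(beta))[-k:]  # indices of the k largest |beta_j|
    out[idx] = beta[idx]
    return out

def iht(b, V, k, u, n_iter=200):
    """Iterative hard-thresholding for the quadratic pseudo-loss
    L_n(beta) = beta^T b - 0.5 * beta^T V beta, whose pseudo-score is
    U_n(beta) = b - V @ beta.  Each step: beta <- H(beta + u^{-1} U_n(beta); k)."""
    beta = np.zeros(len(b))
    for _ in range(n_iter):
        score = b - V @ beta                    # U_n at the current iterate
        beta = hard_threshold(beta + score / u, k)
    return beta
```

In practice the step-size parameter u should dominate the largest eigenvalue of \(\varvec{V}_{n}\) so that the surrogate minorizes \(L_{n}\) and the iteration is stable.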
Before presenting the proof of Theorem 3, we first introduce a lemma.
Lemma 2
Define \(\varvec{{\hat{\beta }}}^{(0)}=\mathrm {argmax}_{\varvec{\beta }}\{L_{n}(\varvec{\beta })-\lambda \Vert \varvec{\beta }\Vert _{1}\}\), where \(\lambda \) satisfies \(\lambda n^{\frac{1}{2}-m}\rightarrow \infty \), \(\lambda n^{\tau _{1}+\tau _{2}}\rightarrow 0\). Under Assumptions 1–3 and 6, if \(\max \limits _{1 \le j\le p}\sigma _j^2=O(\lambda n^{\frac{1}{2}})\), we have
where \(c_{5}\) is defined in Assumption 6.
Proof
It is easy to see that
or equivalently
Define \(\varvec{\delta }=(\varvec{{\hat{\beta }}}^{(0)}-\varvec{\beta }^{*})=(\delta _{1},\ldots ,\delta _{p})^{T}\). By some algebraic manipulations, we have
Then we have
Denote \({\mathcal {A}}=\{\max \nolimits _{1 \le j\le p}|U_{n,j}(\varvec{\beta }^{*})|\le \frac{\lambda }{4}\}\). Because \(\max \limits _{1 \le j\le p}\sigma _{j}^{2}=O(\lambda n^{\frac{1}{2}})\), then by Lemma 1, we have
where \(c_{10}\) is a positive constant. So we obtain that \(\mathrm {pr}({\mathcal {A}})\rightarrow 1\) and \(\Vert \varvec{U}_{n}(\varvec{\beta }^{*})\Vert _{\infty }=O_{p}(\lambda )\). Under the event \({\mathcal {A}}\), it is easy to see that
Thus
It is easy to see that \(\varvec{V}_{n}\) is positive semidefinite. Thus \(\Vert \varvec{\delta }_{M_{0}^{c}}\Vert _{1}\le 3\Vert \varvec{\delta }_{M_{0}}\Vert _{1}\), and hence \(\Vert \varvec{\delta }\Vert _{1}\le 4\Vert \varvec{\delta }_{M_{0}}\Vert _{1}\). By the Cauchy–Schwarz inequality and Assumption 6,
So \(\Vert \varvec{\delta }_{M_{0}}\Vert _{1}\le 2 c_{5}^{-1}\lambda q\). Then finally we arrive at
This finishes the proof. \(\square \)
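The lasso initializer \(\varvec{{\hat{\beta }}}^{(0)}=\mathrm {argmax}_{\varvec{\beta }}\{L_{n}(\varvec{\beta })-\lambda \Vert \varvec{\beta }\Vert _{1}\}\) of Lemma 2 can likewise be sketched numerically with a proximal-gradient (soft-thresholding) iteration. This is an illustrative sketch under the same assumed linear score form \(\varvec{U}_{n}(\varvec{\beta })=\varvec{b}_{n}-\varvec{V}_{n}\varvec{\beta }\) used above, not the authors' algorithm; the names `lasso_init`, `b`, `V` and `lam` are hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    """Componentwise soft-thresholding operator S(z; t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_init(b, V, lam, u, n_iter=500):
    """Proximal-gradient (ISTA) sketch of the lasso initializer
    argmax { L_n(beta) - lam * ||beta||_1 } with
    L_n(beta) = beta^T b - 0.5 * beta^T V beta and score U_n(beta) = b - V beta.
    Each step takes a gradient ascent step of size 1/u, then soft-thresholds."""
    beta = np.zeros(len(b))
    for _ in range(n_iter):
        beta = soft_threshold(beta + (b - V @ beta) / u, lam / u)
    return beta
```

As in the hard-thresholding sketch, u should dominate the largest eigenvalue of \(\varvec{V}_{n}\) to guarantee convergence of the iteration.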
Proof of Theorem 3
Recall that \(w=\min _{j\in M_{0}}|\beta _{j}^{*}|\). We only need to show \(\mathrm {pr}(\Vert \varvec{{\hat{\beta }}}^{(t)}-\varvec{\beta }^{*}\Vert _{\infty }<\frac{w}{2})\rightarrow 1\), for which it suffices to prove \(\Vert \varvec{{\hat{\beta }}}^{(t)}-\varvec{\beta }^{*}\Vert _{\infty }=o_{p}(w)\). As in Xu and Chen (2014), we establish this result by mathematical induction.
When \(t=0\), by Lemma 2, we have
Because \(\lambda =o(n^{-(\tau _{1}+\tau _{2})})\), \(q=O(n^{\tau _{2}})\) and \(w^{-1}=O(n^{\tau _{1}})\), we have \(\lambda qw^{-1}=o(n^{-(\tau _{1}+\tau _{2})})O(n^{\tau _{2}})O(n^{\tau _{1}})=o(1)\), that is, \(\lambda q=o(w)\). So \(\Vert \varvec{{\hat{\beta }}}^{(0)}-\varvec{\beta }^{*}\Vert _{1}=o_{p}(w)\). Note that \(\Vert \varvec{{\hat{\beta }}}^{(0)}-\varvec{\beta }^{*}\Vert _{\infty }\le \Vert \varvec{{\hat{\beta }}}^{(0)}-\varvec{\beta }^{*}\Vert _{1}\). Then the desired result holds for \(t=0\).
Suppose that \(\Vert \varvec{{\hat{\beta }}}^{(t-1)}-\varvec{\beta }^{*}\Vert _{\infty }=o_{p}(w)\). In the following, we show that \(\Vert \varvec{{\hat{\beta }}}^{(t)}-\varvec{\beta }^{*}\Vert _{\infty }=o_{p}(w)\) also holds. From the adaptive iterative hard-thresholding algorithm, note that \(\varvec{{\hat{\beta }}}^{(t)}={\mathbf {H}}(\varvec{{\tilde{\beta }}}^{(t-1)};k)\), where \(\varvec{{\tilde{\beta }}}^{(t-1)}=\varvec{{\hat{\beta }}}^{(t-1)}+u^{-1}{\dot{L}}_{n}(\varvec{{\hat{\beta }}}^{(t-1)})\). If \(\Vert \varvec{{\tilde{\beta }}}^{(t-1)}-\varvec{\beta }^{*}\Vert _{\infty }=o_{p}(w)\) holds, then the elements of \(\varvec{{\tilde{\beta }}}^{(t-1)}_{M_{0}}\) are among those with the top k largest absolute values with probability tending to one. Thus \(\Vert \varvec{{\hat{\beta }}}^{(t)}-\varvec{\beta }^{*}\Vert _{\infty }\le \Vert \varvec{{\tilde{\beta }}}^{(t-1)}-\varvec{\beta }^{*}\Vert _{\infty }=o_{p}(w)\). So it remains to prove \(\Vert \varvec{{\tilde{\beta }}}^{(t-1)}-\varvec{\beta }^{*}\Vert _{\infty }=o_{p}(w)\). Note that \(\Vert \varvec{{\tilde{\beta }}}^{(t-1)}-\varvec{\beta }^{*}\Vert _{\infty }\le \Vert \varvec{{\hat{\beta }}}^{(t-1)}-\varvec{\beta }^{*}\Vert _{\infty }+ u^{-1}\Vert \varvec{U}_{n}(\varvec{{\hat{\beta }}}^{(t-1)})\Vert _{\infty }\). By some algebraic manipulations, we can obtain that
Thus
This completes the proof. \(\square \)
Chen, X., Liu, Y. & Wang, Q. Joint feature screening for ultra-high-dimensional sparse additive hazards model by the sparsity-restricted pseudo-score estimator. Ann Inst Stat Math 71, 1007–1031 (2019). https://doi.org/10.1007/s10463-018-0675-8