An improved variable selection procedure for adaptive Lasso in high-dimensional survival analysis

He, Kevin; Wang, Yue; Zhou, Xiang; Xu, Han; Huang, Can

doi:10.1007/s10985-018-9455-2

Original Paper
Published: 26 November 2018

Lifetime Data Analysis, volume 25, pages 569–585 (2019)

Kevin He¹ (ORCID: orcid.org/0000-0002-8354-426X), Yue Wang², Xiang Zhou¹, Han Xu² & Can Huang²

Abstract

Motivated by high-dimensional genomic studies, we develop an improved procedure for adaptive Lasso in high-dimensional survival analysis. The proposed procedure reduces false discoveries while maintaining the false negative proportion, thereby improving on existing adaptive Lasso procedures. Its implementation is straightforward, and it is flexible enough to accommodate large-scale problems where traditional procedures are impractical. To quantify the uncertainty of variable selection and control the family-wise error rate, a testing algorithm based on multiple sample splitting is developed. The practical utility of the proposed procedure is examined through simulation studies. The methods developed are then applied to a multiple myeloma data set.

References

Alexander DH, Lange K (2011) Stability selection for genome-wide association. Genet Epidemiol 35(7):722–728
Bataille R, Grenier J, Sany J (1984) Beta-2-microglobulin in myeloma: optimal use for staging, prognosis, and treatment-a prospective study of 160 patients. Blood 63(2):468–476
Bühlmann P, van de Geer S (2011) Statistics for high-dimensional data: methods, theory and applications. Springer, Berlin
Chapman MA, Lawrence MS, Keats JJ, Cibulskis K, Sougnez C, Schinzel AC, Golub TR (2011) Initial genome sequencing and analysis of multiple myeloma. Nature 471(7339):467–472
Di Luccio E (2015) Inhibition of nuclear receptor binding SET domain 2/multiple myeloma SET domain by LEM-06 implication for epigenetic cancer therapies. J Cancer Prev 20(2):113–120
Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96(456):1348–1360
Fan J, Li R (2002) Variable selection for Cox’s proportional hazards model and frailty model. Ann Stat 30(1):74–99
Goeman JJ (2010) L1 penalized estimation in the Cox proportional hazards model. Biom J 52(1):70–84
Gui J, Li H (2005) Penalized Cox regression analysis in the high-dimensional and low-sample size settings with application to microarray gene expression data. Bioinformatics 21(13):3001–3008
Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning: data mining, inference, and prediction. Springer, New York
Heagerty PJ, Zheng Y (2005) Survival model predictive accuracy and ROC curves. Biometrics 61(1):92–105
Kyle RA, Rajkumar SV (2008) Multiple myeloma. Blood 111(6):2962–2972
MAQC Consortium (2010) The MAQC-II project: a comprehensive study of common practices for the development and validation of microarray-based predictive models. Nat Biotechnol 28(8):827–838
Meinshausen N, Meier L, Bühlmann P (2009) P-values for high-dimensional regression. J Am Stat Assoc 104(488):1671–1681
Shaughnessy JD, Zhan F, Burington BE, Huang Y, Colla S, Hanamura I, Stewart JP, Kordsmeier B, Randolph C, Williams DR, Xiao Y, Xu H, Epstein J, Anaissie E, Krishna SG, Cottler-Fox M, Hollmig K, Mohiuddin A, Pineda-Roman M, Tricot G, van Rhee F, Sawyer J, Alsayed Y, Walker R, Zangari M, Crowley J, Barlogie B (2007) A validated gene expression model of high-risk multiple myeloma is defined by deregulated expression of genes mapping to chromosome 1. Blood 109(6):2276–2284
Simon N, Friedman J, Hastie T, Tibshirani R (2011) Regularization paths for Cox’s proportional hazards model via coordinate descent. J Stat Softw 39(5):1–13
Song LL, Ponomareva L, Shen H, Duan X, Alimirah F, Choubey D (2010) Interferon-inducible IFI16, a negative regulator of cell growth, down-regulates expression of human telomerase reverse transcriptase (hTERT) gene. PLOS ONE 5(1):e8569
Sun S, Hood M, Scott L, Peng Q, Mukherjee S, Tung J, Zhou X (2017) Differential expression analysis for RNAseq using Poisson mixed models. Nucleic Acids Res 45(11):e106
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B (Methodol) 58(1):267–288
Tibshirani R (1997) The lasso method for variable selection in the Cox model. Stat Med 16(4):385–395
Uno H, Cai T, Pencina MJ, D'Agostino RB, Wei LJ (2011) On the C-statistics for evaluating overall adequacy of risk prediction procedures with censored survival data. Stat Med 30(10):1105–1117
Zhang H, Lu W (2007) Adaptive Lasso for Cox’s proportional hazards model. Biometrika 94(3):691–703
Zhao DS, Li Y (2014) Score test variable screening. Biometrics 70(4):862–871
Zhou SH, van de Geer S, Bühlmann P (2009) Adaptive Lasso for high dimensional regression and Gaussian graphical modeling. arXiv:0903.2515
Zou H, Hastie T (2005) Regression shrinkage and selection via the elastic net with application to microarrays. J R Stat Soc Ser B (Methodol) 67(2):301–320
Zou H, Li R (2008) One-step sparse estimates in nonconcave penalized likelihood models. Ann Stat 36(4):1509–1533
Zou H, Zhang HH (2009) On the adaptive elastic-net with a diverging number of parameters. Ann Stat 37(4):1733–1751

Author information

Authors and Affiliations

Department of Biostatistics, University of Michigan, 1420 Washington Hts, Ann Arbor, MI, 48109-2029, USA
Kevin He & Xiang Zhou
Department of Statistics, University of Michigan, 1085 South University, Ann Arbor, MI, 48109-2029, USA
Yue Wang, Han Xu & Can Huang

Corresponding author

Correspondence to Kevin He.

Appendix: FWER-control procedure in Sect. 3.3

(a) Randomly split the original data $B$ times. Specifically, for $b=1, \ldots , B$, split the data into two disjoint sets with sample sizes $n_1=\lfloor n/2 \rfloor $ and $n_2=n-\lfloor n/2 \rfloor $, respectively, where $\lfloor n/2 \rfloor $ is the largest integer not greater than $n/2$.
(b) For $b=1, \ldots , B$, select variables based on the first half of the data and denote the index set of selected variables by ${\widehat{{\mathcal {S}}}}^{(b)}$.
(c) Based on the second half of the data, fit a conventional Cox model using the variables selected in step (b) and assign p-values, denoted by ${\tilde{P}}^{(b)}_{j}$ for $j=1,\dots ,p$. For variables not selected from the first half of the data, set the p-value to 1.
(d) Compute adjusted p-values to correct for the multiplicity of the testing problem:
$$\begin{aligned} {\tilde{P}}^{(b)}_{\mathrm {corrected},j} = \min \left( {\tilde{P}}^{(b)}_{j} \, |{\widehat{{\mathcal {S}}}}^{(b)}|,\, 1\right) , \end{aligned}$$
where $|{\widehat{{\mathcal {S}}}}^{(b)}|$ is the cardinality of ${\widehat{{\mathcal {S}}}}^{(b)}$, i.e., the number of variables it contains.
(e) To aggregate the adjusted p-values over the multiple splits (i.e., $B$ values for each covariate), define
$$\begin{aligned} Q_{j}(\gamma ) = \min \left\{ q_{\gamma }\left( \left\{ {\tilde{P}}^{(b)}_{\mathrm {corrected},j}/\gamma ;\ b=1,\ldots ,B \right\} \right) ,\, 1 \right\} , \end{aligned}$$
where $\gamma \in (0,1)$ and $q_{\gamma }$ is the empirical $\gamma $-quantile function. Define the final p-values as
$$\begin{aligned} P_{j} = \min \left\{ (1-\log \gamma _{\min }) \inf _{\gamma \in (\gamma _{\min },1)} Q_{j}(\gamma ),\, 1 \right\} , \end{aligned}$$
where $\gamma _{\min } \in (0,1)$ is a lower bound for $\gamma $, typically 0.05 (Meinshausen et al. 2009).
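
The bookkeeping in steps (a), (d), and (e) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, the empirical quantile is taken as the smallest order statistic covering a fraction $\gamma $ of the values (one of several common definitions), and the infimum over $\gamma $ is approximated on a finite grid. Variable selection in step (b) and the Cox model fit in step (c) are left out, so the raw p-values are assumed to be supplied by the caller.

```python
import math
import random


def split_indices(n, rng):
    """Step (a): random split into halves of size floor(n/2) and n - floor(n/2)."""
    idx = list(range(n))
    rng.shuffle(idx)
    n1 = n // 2
    return idx[:n1], idx[n1:]


def bonferroni_within_split(raw_pvalues, selected):
    """Step (d): scale p-values of selected variables by |S|, capped at 1.

    Variables outside `selected` keep the p-value 1 assigned in step (c).
    """
    size = len(selected)
    return [min(p * size, 1.0) if j in selected else 1.0
            for j, p in enumerate(raw_pvalues)]


def empirical_quantile(values, gamma):
    """Smallest order statistic with at least a gamma fraction of values at or below it.

    Assumption: this particular quantile definition is chosen for simplicity.
    """
    ordered = sorted(values)
    # Small tolerance guards against floating-point overshoot in gamma * B.
    k = max(math.ceil(gamma * len(ordered) - 1e-12), 1)
    return ordered[k - 1]


def aggregate_pvalues(corrected, gamma_min=0.05, grid_size=200):
    """Step (e): combine the B adjusted p-values of one covariate.

    Assumption: the infimum over (gamma_min, 1) is approximated on an
    evenly spaced grid of interior points.
    """
    gammas = [gamma_min + (1.0 - gamma_min) * i / grid_size
              for i in range(1, grid_size)]
    q_inf = min(
        min(empirical_quantile([p / g for p in corrected], g), 1.0)
        for g in gammas
    )
    return min((1.0 - math.log(gamma_min)) * q_inf, 1.0)
```

For one covariate, a caller would run `bonferroni_within_split` once per split, collect that covariate's $B$ adjusted values into a list, and pass the list to `aggregate_pvalues`. The grid resolution `grid_size` trades accuracy of the infimum against run time and is an illustrative default only.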

About this article

Cite this article

He, K., Wang, Y., Zhou, X. et al. An improved variable selection procedure for adaptive Lasso in high-dimensional survival analysis. Lifetime Data Anal 25, 569–585 (2019). https://doi.org/10.1007/s10985-018-9455-2

Received: 14 December 2017
Accepted: 14 November 2018
Published: 26 November 2018
Issue Date: 15 July 2019
DOI: https://doi.org/10.1007/s10985-018-9455-2
