Variational discriminant analysis with variable selection

Yu, Weichang; Ormerod, John T.; Stewart, Michael

doi:10.1007/s11222-020-09928-8

Published: 19 February 2020

Volume 30, pages 933–951, (2020)
Statistics and Computing

Abstract

A fast Bayesian method that seamlessly fuses classification and hypothesis testing via discriminant analysis is developed. Building upon the original discriminant analysis classifier, modelling components are added to identify discriminative variables. A combination of cake priors and a novel form of variational Bayes we call reverse collapsed variational Bayes gives rise to variable selection that can be directly posed as a multiple hypothesis testing approach using likelihood ratio statistics. Some theoretical arguments are presented showing that Chernoff-consistency (asymptotically zero type I and type II error) is maintained across all hypotheses. We apply our method to some publicly available genomics datasets and show that it performs well in practice for its computational cost. An R package, VaDA, has also been made available on GitHub.

Acknowledgements

The authors would like to thank Rachel Wang (University of Sydney), the associate editor, and the anonymous reviewers for their valuable feedback, which greatly improved this manuscript.

Author information

Authors and Affiliations

School of Mathematics and Statistics, University of Sydney, Camperdown, NSW, 2006, Australia
Weichang Yu, John T. Ormerod & Michael Stewart
ARC Centre of Excellence for Mathematical and Statistical Frontiers, University of Melbourne, Parkville, VIC, 3010, Australia
John T. Ormerod

Corresponding author

Correspondence to Weichang Yu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 287 KB)

Supplementary material 2 (pdf 178 KB)

Appendix

1.1 VQDA derivations

In the VQDA setting ($\sigma _{j 1}^2 \ne \sigma _{j 0}^2$) the posterior distribution of $\gamma _j$ given ${\mathcal {D}}$ and $\mathbf{x}_{n+1}$ may be expressed as

$$\begin{aligned} p(\gamma _j \; | \; {\mathcal {D}}, \mathbf{x}_{n+1}) = \frac{ p(\gamma _j, {\mathcal {D}}, \mathbf{x}_{n+1})}{ p({\mathcal {D}}, \mathbf{x}_{n+1}) }. \end{aligned}$$

By letting $h \rightarrow \infty $, the marginal likelihood of the data in the denominator is of the same form as Eq. (9) with the exception that

$$\begin{aligned} {\varvec{\theta }}_1 = ({\varvec{\mu }}_1, {\varvec{\mu }}_0, {\varvec{\mu }}, {\varvec{\sigma }}_1^2, {\varvec{\sigma }}_0^2, {\varvec{\sigma }}^2, \rho _y, \rho _\gamma ), \end{aligned}$$

and

$$\begin{aligned}&\lambda _{\text {LRT}} (\widetilde{\mathbf{x}}_j, x_{n+1,j}, \mathbf{y}, y_{n+1}) \\&\quad = (n+1) \log (\widehat{\sigma }_j^2) - (n_1 + y_{n+1}) \log (\widehat{\sigma }_{j1}^2) \\&\qquad - (n_0 + 1 - y_{n+1}) \log (\widehat{\sigma }_{j0}^2), \end{aligned}$$

where

$$\begin{aligned} \widehat{\sigma }_{j 1}^2 =&\tfrac{1}{n_1+y_{n+1}} \bigg [ ||\mathbf{y}^\mathrm{T}\{\widetilde{\mathbf{x}}_{j} - \widehat{\mu }_{j1}\mathbf{1}\}||^2 \\&\quad + y_{n+1} (x_{n+1,j} - \widehat{\mu }_{j1})^2 \bigg ],\\ \widehat{\sigma }_{j 0}^2 =&\tfrac{1}{n_0 +1 - y_{n+1}} \bigg [ ||(\mathbf{1}- \mathbf{y})^\mathrm{T}\{\widetilde{\mathbf{x}}_{j} - \widehat{\mu }_{j0}\mathbf{1}\}||^2 \\&\quad + (1 - y_{n+1}) (x_{n+1,j} - \widehat{\mu }_{j0})^2 \bigg ], \end{aligned}$$

and the $j{\text {th}}$ entry of ${\varvec{\lambda }}_{\text {Bayes}}$ is (as $h \rightarrow \infty $)

$$\begin{aligned}&\lambda _{\text {Bayes}} (\widetilde{\mathbf{x}}_j, x_{n+1,j}, \mathbf{y}, y_{n+1}) \\&\quad \rightarrow \lambda _{\text {LRT}}(\widetilde{\mathbf{x}}_j, x_{n+1,j}, \mathbf{y}, y_{n+1}) + \log (n_1 + y_{n+1} ) \\&\qquad + \log (n_0 + 1 - y_{n+1} ) - \log (2) - 3\log (n+1) \\&\qquad -\, 2\xi \{(n+1)/2\} + 2\xi \{(n_1 + y_{n+1})/2\} \\&\qquad +\, 2\xi \{ (n_0 + 1 - y_{n+1})/2 \}, \\&= \lambda _{\text {LRT}}(\widetilde{\mathbf{x}}_j, x_{n+1,j}, \mathbf{y}, y_{n+1}) - 2 \log (n+1) \\&\qquad +\,O(n_0^{-1} + n_1^{-1}), \end{aligned}$$

where $\xi (x) = \log \varGamma (x) + x - x \log (x) - \tfrac{1}{2} \log (2 \pi )$. Since the calculation of the marginal likelihood involves a combinatorial sum over $2^{p+1}$ binary combinations, exact Bayesian inference is also computationally impractical in the VQDA setting.
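As a numerical illustration of the two displays above, the following Python sketch evaluates $\lambda _{\text {LRT}}$ and the limiting form of $\lambda _{\text {Bayes}}$ for a single variable, then compares the latter with $\lambda _{\text {LRT}} - 2 \log (n+1)$. It is a sketch under stated assumptions and not the VaDA implementation: the function names (`xi`, `mle_variance`, `lambda_lrt`, `lambda_bayes_limit`) are hypothetical, the norm expressions are read as class-wise sums of squared deviations, and each $\widehat{\mu }$ is taken to be the sample mean of its group after the new observation has been appended.

```python
import math


def xi(x):
    """xi(x) = log Gamma(x) + x - x log(x) - 0.5 log(2 pi)."""
    return math.lgamma(x) + x - x * math.log(x) - 0.5 * math.log(2.0 * math.pi)


def mle_variance(values):
    """Maximum-likelihood variance: mean squared deviation from the sample mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def lambda_lrt(x, y, x_new, y_new):
    """Likelihood ratio statistic for one variable, new observation included.

    Assumption: the new observation is appended to its class before the
    pooled and class-wise MLEs are computed.
    """
    xs = list(x) + [x_new]
    ys = list(y) + [y_new]
    group1 = [v for v, label in zip(xs, ys) if label == 1]
    group0 = [v for v, label in zip(xs, ys) if label == 0]
    return (len(xs) * math.log(mle_variance(xs))
            - len(group1) * math.log(mle_variance(group1))
            - len(group0) * math.log(mle_variance(group0)))


def lambda_bayes_limit(x, y, x_new, y_new):
    """Limiting form of lambda_Bayes as h tends to infinity."""
    m1 = sum(y) + y_new                # n_1 + y_{n+1}
    m0 = len(y) - sum(y) + 1 - y_new   # n_0 + 1 - y_{n+1}
    m = m1 + m0                        # n + 1
    return (lambda_lrt(x, y, x_new, y_new)
            + math.log(m1) + math.log(m0) - math.log(2.0) - 3.0 * math.log(m)
            - 2.0 * xi(m / 2.0) + 2.0 * xi(m1 / 2.0) + 2.0 * xi(m0 / 2.0))


# Deterministic toy data (illustrative only): two groups of 50 observations
# that differ in both mean and variance.
x = [math.sin(i) for i in range(50)] + [3.0 + 2.0 * math.cos(i) for i in range(50)]
y = [0] * 50 + [1] * 50

lrt = lambda_lrt(x, y, 2.5, 1)
bayes = lambda_bayes_limit(x, y, 2.5, 1)
print("lambda_LRT:", lrt)
print("lambda_Bayes limit:", bayes)
print("gap to lambda_LRT - 2 log(n+1):", bayes - (lrt - 2.0 * math.log(len(x) + 1)))
```

By Stirling's formula, $\xi (x) = -\tfrac{1}{2} \log (x) + O(x^{-1})$, so the logarithmic terms in the first expression cancel down to $-2 \log (n+1)$ and the printed gap shrinks as $n_0$ and $n_1$ grow.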

Table 2 Iterative scheme for obtaining the parameters in the optimal densities $q(\varvec{\gamma }, y_{n+1}, \ldots , y_{n+m})$ in VQDA

Similar to VLDA, we will use RCVB to approximate the posterior $p({\varvec{\gamma }}, y_{n+1} | \mathbf{x}, \mathbf{x}_{n+1}, \mathbf{y})$ by

$$\begin{aligned} q(y_{n+1}, {\varvec{\gamma }}) = q (y_{n+1}) \prod _{j=1}^{p} q_j (\gamma _j). \end{aligned}$$

This yields the approximate posterior for $\gamma _j$ as

$$\begin{aligned}&q_j(\gamma _j) \propto \int \exp \big [ {\mathbb {E}}_{-q_j} \{ \log p({\mathcal {D}}, \mathbf{x}_{n+1}, y_{n+1},{\varvec{\gamma }}, {\varvec{\theta }}_1) \} \big ] \text {d} {\varvec{\theta }}_1, \\&\quad \propto \exp \bigg [ {\mathbb {E}}_{-q_j} \Big \{ \log {\mathcal {B}}(a_\gamma + \mathbf{1}^\mathrm{T}{\varvec{\gamma }}, b_\gamma + p - \mathbf{1}^\mathrm{T}{\varvec{\gamma }}) \Big \} \\&\qquad + \tfrac{\gamma _j}{2} {\mathbb {E}}_{-q_j} \Big \{ {\varvec{\lambda }}_{\text {Bayes}} (\widetilde{\mathbf{x}}_j, x_{n+1,j}, \mathbf{y}, y_{n+1}) \Big \} \bigg ]. \end{aligned}$$

For a sufficiently large n, we can avoid the need to evaluate the expectation $ {\mathbb {E}}_{-q_j} \Big \{ {\varvec{\lambda }}_{\text {Bayes}} (\widetilde{\mathbf{x}}_j, x_{n+1,j}, \mathbf{y}, y_{n+1}) \Big \}$ by applying Taylor’s expansion to obtain the approximation

$$\begin{aligned}&{\mathbb {E}}_{-q_j} \log (a_\gamma + \mathbf{1}^\mathrm{T} {\varvec{\gamma }}_{-j}) \approx \log (a_\gamma + \mathbf{1}^\mathrm{T} \mathbf{w}_{-j}), \\&{\mathbb {E}}_{-q_j} \log (b_\gamma + p - \mathbf{1}^\mathrm{T} {\varvec{\gamma }}_{-j} - 1) \\&\quad \approx \log (b_\gamma + p - \mathbf{1}^\mathrm{T} \mathbf{w}_{-j} - 1), \\&\widehat{\sigma }_{j 1}^2 \approx \tfrac{1}{n_1} ||\mathbf{y}^\mathrm{T}\{\widetilde{\mathbf{x}}_{j} - \widehat{\mu }_{j1}\mathbf{1}\}||^2, \\&\widehat{\sigma }_{j 0}^2 \approx \tfrac{1}{n_0} ||(\mathbf{1}- \mathbf{y})^\mathrm{T}\{\widetilde{\mathbf{x}}_{j} - \widehat{\mu }_{j0}\mathbf{1}\}||^2, \\&\widehat{\sigma }_{j}^2 \approx \tfrac{1}{n} ||\widetilde{\mathbf{x}}_{j} - \widehat{\mu }_{j}\mathbf{1}||^2, \\&\widehat{\mu }_{j 1} \approx \tfrac{1}{n_1} \mathbf{y}^\mathrm{T} \widetilde{\mathbf{x}}_j , \;\; \widehat{\mu }_{j 0} \approx \tfrac{1}{n_0} (\mathbf{1}- \mathbf{y})^\mathrm{T} \widetilde{\mathbf{x}}_j, \;\; \widehat{\mu }_{j} \approx \tfrac{1}{n} \mathbf{1}^\mathrm{T} \widetilde{\mathbf{x}}_j, \end{aligned} \tag{16}$$

and, similar to VLDA, ${\varvec{\lambda }}_{\text {Bayes}}$ does not depend on the new observation $(\mathbf{x}_{n+1}, y_{n+1})$. By using the approximation in (16), we have

$$\begin{aligned} w_j&= \frac{q_j(\gamma _j = 1)}{q_j(\gamma _j = 1) + q_j(\gamma _j = 0)}, \\&\approx \text {expit}\bigg [ \log (a_\gamma + \mathbf{1}^\mathrm{T} \mathbf{w}_{-j}) - \log (b_\gamma + p - \mathbf{1}^\mathrm{T} \mathbf{w}_{-j} - 1) \\&\quad + \tfrac{1}{2} \log (\tfrac{n_1 n_0}{2}) + \xi (\tfrac{n_1}{2}) + \xi (\tfrac{n_0}{2}) - \xi (\tfrac{n}{2}) \\&\quad - \tfrac{3}{2} \log (n+1) + \tfrac{1}{2} \lambda _{\text {LRT}} (\widetilde{\mathbf{x}}_j, y_{n+1}) \bigg ], \\&= \text {expit}\bigg [ \text {penalty}_{QDA,j} + \tfrac{1}{2} \lambda _{\text {LRT}} (\widetilde{\mathbf{x}}_j, y_{n+1}) \bigg ]. \end{aligned}$$
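The update for $w_j$ can be sketched in Python as a coordinate-wise sweep. This is a minimal illustration under assumptions, not the scheme of Table 2 (which is not reproduced here): the function names `expit`, `xi`, `qda_penalty` and `update_weights` are hypothetical, the likelihood ratio statistics are supplied as precomputed inputs, and the starting value $w_j = 0.5$, the sweep order, the stopping rule and the default hyperparameters $a_\gamma = b_\gamma = 1$ are all choices made for this example.

```python
import math


def expit(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def xi(x):
    """xi(x) = log Gamma(x) + x - x log(x) - 0.5 log(2 pi)."""
    return math.lgamma(x) + x - x * math.log(x) - 0.5 * math.log(2.0 * math.pi)


def qda_penalty(w, j, n1, n0, a_gamma, b_gamma):
    """penalty_{QDA,j}: every term inside expit except 0.5 * lambda_LRT."""
    p = len(w)
    n = n1 + n0
    s = sum(w) - w[j]  # 1^T w_{-j}
    return (math.log(a_gamma + s) - math.log(b_gamma + p - s - 1.0)
            + 0.5 * math.log(n1 * n0 / 2.0)
            + xi(n1 / 2.0) + xi(n0 / 2.0) - xi(n / 2.0)
            - 1.5 * math.log(n + 1.0))


def update_weights(lam, n1, n0, a_gamma=1.0, b_gamma=1.0, sweeps=50, tol=1e-10):
    """Cycle through j, refreshing w_j = expit(penalty_j + 0.5 * lam[j]).

    Assumption: plain coordinate sweeps from w_j = 0.5, stopping once the
    largest change in a sweep falls below tol.
    """
    w = [0.5] * len(lam)
    for _ in range(sweeps):
        largest_change = 0.0
        for j in range(len(lam)):
            new_wj = expit(qda_penalty(w, j, n1, n0, a_gamma, b_gamma) + 0.5 * lam[j])
            largest_change = max(largest_change, abs(new_wj - w[j]))
            w[j] = new_wj
        if largest_change < tol:
            break
    return w


# Illustrative input: one variable with a large statistic, two with none.
print(update_weights([40.0, 0.0, 0.0], n1=20, n0=20))
```

Because the penalty is dominated by $-\tfrac{3}{2} \log (n+1)$, a variable needs a sizeable $\lambda _{\text {LRT}}$ before its weight moves away from zero, which is how the selection step behaves like a thresholded likelihood ratio test.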

To obtain the approximate density for $y_{n+1}$, we integrate analytically over ${\varvec{\theta }}_1$ to obtain

$$\begin{aligned} q(y_{n+1})&\propto \int \exp \big [ {\mathbb {E}}_{-y} \{ \log p({\mathcal {D}}, \mathbf{x}_{n+1}, y_{n+1},{\varvec{\gamma }}, {\varvec{\theta }}_1) \} \big ] \text {d} {\varvec{\theta }}_1, \\&\propto \exp \bigg [ \log {\mathcal {B}}(a_y + n_1 + y_{n+1}, b_y + n_0 + 1 - y_{n+1}) \\&\quad + \mathbf{1}^\mathrm{T} \mathbf{w}\Big \{ \log \varGamma (\tfrac{n_1 + y_{n+1}}{2}) + \log \varGamma (\tfrac{n_0 + 1 - y_{n+1}}{2}) \Big \} \\&\quad + \tfrac{1}{2} \mathbf{w}^\mathrm{T} \Big \{ \log {\varvec{\phi }}(\mathbf{x}_{n+1}; \widehat{{\varvec{\mu }}}_1, \widehat{{\varvec{\sigma }}}_{1}^2) - \log {\varvec{\phi }}(\mathbf{x}_{n+1}; \widehat{{\varvec{\mu }}}_0, \widehat{{\varvec{\sigma }}}_{0}^2) \Big \} \bigg ], \end{aligned}$$

where the $j{\text {th}}$ element of the $p \times 1$ vector ${\varvec{\phi }}(\mathbf{x}_{n+1}; \widehat{{\varvec{\mu }}}_k, \widehat{{\varvec{\sigma }}}_{k}^2)$ is the Gaussian density

$$\begin{aligned} \phi (x_{n+1,j}; \widehat{\mu }_{j k}, \widehat{\sigma }_{j k}^2), \end{aligned}$$

and the $\log $ prefix denotes an element-wise $\log $ of a vector.
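The element-wise log density and its weighted contrast between the two classes can be sketched as follows. This is only an illustration of the vector notation: the function names `log_gaussian_density` and `weighted_log_density_contrast` are hypothetical, the inputs are plain Python lists of length $p$, and the sketch computes $\mathbf{w}^\mathrm{T} \{ \log {\varvec{\phi }}(\cdot ; \widehat{{\varvec{\mu }}}_1, \widehat{{\varvec{\sigma }}}_{1}^2) - \log {\varvec{\phi }}(\cdot ; \widehat{{\varvec{\mu }}}_0, \widehat{{\varvec{\sigma }}}_{0}^2) \}$ without any leading constant.

```python
import math


def log_gaussian_density(x, mu, var):
    """log phi(x; mu, var) for a univariate Gaussian with variance var."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mu) ** 2 / (2.0 * var)


def weighted_log_density_contrast(x_new, mu1, var1, mu0, var0, w):
    """w^T { log phi(x; mu1, var1) - log phi(x; mu0, var0) }, element-wise in j."""
    total = 0.0
    for xj, m1, v1, m0, v0, wj in zip(x_new, mu1, var1, mu0, var0, w):
        total += wj * (log_gaussian_density(xj, m1, v1)
                       - log_gaussian_density(xj, m0, v0))
    return total


# Illustrative values: only the first variable is selected (w = 1), so the
# second variable does not contribute to the contrast.
print(weighted_log_density_contrast(
    x_new=[1.0, 5.0],
    mu1=[1.0, 0.0], var1=[1.0, 1.0],
    mu0=[0.0, 0.0], var0=[1.0, 1.0],
    w=[1.0, 0.0],
))
```

Variables with $w_j$ near zero drop out of the sum, so only the selected variables influence the classification of a new observation.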

In the general case with m new observations, we may apply Taylor’s expansion results from (16) to compute the approximate classification probability for $y_{n+i}$ as

$$\begin{aligned} \widetilde{y}_i&= \frac{q(y_{n+i} = 1)}{q(y_{n+i} = 1) + q(y_{n+i} = 0)}, \\&\approx \text {expit}\bigg [ \log \big ( \tfrac{n_1}{n_0} \big ) + \mathbf{1}^\mathrm{T} \mathbf{w}\big \{ \log \varGamma (\tfrac{n_1 + 1}{2}) - \log \varGamma (\tfrac{n_1}{2}) \\&\quad - \log \varGamma (\tfrac{n_0 + 1}{2}) + \log \varGamma (\tfrac{n_0}{2}) \big \} \\&\quad + \tfrac{1}{2} \mathbf{w}^\mathrm{T} \Big \{ \log {\varvec{\phi }}(\mathbf{x}_{n+i}; \widehat{{\varvec{\mu }}}_1, \widehat{{\varvec{\sigma }}}_{1}^2) - \log {\varvec{\phi }}(\mathbf{x}_{n+i}; \widehat{{\varvec{\mu }}}_0, \widehat{{\varvec{\sigma }}}_{0}^2) \Big \} \bigg ]. \end{aligned}$$

The RCVB algorithm for VQDA may be found in Table 2.

About this article

Cite this article

Yu, W., Ormerod, J.T. & Stewart, M. Variational discriminant analysis with variable selection. Stat Comput 30, 933–951 (2020). https://doi.org/10.1007/s11222-020-09928-8

Received: 19 October 2018
Accepted: 06 February 2020
Published: 19 February 2020
Issue Date: July 2020
DOI: https://doi.org/10.1007/s11222-020-09928-8
