MCA-Based Rule Mining Enables Interpretable Inference in Clinical Psychiatry

Gao, Qingzhu; Gonzalez, Humberto; Ahammad, Parvez

doi:10.1007/978-3-030-24409-5_3

Part of the book series: Studies in Computational Intelligence (SCI, volume 843)

Included in the following conference series:

International Workshop on Health Intelligence

Abstract

Development of interpretable machine learning models for clinical healthcare applications has the potential to change the way we understand, treat, and ultimately cure diseases and disorders in many areas of medicine. These models can serve not only as sources of predictions and estimates, but also as discovery tools that help clinicians and researchers reveal new knowledge from the data. In practice, the high dimensionality of patient information (e.g., phenotype, genotype, and medical history), the lack of objective measurements, and the heterogeneity of patient populations often create significant challenges in developing interpretable machine learning models for clinical psychiatry. In this paper we take a step towards the development of such interpretable models in two ways: first, by developing a novel categorical rule mining method based on Multivariate Correspondence Analysis (MCA) capable of handling datasets with large numbers of features; and second, by applying this method to build transdiagnostic Bayesian Rule List models to screen for psychiatric disorders using the Consortium for Neuropsychiatric Phenomics dataset. We show that our method is not only at least 100 times faster than state-of-the-art rule mining techniques for datasets with 50 features, but also provides interpretability and comparable prediction accuracy across several benchmark datasets.

Qingzhu Gao and Humberto Gonzalez contributed equally to this work.

Author information

Authors and Affiliations

BlackThorn Therapeutics, San Francisco, CA, 94103, USA
Qingzhu Gao, Humberto Gonzalez & Parvez Ahammad

Corresponding author

Correspondence to Humberto Gonzalez.

Editor information

Editors and Affiliations

Department of Pediatrics, The University of Tennessee Health Science Center – Oak-Ridge National Lab (UTHSC-ORNL) Center for Biomedical Informatics, Memphis, TN, USA
Arash Shaban-Nejad
School of Nursing, University of Minnesota, Minneapolis, MN, USA
Martin Michalowski

About this chapter

Cite this chapter

Gao, Q., Gonzalez, H., Ahammad, P. (2020). MCA-Based Rule Mining Enables Interpretable Inference in Clinical Psychiatry. In: Shaban-Nejad, A., Michalowski, M. (eds) Precision Health and Medicine. W3PHAI 2019. Studies in Computational Intelligence, vol 843. Springer, Cham. https://doi.org/10.1007/978-3-030-24409-5_3

DOI: https://doi.org/10.1007/978-3-030-24409-5_3
Published: 02 August 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-24408-8
Online ISBN: 978-3-030-24409-5
eBook Packages: Intelligent Technologies and Robotics (R0)