Constructing a knowledge-based heterogeneous information graph for medical health status classification

Pham, Thuan; Tao, Xiaohui; Zhang, Ji; Yong, Jianming

doi:10.1007/s13755-020-0100-6

Research
Published: 14 February 2020

Volume 8, article number 10, (2020)

Health Information Science and Systems

Thuan Pham ORCID: orcid.org/0000-0001-7433-858X¹,
Xiaohui Tao¹,
Ji Zhang¹ &
…
Jianming Yong¹

Abstract

Applying Pearson correlation and semantic relations to build a heterogeneous information graph (HIG) for a classification model has achieved notable improvements in the accuracy of predicting health risk status. The approach in this study integrates medical domain knowledge with Pearson correlation and semantic relations to build a classification model for diagnosis. The research mined knowledge extracted from MEDLINE titles and abstracts to discover how to assess the links between objects relating to medical concepts. A knowledge-based HIG model was then developed to predict a patient’s health status. The experimental results showed that the knowledge-based model outperformed the baseline model, demonstrating that the knowledge base can help improve the performance of the classification model. This study contributes a framework for applying a knowledge base in classification models, which helps these models achieve their best predictive performance. It also contributes a model for medical practice that can help practitioners become more confident in making final diagnostic decisions. Moreover, the study affirms that biomedical literature can assist in building a classification model. This contribution will benefit future researchers who mine the knowledge base to develop other kinds of classification models.
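To make the role of Pearson correlation in graph construction more concrete, the following is a minimal sketch of one way correlation could be used to decide which attribute pairs are linked. It is not the authors' implementation: the attribute names, the toy values, the `build_edges` helper, and the 0.8 threshold are all assumptions chosen for illustration, and the semantic relations mined from MEDLINE are not modelled here.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A constant column has no defined correlation; treat it as unlinked.
        return 0.0
    return cov / sqrt(var_x * var_y)


def build_edges(columns, threshold=0.5):
    """Link each attribute pair whose |r| meets the (assumed) threshold.

    Returns a dict mapping (attribute_a, attribute_b) to the correlation,
    which can serve as a weighted edge list for a graph.
    """
    edges = {}
    for a, b in combinations(sorted(columns), 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            edges[(a, b)] = r
    return edges


# Hypothetical toy records: one list per attribute, one position per patient.
records = {
    "bmi": [22.0, 27.5, 31.0, 35.5, 24.0],
    "glucose": [85.0, 99.0, 118.0, 140.0, 90.0],
    "age": [30.0, 62.0, 41.0, 55.0, 48.0],
}

edges = build_edges(records, threshold=0.8)
for (a, b), r in edges.items():
    print(f"{a} -- {b}: r = {r:.3f}")
```

With these toy values only the `bmi` and `glucose` attributes are linked, because the other pairs fall below the assumed threshold. In a heterogeneous graph, such correlation-weighted edges would sit alongside edges of other types, for example semantic relations between medical concepts.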

Acknowledgements

The work was conducted with approval from the Human Research Ethics Committee of the University of Southern Queensland, Australia (Approval ID: H18REA049). The authors acknowledge the use of the National Health and Nutrition Examination Survey (NHANES) and the National Ambulatory Medical Care Survey (NAMCS) in the study, and especially thank the Centers for Disease Control and Prevention of the United States Department of Health and Human Services for making the data sets publicly available for research purposes. The authors also appreciate the courtesy of the U.S. National Library of Medicine in allowing the use of MEDLINE.

Author information

Authors and Affiliations

University of Southern Queensland, Toowoomba, Australia
Thuan Pham, Xiaohui Tao, Ji Zhang & Jianming Yong

Corresponding author

Correspondence to Thuan Pham.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Pham, T., Tao, X., Zhang, J. et al. Constructing a knowledge-based heterogeneous information graph for medical health status classification. Health Inf Sci Syst 8, 10 (2020). https://doi.org/10.1007/s13755-020-0100-6

Received: 21 July 2019
Accepted: 23 January 2020
Published: 14 February 2020
DOI: https://doi.org/10.1007/s13755-020-0100-6
