Abstract
Complex, distributed, and dynamic sets of clinical biomedical data are collectively referred to as multimodal clinical data. Machine learning can accommodate the volume and heterogeneity of such data and, when combined with a multi-scale predictive model, aid in their interpretation by deconstructing biological complexity and extracting relevant outputs. Genome-scale metabolic models (GSMMs) are one of the main frameworks for bridging the gap between genotype and phenotype, because they incorporate prior biological knowledge into mechanistic models. Using GSMMs as a foundation for integrating multi-omic data from different domains is therefore a valuable route to refining predictions. In this chapter, we show how cancer multi-omic data can be analyzed via multimodal machine learning and metabolic modeling. First, we focus on the merits of adopting an integrative, systems-biology-led approach to biomedical data mining. We then propose how constraint-based metabolic models can provide a stable yet adaptable foundation for the integration of multimodal data with machine learning. Finally, we provide a step-by-step tutorial for combining machine learning and GSMMs, which includes: (i) tissue-specific constraint-based modeling; (ii) survival analysis using time-to-event prediction for cancer; and (iii) classification and regression approaches for multimodal machine learning. The code associated with the tutorial can be found at https://github.com/Angione-Lab/Tutorials_Combining_ML_and_GSMM.
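To make step (i) concrete, the following is a minimal sketch of constraint-based modeling on a toy three-reaction network. It is not the tutorial's code, which lives in the repository linked in the abstract. The network, the function names `run_fba` and `scale_bounds`, and the rule of scaling upper bounds linearly by relative expression are all assumptions made for illustration. The sketch assumes NumPy and SciPy are installed.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network (rows: metabolites A, B; columns: reactions).
# R1: -> A (uptake), R2: A -> B, R3: B -> (biomass proxy)
S = np.array([
    [1.0, -1.0, 0.0],
    [0.0, 1.0, -1.0],
])

def run_fba(S, lower, upper, objective_index):
    """Maximize one flux subject to S v = 0 and lower <= v <= upper."""
    c = np.zeros(S.shape[1])
    c[objective_index] = -1.0  # linprog minimizes, so negate to maximize
    result = linprog(
        c,
        A_eq=S,
        b_eq=np.zeros(S.shape[0]),
        bounds=list(zip(lower, upper)),
        method="highs",
    )
    if not result.success:
        raise RuntimeError(result.message)
    return result.x

def scale_bounds(upper, expression):
    """Assumed rule: scale each upper bound by a relative expression level."""
    return np.asarray(upper) * np.asarray(expression)

lower = np.zeros(3)
upper = np.full(3, 10.0)

# Generic model: every reaction may carry up to 10 flux units.
generic_flux = run_fba(S, lower, upper, objective_index=2)

# "Tissue-specific" model: R2 is expressed at half the reference level.
expression = np.array([1.0, 0.5, 1.0])
tissue_flux = run_fba(S, lower, scale_bounds(upper, expression), objective_index=2)

print("generic biomass flux:", generic_flux[2])
print("tissue-specific biomass flux:", tissue_flux[2])
```

In this toy case the halved bound on R2 becomes the bottleneck, so the biomass proxy flux drops accordingly. Flux vectors obtained this way for each patient or sample could then serve as input features for steps (ii) and (iii).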
Acknowledgements
We would like to acknowledge the support from UKRI Research England’s THYME project, from the Children’s Liver Disease Foundation, and from Earlier.org.
Author Contributions
Conceptualization: S.V. and C.A.; Data curation: S.V., G.M. and A.O.; Formal analysis: S.V., G.M., P.M. and A.O.; Funding Acquisition: C.A.; Investigation: S.V., G.M., P.M. and A.O.; Methodology: S.V., G.M., P.M., A.O. and C.A.; Project administration: S.V. and C.A.; Resources: S.V., G.M., P.M. and A.O.; Software: S.V., G.M., P.M., A.O. and C.A.; Supervision: S.V. and C.A.; Validation: S.V., G.M. and A.O.; Visualization: S.V.; Writing—original draft: S.V., G.M., P.M., A.O. and C.A.; Writing—reviewing and editing: S.V., G.M., A.O. and C.A.
Declaration of Interests
The authors declare no competing interests.
Copyright information
© 2022 This is a U.S. government work and not under copyright protection in the U.S.; foreign copyright protection may apply
About this protocol
Cite this protocol
Vijayakumar, S., Magazzù, G., Moon, P., Occhipinti, A., Angione, C. (2022). A Practical Guide to Integrating Multimodal Machine Learning and Metabolic Modeling. In: Cortassa, S., Aon, M.A. (eds) Computational Systems Biology in Medicine and Biotechnology. Methods in Molecular Biology, vol 2399. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-1831-8_5
DOI: https://doi.org/10.1007/978-1-0716-1831-8_5
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-1830-1
Online ISBN: 978-1-0716-1831-8
eBook Packages: Springer Protocols