Machine Learning Classification of Females Susceptibility to Visceral Fat Associated Diseases

  • Conference paper
XV Mediterranean Conference on Medical and Biological Engineering and Computing – MEDICON 2019 (MEDICON 2019)

Abstract

Classifying subjects into risk categories is a common challenge in medical research. Machine Learning (ML) methods are widely used for risk prediction and classification. The primary objective of these algorithms is to predict dichotomous responses (e.g. healthy/at risk) from several features. Like statistical inference models, ML models are subject to the common problem of class imbalance: they are biased towards the majority class, which increases the false negative rate.
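
To make the imbalance problem concrete, the following minimal sketch scores a classifier that always predicts the most common label. The 90/10 split, the label names and the function name are illustrative assumptions, not data or code from the paper.

```python
from collections import Counter


def majority_baseline_report(labels, positive):
    """Score a classifier that always predicts the most common label.

    Returns (accuracy, false_negative_rate) with respect to `positive`.
    """
    majority = Counter(labels).most_common(1)[0][0]
    predictions = [majority] * len(labels)

    accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)

    # Predictions made for the subjects that truly belong to the positive class.
    on_positives = [p for p, y in zip(predictions, labels) if y == positive]
    false_negatives = sum(p != positive for p in on_positives)
    fnr = false_negatives / len(on_positives) if on_positives else 0.0
    return accuracy, fnr


# Hypothetical toy data: 90 "healthy" subjects and 10 "at risk" subjects.
labels = ["healthy"] * 90 + ["at risk"] * 10
accuracy, fnr = majority_baseline_report(labels, positive="at risk")
print(f"accuracy={accuracy:.2f}, false negative rate={fnr:.2f}")
```

On this toy data the baseline looks accurate overall while missing every at-risk subject, which is why accuracy alone can be misleading on imbalanced data.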

In this paper, we built and evaluated eighteen ML models that classify approximately 4300 female participants from the UK Biobank into three categorical risk statuses, based on discretised visceral adipose tissue values from magnetic resonance imaging. We also examined the effect of sampling techniques on classification modelling when dealing with class imbalance.
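
The abstract does not name the sampling techniques that were compared. As one common example, random oversampling duplicates minority-class rows until every class matches the largest one. The sketch below assumes three hypothetical class labels ("low", "medium", "high") and toy rows; it is not the authors' procedure.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until all classes
    have as many rows as the largest class."""
    rng = random.Random(seed)

    # Group the rows by their class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    target = max(len(members) for members in by_class.values())

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Draw the missing rows with replacement from the same class.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical toy data: 6 "low", 3 "medium" and 1 "high" risk rows.
rows = [[i] for i in range(10)]
labels = ["low"] * 6 + ["medium"] * 3 + ["high"] * 1

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

Resampling of this kind is normally applied to the training split only, so that the evaluation data keeps its original class distribution.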

Results showed that the use of sampling techniques had a significant impact. They not only improved the prediction of patients' risk status, but also increased the information contained within each variable. Based on domain experts' criteria, the three best models for classification were finally identified.
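
The abstract does not state how the information contained in a variable was measured. Assuming an entropy-based measure such as information gain (an assumption made here for illustration only), the quantity could be computed along these lines; the feature values and labels are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    total = len(labels)

    # Collect the labels that fall under each feature value.
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)

    # Weighted average entropy of the groups produced by the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data with two classes of equal size.
labels = ["low", "low", "high", "high"]
informative = ["a", "a", "b", "b"]    # separates the classes perfectly
uninformative = ["a", "b", "a", "b"]  # tells us nothing about the class

gain_informative = information_gain(informative, labels)
gain_uninformative = information_gain(uninformative, labels)
print(gain_informative, gain_uninformative)
```

A feature that separates the classes perfectly recovers the full label entropy, while a feature unrelated to the class yields a gain of zero.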

These encouraging results will guide further developments of classification models for predicting visceral adipose tissue without the need for a costly scan.

Author information

Corresponding author

Correspondence to Mahmoud Aldraimli.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Aldraimli, M. et al. (2020). Machine Learning Classification of Females Susceptibility to Visceral Fat Associated Diseases. In: Henriques, J., Neves, N., de Carvalho, P. (eds) XV Mediterranean Conference on Medical and Biological Engineering and Computing – MEDICON 2019. MEDICON 2019. IFMBE Proceedings, vol 76. Springer, Cham. https://doi.org/10.1007/978-3-030-31635-8_81

  • DOI: https://doi.org/10.1007/978-3-030-31635-8_81

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-31634-1

  • Online ISBN: 978-3-030-31635-8

  • eBook Packages: Engineering, Engineering (R0)