
Understanding the Effects of Sampling on Healthcare Risk Modeling for the Prediction of Future High-Cost Patients

Conference paper
Biomedical Engineering Systems and Technologies (BIOSTEC 2008)

Abstract

Rapidly rising healthcare costs represent one of the major issues plaguing the healthcare system. Data from the Arizona Health Care Cost Containment System, Arizona’s Medicaid program, provide a unique opportunity to apply state-of-the-art machine learning and data mining algorithms to analyze the data and produce actionable findings that can aid cost containment. Our work addresses a specific challenge in this real-life healthcare application: data imbalance in the process of building predictive risk models for forecasting high-cost patients. We survey the literature and propose novel data mining approaches customized for this application, with a specific focus on non-random sampling. Our empirical study indicates that the proposed approach is highly effective and can benefit further research on cost containment in the healthcare industry.
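The abstract does not spell out the sampling procedure, so the sketch below only illustrates the general setup it describes: a rare high-cost class that is rebalanced against the majority class before a risk model is trained. It assumes that patients are labeled high-cost if they fall in the top 5% of cost for the prediction period (a hypothetical cutoff), and it uses plain random undersampling of the majority class as a generic baseline. This is not the authors' non-random sampling method, and the function names `label_high_cost` and `undersample_majority` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def label_high_cost(costs: Sequence[float], top_fraction: float = 0.05) -> List[int]:
    """Label the top `top_fraction` of patients by cost as 1 (high-cost), the rest 0.

    Hypothetical labeling rule for illustration; the 5% default is an assumption.
    """
    n_high = max(1, int(round(len(costs) * top_fraction)))
    # Patient indices sorted from most to least expensive.
    order = sorted(range(len(costs)), key=lambda i: costs[i], reverse=True)
    labels = [0] * len(costs)
    for i in order[:n_high]:
        labels[i] = 1
    return labels


def undersample_majority(
    rows: Sequence[Tuple[float, ...]],
    labels: Sequence[int],
    ratio: float = 1.0,
    seed: int = 0,
) -> Tuple[List[Tuple[float, ...]], List[int]]:
    """Generic random undersampling (not the paper's non-random scheme).

    Keeps every minority (label 1) row and a random subset of majority (label 0)
    rows so that #majority is approximately ratio * #minority.
    """
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    n_keep = min(len(majority), int(round(ratio * len(minority))))
    # Sort so the returned rows keep their original relative order.
    kept = sorted(minority + rng.sample(majority, n_keep))
    return [rows[i] for i in kept], [labels[i] for i in kept]


# Usage on toy data: 100 patients whose costs are 1, 2, ..., 100.
costs = [float(c) for c in range(1, 101)]
labels = label_high_cost(costs, top_fraction=0.05)
rows = [(c,) for c in costs]
X, y = undersample_majority(rows, labels, ratio=1.0, seed=42)
print(sum(labels), len(labels))  # 5 100
print(sum(y), len(y))  # 5 10
```

In a typical workflow, rebalancing of this kind is applied only to the training split, while evaluation uses data with the original, imbalanced class distribution so that performance estimates reflect the setting in which the model would be deployed.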




Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Moturu, S.T., Liu, H., Johnson, W.G. (2008). Understanding the Effects of Sampling on Healthcare Risk Modeling for the Prediction of Future High-Cost Patients. In: Fred, A., Filipe, J., Gamboa, H. (eds) Biomedical Engineering Systems and Technologies. BIOSTEC 2008. Communications in Computer and Information Science, vol 25. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-92219-3_37


  • DOI: https://doi.org/10.1007/978-3-540-92219-3_37

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-92218-6

  • Online ISBN: 978-3-540-92219-3

  • eBook Packages: Computer Science, Computer Science (R0)
