Association Rule Learning and Frequent Sequence Mining of Cancer Diagnoses in New York State

  • Conference paper
Data Management and Analytics for Medicine and Healthcare (DMAH 2017)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10494)

Abstract

Analyzing large-scale diagnosis histories of patients can help to discover comorbidity and disease progression patterns. Recent open data initiatives have made it possible to access statewide patient data at the individual level, such as the New York State SPARCS data. The goal of this study is to explore frequent disease co-occurrence and sequence patterns among cancer patients in New York State using SPARCS data. Our collection includes 18,208,830 discharge records from 1,565,237 patients with cancer-related diagnoses during 2011–2015. We use the Apriori algorithm to discover the top disease co-occurrences for common cancer categories, ranked by support. We use the cSPADE algorithm to generate the top frequent diagnosis sequences, each containing at least one cancer-related diagnosis, from patients’ diagnosis histories. Our data-driven approach provides essential knowledge to support the investigation of disease co-occurrence and progression patterns, with the aim of improving the management of multiple diseases.
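To make the support-based co-occurrence step concrete, the following is a minimal Python sketch of Apriori-style frequent itemset mining. It is not the authors' implementation: the toy diagnosis records, the function name `apriori`, and the `min_support` value of 0.4 are all assumptions chosen for illustration, and real SPARCS records would first need to be grouped into per-patient sets of diagnosis categories.

```python
from collections import Counter
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset whose support
    (fraction of transactions containing it) is >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    # Level 1: count single items and keep the frequent ones.
    counts = Counter(item for t in transactions for item in t)
    current = {frozenset([item]): c / n
               for item, c in counts.items() if c / n >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into size-k candidates.
        candidates = {a | b for a, b in combinations(list(current), 2)
                      if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        # Count step: one pass over the transactions.
        counts = Counter(c for t in transactions
                         for c in candidates if c <= t)
        current = {c: cnt / n for c, cnt in counts.items()
                   if cnt / n >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical toy data: one set of diagnosis categories per patient.
records = [
    {"breast cancer", "hypertension", "diabetes"},
    {"breast cancer", "hypertension"},
    {"lung cancer", "COPD", "hypertension"},
    {"lung cancer", "COPD"},
    {"breast cancer", "diabetes"},
]

frequent = apriori(records, min_support=0.4)

# List co-occurrences (itemsets of size >= 2), highest support first.
pairs = sorted(((s, sup) for s, sup in frequent.items() if len(s) >= 2),
               key=lambda x: -x[1])
for itemset, support in pairs:
    print(sorted(itemset), support)
```

The sketch covers only the co-occurrence part of the study. The sequence mining step (cSPADE) additionally takes the order of diagnoses across visits into account and is not shown here.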

Acknowledgments

This work is supported in part by NSF ACI 1443054, NSF IIS 1350885, and NSF IIP 1069147.

Author information

Correspondence to Fusheng Wang.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Wang, Y., Wang, F. (2017). Association Rule Learning and Frequent Sequence Mining of Cancer Diagnoses in New York State. In: Begoli, E., Wang, F., Luo, G. (eds) Data Management and Analytics for Medicine and Healthcare. DMAH 2017. Lecture Notes in Computer Science, vol 10494. Springer, Cham. https://doi.org/10.1007/978-3-319-67186-4_10

  • DOI: https://doi.org/10.1007/978-3-319-67186-4_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-67185-7

  • Online ISBN: 978-3-319-67186-4

  • eBook Packages: Computer Science, Computer Science (R0)
