Abstract
Colorectal cancer (CRC) is a common cause of death worldwide. Predictive models for the development of CRC could facilitate earlier diagnosis and thereby increase survival rates. Currently available predictive models are improving, but they neither fully use the wealth of patient data available from routine care nor take advantage of developments in data mining. This paper reports a first attempt to generate a predictive model using the CHAID decision tree learner on anonymously extracted Electronic Medical Records; the model achieves an area under the curve (AUC) of 0.839 for the adult population and 0.702 for the age group between 55 and 75.
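To make the reported AUC figures concrete, the following minimal sketch computes AUC as the probability that a randomly chosen case receives a higher risk score than a randomly chosen control, with ties counted as half. The function name `auc` and the example scores are hypothetical illustrations, not the authors' code or data from the study.

```python
from itertools import product


def auc(scores_pos, scores_neg):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    wins = 0.0
    for p, n in product(scores_pos, scores_neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical risk scores for illustration only (not taken from the paper).
cases = [0.9, 0.8, 0.4]           # patients who developed CRC
controls = [0.7, 0.3, 0.2, 0.1]   # patients who did not

print(auc(cases, controls))
```

In this toy example, 11 of the 12 case-control pairs are ordered correctly. An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation.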
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Hoogendoorn, M., Moons, L.M.G., Numans, M.E., Sips, RJ. (2014). Utilizing Data Mining for Predictive Modeling of Colorectal Cancer Using Electronic Medical Records. In: Ślȩzak, D., Tan, AH., Peters, J.F., Schwabe, L. (eds) Brain Informatics and Health. BIH 2014. Lecture Notes in Computer Science, vol 8609. Springer, Cham. https://doi.org/10.1007/978-3-319-09891-3_13
DOI: https://doi.org/10.1007/978-3-319-09891-3_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09890-6
Online ISBN: 978-3-319-09891-3
eBook Packages: Computer Science, Computer Science (R0)