Evolution Paths for Knowledge Discovery and Data Mining Process Models

  • Review Article
  • SN Computer Science

Abstract

Despite the hype around data analytics, the success rate of analytics initiatives remains very low, and the value of data in organisations is left hidden. Various research studies show that the main barriers to analytics adoption are organisational, and that the lack of structured approaches to conducting analytics initiatives is a possible cause of analytics project failures. Data mining process models therefore become a fundamental means of supporting analytics project management and minimising the risk of data dredging. In this paper, Knowledge Discovery and Data Mining process models are reviewed, starting from the most popular models currently in use. Four distinctive research paths for data mining process models have emerged. These evolution paths seem to address limitations of the CRISP-DM model, which remains the de facto standard in industry. The research streams identified are the evolution of the human role; the relevance of iteration and interactions; the role of data and knowledge repositories; and the integration of software engineering/agile methodologies. In the future, these four research streams should be combined to support the development of more encompassing process models.



Author information

Corresponding author

Correspondence to Anna Rotondo.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Rotondo, A., Quilligan, F. Evolution Paths for Knowledge Discovery and Data Mining Process Models. SN COMPUT. SCI. 1, 109 (2020). https://doi.org/10.1007/s42979-020-0117-6

  • DOI: https://doi.org/10.1007/s42979-020-0117-6
