Abstract
Process mining techniques have been developed in the field of business process management to extract information from event logs consisting of activities, and then to produce a graphical representation of the process control flow, detect relations between components involved in the process and infer data dependencies between process activities. These process characterisations allow the analyst to discover an annotated visual representation of the conceptual model or the performance model of the process, to check conformance with an a priori model in order to detect deviations, and to extend the a priori model with quantitative information such as frequencies and performance data. However, a process model yielded by process mining techniques is closer to a representation of the process behaviour than to an actual model of the process: it often consists of a huge number of states and interconnections between them, resulting in a spaghetti-like net that is hard to interpret or even read. In this paper we propose a novel technique, which we call model mining, to derive an abstract but concise and functionally structured model from event logs. Such a model is not a representation of the unfolded behaviour but comprises, instead, a set of formal rules for generating the system behaviour, thus supporting more powerful predictive capabilities. The set of rules can be either inferred directly from the event logs (constructive mining) or obtained by refining a plausible a priori model, using the event logs as a sieve, until a reasonably concise model is achieved (refinement mining). We use rewriting logic as the formal framework in which to perform model mining and implement our framework using the Maude rewrite system. Once the final formal model is attained, it can be used, within the same rewriting logic framework, to predict future evolutions of the behaviour through simulation, to carry out further validation, or to analyse properties through model checking.
Finally, we illustrate our approach on two case studies from two different application fields, ecology and collaborative learning.
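The abstract describes refinement mining only at a high level. The following Python sketch illustrates the general idea of using event logs as a sieve. Its simplifying assumptions are ours, not the paper's. The system state is a multiset of tokens, loosely mirroring the multiset rewriting common in Maude specifications. Each activity is a rewrite rule with a left-hand side (consumed tokens) and a right-hand side (produced tokens). A candidate model survives only if it can replay every trace in the log. The functions `fire`, `replays` and `sieve` and the toy life-cycle rules are hypothetical. They are not the Maude models from the paper's case studies.

```python
from collections import Counter
from itertools import product
from typing import Optional

# Hypothetical encoding: a rule is (lhs, rhs), two multisets of state tokens.
RulePair = tuple[Counter, Counter]


def fire(state: Counter, lhs: Counter, rhs: Counter) -> Optional[Counter]:
    """Apply one rewrite step, or return None if lhs is not in the state."""
    if any(state[token] < n for token, n in lhs.items()):
        return None
    new_state = state - lhs  # consume the lhs tokens (zero counts are dropped)
    new_state.update(rhs)    # produce the rhs tokens
    return new_state


def replays(model: dict[str, RulePair], init: Counter, trace: list[str]) -> bool:
    """True if the model can generate the observed trace starting from init."""
    state = init
    for activity in trace:
        if activity not in model:
            return False  # the log mentions an activity the model lacks
        lhs, rhs = model[activity]
        next_state = fire(state, lhs, rhs)
        if next_state is None:
            return False  # the activity was logged while its rule was disabled
        state = next_state
    return True


def sieve(candidates: dict[str, list[RulePair]], init: Counter,
          log: list[list[str]]) -> list[dict[str, RulePair]]:
    """Keep every combination of rule variants that replays all traces."""
    activities = sorted(candidates)
    survivors = []
    for choice in product(*(candidates[a] for a in activities)):
        model = dict(zip(activities, choice))
        if all(replays(model, init, trace) for trace in log):
            survivors.append(model)
    return survivors


# Toy example (hypothetical): a simplified insect life cycle.
C = Counter
candidates = {
    "hatch": [(C(egg=1), C(larva=1))],
    "emerge": [(C(larva=1), C(adult=1))],
    # Two plausible hypotheses: any single adult lays eggs,
    # or laying needs at least two adults.
    "lay": [(C(adult=1), C(adult=1, egg=2)),
            (C(adult=2), C(adult=2, egg=2))],
}
log = [["hatch", "emerge", "lay", "hatch"]]

survivors = sieve(candidates, C(egg=1), log)
print(len(survivors))  # 1: only the single-adult "lay" rule survives
```

Here the log rules out the two-adult hypothesis, because the observed trace contains a `lay` event when only one adult exists. A realistic refinement loop would also weigh how concise the surviving models are, which this sketch does not attempt.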
Cite this article
Cerone, A. Model mining. J Intell Inf Syst 52, 501–532 (2019). https://doi.org/10.1007/s10844-017-0474-3