Alignment-Based Trace Clustering

Chatain, Thomas; Carmona, Josep; van Dongen, Boudewijn

doi:10.1007/978-3-319-69904-2_24

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10650)

Included in the following conference series:

International Conference on Conceptual Modeling

Abstract

This paper presents a novel method for clustering event log traces. In contrast to the approaches in the literature, it assumes an additional input: a process model that describes the current process. The core idea of the algorithm is to use model traces as the centroids of the detected clusters, computed from a generalization of the notion of alignment. This way, model explanations of observed behavior are the driving force for computing the clusters, in contrast to current model-agnostic approaches, which, for example, group log traces merely by their vector-space similarity. We believe alignment-based trace clustering provides results that are more useful for stakeholders. Moreover, in the case of log incompleteness, noisy logs, or concept drift, it can be more robust when dealing with highly deviating traces. The technique can be combined with any clustering technique to provide model explanations for the computed clusters. It relies on encoding the individual alignment problems into the (pseudo-)Boolean domain and has been implemented in our tool DarkSider, which uses an open-source solver.
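
The idea of using model traces as centroids can be illustrated with a minimal sketch. It is not the pseudo-Boolean encoding used in DarkSider: it assumes, for illustration only, that the model's traces are available as a small explicit list, that the distance is the Levenshtein edit distance, and that a trace farther than a bound `delta` from every model trace is left unclustered. The names `edit_distance`, `cluster_by_model_traces` and `delta` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Trace = Tuple[str, ...]


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two traces (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete x
                            curr[j - 1] + 1,           # insert y
                            prev[j - 1] + (x != y)))   # substitute or match
        prev = curr
    return prev[-1]


def cluster_by_model_traces(
    log: List[Trace], model_traces: List[Trace], delta: int
) -> Tuple[Dict[Trace, List[Trace]], List[Trace]]:
    """Assign each log trace to its closest model trace (the centroid).

    Assumption: model_traces is a finite, explicitly enumerated list.
    Traces farther than delta from every centroid stay unclustered.
    """
    clusters: Dict[Trace, List[Trace]] = {m: [] for m in model_traces}
    unclustered: List[Trace] = []
    for trace in log:
        best = min(model_traces, key=lambda m: edit_distance(trace, m))
        if edit_distance(trace, best) <= delta:
            clusters[best].append(trace)
        else:
            unclustered.append(trace)
    return clusters, unclustered


log = [("a", "b", "c"), ("a", "c"), ("x", "y", "z")]
model_traces = [("a", "b", "c"), ("d", "e")]
clusters, unclustered = cluster_by_model_traces(log, model_traces, delta=1)
```

In this toy run the first two log traces share the centroid `("a", "b", "c")`, which serves as their model explanation, while the highly deviating trace `("x", "y", "z")` stays unclustered.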

Notes

1.
Operators ; and || denote sequential and parallel composition, respectively.
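As a small illustration of the two operators, the following sketch computes trace sets under the assumption of interleaving semantics, where the parallel composition of two traces yields all of their interleavings. The helper names `seq`, `par` and `interleave` are hypothetical and not taken from the paper.

```python
from typing import Set, Tuple

Trace = Tuple[str, ...]


def seq(left: Set[Trace], right: Set[Trace]) -> Set[Trace]:
    """Sequential composition (;): every left trace followed by every right trace."""
    return {l + r for l in left for r in right}


def interleave(u: Trace, v: Trace) -> Set[Trace]:
    """All interleavings of two traces that preserve the order within each."""
    if not u:
        return {v}
    if not v:
        return {u}
    return ({(u[0],) + w for w in interleave(u[1:], v)} |
            {(v[0],) + w for w in interleave(u, v[1:])})


def par(left: Set[Trace], right: Set[Trace]) -> Set[Trace]:
    """Parallel composition (||), assuming interleaving semantics."""
    return {w for l in left for r in right for w in interleave(l, r)}


a, b, c = {("a",)}, {("b",)}, {("c",)}
traces = seq(a, par(b, c))  # traces of a ; (b || c)
```

Here `traces` contains the two runs `a b c` and `a c b`.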
2.
We understand the \(\sum \) as a sum over a multiset, taking multiplicities into account. For instance, with the multiset \(A = \{1, 1\}\), we get \(\sum _{i \in A}i = 2\).
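The same convention can be sketched in Python by representing a multiset as a `collections.Counter`, so that each element contributes once per occurrence. The function name `multiset_sum` is hypothetical.

```python
from collections import Counter


def multiset_sum(multiset: Counter) -> int:
    """Sum over a multiset: each value is counted with its multiplicity."""
    return sum(value * count for value, count in multiset.items())


A = Counter([1, 1])          # the multiset {1, 1}
total = multiset_sum(A)      # 1 is counted twice
as_plain_set = sum(set(A))   # a plain set would count 1 only once
```

With the multiset \(A = \{1, 1\}\), `total` is 2, whereas summing over the underlying set gives 1.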
3.
More precisely, the problem of existence of a \(\delta \)-multi-alignment for given \(\mathcal {C}\), \(N\) and \(\delta \) (represented in unary), is NP-complete. For NP-hardness, we use a reduction from the problem of reachability of a marking \(m\) in a 1-safe acyclic Petri net \(N\), known to be NP-complete [12, 13], to the existence of a \(0\)-multi-alignment with the empty collection \(\mathcal C = \emptyset \).
4.
Pseudo-Boolean constraints are generalizations of Boolean constraints. They allow one to specify constant bounds on the number of variables that can or must be assigned to true among a set \(V\) of variables. We write them as \(a \le \sum _{v \in V} v \le b\). Pseudo-Boolean constraints are not more expressive than Boolean constraints, but they can be up to exponentially more concise. Some pseudo-Boolean solvers can also search for a solution that minimizes a pseudo-Boolean objective of the same form \(\sum _{v \in V} v\), that is, the number of variables in \(V\) assigned to true.
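To make the form of these constraints concrete, here is a minimal brute-force sketch. It enumerates all assignments and is only meant to show what a constraint \(a \le \sum _{v \in V} v \le b\) and a minimization objective mean; it does not reflect how an actual pseudo-Boolean solver works. The names `satisfies` and `minimize` are hypothetical.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

# A constraint (a, V, b) stands for  a <= (number of true variables in V) <= b.
Constraint = Tuple[int, Tuple[str, ...], int]


def satisfies(assignment: Dict[str, bool], constraints: List[Constraint]) -> bool:
    """Check every cardinality bound against the given assignment."""
    return all(a <= sum(assignment[v] for v in vs) <= b
               for a, vs, b in constraints)


def minimize(variables: List[str], constraints: List[Constraint],
             objective: Tuple[str, ...]) -> Optional[Dict[str, bool]]:
    """Return a satisfying assignment with the fewest true objective variables.

    Exhaustive search over 2^n assignments, for illustration only.
    """
    best: Optional[Dict[str, bool]] = None
    best_cost: Optional[int] = None
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if not satisfies(assignment, constraints):
            continue
        cost = sum(assignment[v] for v in objective)
        if best_cost is None or cost < best_cost:
            best, best_cost = assignment, cost
    return best


variables = ["x", "y", "z"]
constraints = [
    (2, ("x", "y", "z"), 3),  # at least two of x, y, z are true
    (0, ("x",), 0),           # x must be false
]
solution = minimize(variables, constraints, objective=("x", "y", "z"))
```

The only satisfying assignment sets `y` and `z` to true and `x` to false, so that is what the search returns.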
5.
This holds as well for Hamming or edit distance.
6.
For efficiency reasons, DarkSider currently uses an ad hoc distance that is intermediate between the Hamming and Levenshtein distances.
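The two distances named here can be sketched as follows. The intermediate distance used by DarkSider is not specified in this note, so it is not reproduced. Extending the Hamming distance to traces of different lengths by counting the missing positions as mismatches is an assumption made for this illustration.

```python
from typing import Sequence


def hamming(a: Sequence[str], b: Sequence[str]) -> int:
    """Position-wise mismatches; assumption: missing positions count as mismatches."""
    mismatches = sum(x != y for x, y in zip(a, b))
    return mismatches + abs(len(a) - len(b))


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Minimal number of insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete x
                            curr[j - 1] + 1,           # insert y
                            prev[j - 1] + (x != y)))   # substitute or match
        prev = curr
    return prev[-1]


observed = ("a", "b", "c", "d")
shifted = ("b", "c", "d", "a")
d_hamming = hamming(observed, shifted)
d_levenshtein = levenshtein(observed, shifted)
```

For a trace in which one event is moved from the front to the end, every position differs, so the Hamming distance is 4, while the Levenshtein distance is 2 (one deletion and one insertion).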
7.
If more flexible distance parameters are applied, a clustering with only 10 traces unclustered can be computed.

References

1. van der Aalst, W.M.P.: Process Mining: Discovery, Conformance and Enhancement of Business Processes. Springer, Berlin (2011)
2. Greco, G., Guzzo, A., Pontieri, L., Saccà, D.: Discovering expressive process models by clustering log traces. IEEE Trans. Knowl. Data Eng. 18(8), 1010–1027 (2006)
3. Ferreira, D., Zacarias, M., Malheiros, M., Ferreira, P.: Approaching process mining with sequence clustering: experiments and findings. In: Alonso, G., Dadam, P., Rosemann, M. (eds.) BPM 2007. LNCS, vol. 4714, pp. 360–374. Springer, Heidelberg (2007). doi:10.1007/978-3-540-75183-0_26
4. Song, M., Günther, C.W., van der Aalst, W.M.P.: Trace clustering in process mining. In: Ardagna, D., Mecella, M., Yang, J. (eds.) BPM 2008. LNBIP, vol. 17, pp. 109–120. Springer, Heidelberg (2009). doi:10.1007/978-3-642-00328-8_11
5. Bose, R., van der Aalst, W.M.P.: Context aware trace clustering: towards improving process mining results. In: Proceedings of the SIAM International Conference on Data Mining, SDM 2009, 30 April – 2 May 2009, Sparks, Nevada, USA, pp. 401–412 (2009)
6. Bose, R.P.J.C., van der Aalst, W.M.P.: Trace clustering based on conserved patterns: towards achieving better process models. In: Rinderle-Ma, S., Sadiq, S., Leymann, F. (eds.) BPM 2009. LNBIP, vol. 43, pp. 170–181. Springer, Heidelberg (2010). doi:10.1007/978-3-642-12186-9_16
7. Weerdt, J.D., vanden Broucke, S.K.L.M., Vanthienen, J., Baesens, B.: Active trace clustering for improved process discovery. IEEE Trans. Knowl. Data Eng. 25(12), 2708–2720 (2013)
8. Hompes, B., Buijs, J., van der Aalst, W., Dixit, P., Buurman, H.: Discovering deviating cases and process variants using trace clustering. In: Proceedings of the 27th Benelux Conference on Artificial Intelligence (BNAIC 2015), Hasselt, Belgium, 5–6 November 2015
9. Dumas, M., van der Aalst, W.M.P., ter Hofstede, A.H.M.: Process-Aware Information Systems: Bridging People and Software Through Process Technology. Wiley, Hoboken (2005)
10. Adriansyah, A.: Aligning observed and modeled behavior. Ph.D. thesis, Technische Universiteit Eindhoven (2014)
11. Murata, T.: Petri nets: properties, analysis and applications. Proc. IEEE 77(4), 541–574 (1989)
12. Stewart, I.A.: Reachability in some classes of acyclic Petri nets. Fundam. Inform. 23(1), 91–100 (1995)
13. Cheng, A., Esparza, J., Palsberg, J.: Complexity results for 1-safe nets. In: Shyamasundar, R.K. (ed.) FSTTCS 1993. LNCS, vol. 761, pp. 326–337. Springer, Heidelberg (1993). doi:10.1007/3-540-57529-4_66
14. Eén, N., Sörensson, N.: Translating pseudo-boolean constraints into SAT. JSAT 2(1–4), 1–26 (2006)
15. Taymouri, F., Carmona, J.: Model and event log reductions to boost the computation of alignments. In: Proceedings of the 6th International Symposium on Data-driven Process Discovery and Analysis (SIMPDA 2016), Graz, Austria, 15–16 December 2016, pp. 50–62 (2016)
16. Chatain, T., Carmona, J.: Anti-alignments in conformance checking – the dark side of process models. In: Kordon, F., Moldt, D. (eds.) PETRI NETS 2016. LNCS, vol. 9698, pp. 240–258. Springer, Cham (2016). doi:10.1007/978-3-319-39086-4_15
17. Taymouri, F., Carmona, J.: A recursive paradigm for aligning observed behavior of large structured process models. In: La Rosa, M., Loos, P., Pastor, O. (eds.) BPM 2016. LNCS, vol. 9850, pp. 197–214. Springer, Cham (2016). doi:10.1007/978-3-319-45348-4_12

Acknowledgements

We thank Bart Hompes for facilitating the clustering results of his tool for the example used in the experiments. This work has been partially supported by funds from the Spanish Ministry for Economy and Competitiveness (MINECO) and the European Union (FEDER funds) under grant COMMAS (ref. TIN2013-46181-C2-1-R).

Author information

Authors and Affiliations

LSV, ENS Paris-Saclay, CNRS, Inria, Cachan, France
Thomas Chatain
Universitat Politècnica de Catalunya, Barcelona, Spain
Josep Carmona
Eindhoven University of Technology, Eindhoven, The Netherlands
Boudewijn van Dongen

Corresponding author

Correspondence to Thomas Chatain.

Editor information

Editors and Affiliations

University of Klagenfurt, Klagenfurt, Austria
Heinrich C. Mayr
Free University of Bozen-Bolzano, Bozen-Bolzano, Italy
Giancarlo Guizzardi
Victoria University of Wellington, Wellington, New Zealand
Hui Ma
Valencia University of Technology, Valencia, Spain
Oscar Pastor

About this paper

Cite this paper

Chatain, T., Carmona, J., van Dongen, B. (2017). Alignment-Based Trace Clustering. In: Mayr, H., Guizzardi, G., Ma, H., Pastor, O. (eds) Conceptual Modeling. ER 2017. Lecture Notes in Computer Science, vol 10650. Springer, Cham. https://doi.org/10.1007/978-3-319-69904-2_24

DOI: https://doi.org/10.1007/978-3-319-69904-2_24
Published: 21 October 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-69903-5
Online ISBN: 978-3-319-69904-2
eBook Packages: Computer Science, Computer Science (R0)
