Sequential Clustering for Event Sequences and Its Impact on Next Process Step Prediction

Le, Mai; Nauck, Detlef; Gabrys, Bogdan; Martin, Trevor

doi:10.1007/978-3-319-08795-5_18

Part of the book series: Communications in Computer and Information Science (CCIS, volume 442)

Included in the following conference series:

International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems

Abstract

Next step prediction is an important problem in process analytics, and it can be used in process monitoring to preempt failures in business processes. We use log files from a workflow system that record the sequential execution of business processes. Each process execution results in a timestamped event. The main difficulty in analysing such event sequences is that they can be very diverse, so models are needed that handle diverse sequences effectively without losing the sequential nature of the data. We propose an approach that clusters event sequences. Each cluster consists of similar sequences, and the challenge is to identify a similarity measure that can cope with the sequential nature of the data. After clustering, we build individual predictive models for each group. This strategy addresses both the sequential and the diverse characteristics of our data. We first employ K-means and extend it into a categorical-sequential clustering algorithm by combining it with sequential alignment. Finally, we treat each resulting cluster by building individual Markov models of different orders, expecting that the representative characteristics of each cluster are captured.
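
The pipeline outlined in the abstract (alignment-based similarity, K-means-style clustering, one Markov model per cluster) can be illustrated with a rough sketch. This is not the authors' implementation, whose details are not available in this preview. Several choices here are assumptions made only for illustration: unit gap and mismatch costs in a Needleman-Wunsch-style global alignment, medoids as cluster representatives (since a mean of categorical sequences is not defined), the first k sequences as seeds, first-order Markov models only, and the toy event names. All function names are hypothetical.

```python
from collections import Counter, defaultdict


def alignment_distance(a, b, gap=1, mismatch=1):
    """Global-alignment cost between two event sequences (assumed unit costs)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            sub = prev[j - 1] + (0 if a[i - 1] == b[j - 1] else mismatch)
            curr.append(min(sub, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def cluster_sequences(seqs, k, iters=10):
    """K-means-like loop using medoids and the alignment distance."""
    medoids = list(range(k))  # assumption: seed with the first k sequences
    labels = []
    for _ in range(iters):
        # Assignment step: each sequence joins its nearest medoid.
        labels = [
            min(range(k), key=lambda c: alignment_distance(s, seqs[medoids[c]]))
            for s in seqs
        ]
        # Update step: the new medoid minimises total distance within its cluster.
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:
                new_medoids.append(medoids[c])  # keep the old medoid for an empty cluster
                continue
            new_medoids.append(min(
                members,
                key=lambda m: sum(alignment_distance(seqs[m], seqs[o]) for o in members),
            ))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return labels, medoids


def fit_markov(seqs):
    """First-order transition counts: counts[current][next]."""
    counts = defaultdict(Counter)
    for s in seqs:
        for cur, nxt in zip(s, s[1:]):
            counts[cur][nxt] += 1
    return counts


def predict_next(model, current):
    """Most frequent successor of `current`, or None if it was never seen."""
    if current not in model:
        return None
    return model[current].most_common(1)[0][0]


# Toy event log (invented for illustration only).
seqs = [
    ["open", "check", "fix", "close"],
    ["order", "ship", "bill"],
    ["open", "check", "close"],
    ["order", "ship", "ship", "bill"],
    ["open", "fix", "close"],
    ["order", "bill"],
]
labels, medoids = cluster_sequences(seqs, k=2)
models = {
    c: fit_markov([s for s, lab in zip(seqs, labels) if lab == c])
    for c in set(labels)
}
```

To predict the next step of a running process instance, one would assign its observed prefix to the nearest medoid and query that cluster's model, for example `predict_next(models[labels[0]], "fix")`. Higher-order models, as mentioned in the abstract, would condition on the last several events instead of only the current one.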

Author information

Authors and Affiliations

University of Bournemouth, UK
Mai Le & Bogdan Gabrys
British Telecommunications, UK
Detlef Nauck
University of Bristol, UK
Trevor Martin

Editor information

Editors and Affiliations

University Montpellier 2, LIRMM - CNRS UMR 5506, 161, Rue Ada, 34392, Montpellier Cedex 5, France
Anne Laurent
LIRMM, UMR CNRS/Universite Montpellier II, 161 rue Ada, 34392, Montpellier cedex 5, France
Olivier Strauss
LIP6, UPMC Univ. Paris 06, CNRS UMR 7606, F-75005, Paris, France
Bernadette Bouchon-Meunier
Dept. of Information Systems, Iona College, 710 North Ave, 10801, New Rochelle, NY, USA
Ronald R. Yager

About this paper

Cite this paper

Le, M., Nauck, D., Gabrys, B., Martin, T. (2014). Sequential Clustering for Event Sequences and Its Impact on Next Process Step Prediction. In: Laurent, A., Strauss, O., Bouchon-Meunier, B., Yager, R.R. (eds) Information Processing and Management of Uncertainty in Knowledge-Based Systems. IPMU 2014. Communications in Computer and Information Science, vol 442. Springer, Cham. https://doi.org/10.1007/978-3-319-08795-5_18

DOI: https://doi.org/10.1007/978-3-319-08795-5_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08794-8
Online ISBN: 978-3-319-08795-5
eBook Packages: Computer Science, Computer Science (R0)
