Sequential Clustering for Event Sequences and Its Impact on Next Process Step Prediction

  • Conference paper
Information Processing and Management of Uncertainty in Knowledge-Based Systems (IPMU 2014)

Abstract

Next step prediction is an important problem in process analytics, and it can be used in process monitoring to preempt failures in business processes. We use log files from a workflow system that record the sequential execution of business processes. Each process execution results in a timestamped event. The main difficulty in analysing such event sequences is that they can be very diverse, so models are needed that handle diverse sequences effectively without losing the sequential nature of the data. We propose an approach that clusters event sequences so that each cluster consists of similar sequences; the challenge is to identify a similarity measure that can cope with the sequential nature of the data. After clustering, we build an individual predictive model for each group. This strategy addresses both the sequential and the diverse characteristics of our data. We first employ K-means and extend it into a categorical-sequential clustering algorithm by combining it with sequential alignment. Finally, we build individual Markov models of different orders for each resulting cluster, expecting that they capture the representative characteristics of that cluster.
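The pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, and it rests on several assumptions: the similarity measure is a unit-cost global alignment (edit) distance, the K-means centroid is replaced by a medoid sequence because a mean of categorical sequences is not defined, the seed indices are chosen by hand, and the event names and the helpers `align_distance`, `nearest`, `cluster`, `fit_markov` and `predict_next` are hypothetical. The paper's actual alignment scoring and centroid update may differ.

```python
from collections import Counter, defaultdict

def align_distance(a, b):
    """Unit-cost global alignment (edit) distance between two event sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # gap: skip an event of a
                            curr[j - 1] + 1,           # gap: skip an event of b
                            prev[j - 1] + (x != y)))   # match or mismatch
        prev = curr
    return prev[-1]

def nearest(seq, seqs, medoids):
    """Index of the cluster whose medoid sequence aligns best with seq."""
    return min(range(len(medoids)),
               key=lambda c: align_distance(seq, seqs[medoids[c]]))

def cluster(seqs, seeds, max_iter=20):
    """K-means-style loop with medoid sequences standing in for centroids."""
    medoids = list(seeds)
    for _ in range(max_iter):
        labels = [nearest(s, seqs, medoids) for s in seqs]
        updated = []
        for c, old in enumerate(medoids):
            members = [i for i, lab in enumerate(labels) if lab == c]
            cost = lambda i: sum(align_distance(seqs[i], seqs[j]) for j in members)
            # Keep the old medoid if a cluster happens to end up empty.
            updated.append(min(members, key=cost) if members else old)
        if updated == medoids:
            break
        medoids = updated
    return [nearest(s, seqs, medoids) for s in seqs], medoids

def fit_markov(seqs, order=1):
    """Count next-event frequencies conditioned on the last `order` events."""
    counts = defaultdict(Counter)
    for s in seqs:
        for i in range(order, len(s)):
            counts[tuple(s[i - order:i])][s[i]] += 1
    return counts

def predict_next(model, prefix, order=1):
    """Most frequent successor of the prefix's last `order` events, or None."""
    options = model.get(tuple(prefix[-order:]))
    return options.most_common(1)[0][0] if options else None

# Hypothetical event log: two kinds of process mixed together.
seqs = [
    ["open", "check", "fix", "close"],
    ["open", "check", "fix", "test", "close"],
    ["open", "check", "fix", "close"],
    ["order", "ship", "invoice", "pay"],
    ["order", "ship", "invoice", "remind", "pay"],
    ["order", "invoice", "ship", "pay"],
]
labels, medoids = cluster(seqs, seeds=[0, 3])   # seeds picked by hand (assumption)

# One first-order Markov model per cluster.
models = [fit_markov([s for s, lab in zip(seqs, labels) if lab == c])
          for c in range(len(medoids))]

# Route a running process to its closest cluster, then predict its next step.
prefix = ["order", "ship"]
predicted = predict_next(models[nearest(prefix, seqs, medoids)], prefix)
print(labels, predicted)
```

Passing a larger `order` to `fit_markov` and `predict_next` gives a higher-order model per cluster, at the cost of returning no prediction for contexts that were never observed.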

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Le, M., Nauck, D., Gabrys, B., Martin, T. (2014). Sequential Clustering for Event Sequences and Its Impact on Next Process Step Prediction. In: Laurent, A., Strauss, O., Bouchon-Meunier, B., Yager, R.R. (eds) Information Processing and Management of Uncertainty in Knowledge-Based Systems. IPMU 2014. Communications in Computer and Information Science, vol 442. Springer, Cham. https://doi.org/10.1007/978-3-319-08795-5_18

  • DOI: https://doi.org/10.1007/978-3-319-08795-5_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-08794-8

  • Online ISBN: 978-3-319-08795-5

  • eBook Packages: Computer Science, Computer Science (R0)
