Analysis of the Statistical Characteristics in Mining of Frequent Sequences

Tumasonis, Romanas; Dzemyda, Gintautas

doi:10.1007/3-540-32392-9_39


Part of the book series: Advances in Soft Computing (AINSC, volume 31)


Abstract

The paper deals with the search for and analysis of subsequences in large-volume sequences (texts, DNA sequences, etc.). A new algorithm, ProMFS, for mining frequent sequences is proposed and investigated. It is based on estimated probabilistic-statistical characteristics of the appearance of the elements of the sequence and of their order. The algorithm builds a new, much shorter sequence and makes decisions about the main sequence according to the results of analyzing the shorter one.
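The general idea in the abstract (estimate statistics of the main sequence, build a much shorter sequence from them, then mine the shorter one) can be illustrated with a minimal Python sketch. This is not the authors' ProMFS algorithm, whose details the abstract does not give. The sketch assumes that the "probabilistic-statistical characteristics" are element frequencies and first-order transition probabilities, that the shorter sequence is sampled as a Markov chain, and that "frequent sequences" are contiguous subsequences. All function names and parameters are hypothetical.

```python
import random
from collections import Counter, defaultdict


def estimate_statistics(seq):
    """Estimate element probabilities and first-order transition probabilities.

    Assumption: these two statistics stand in for the characteristics
    mentioned in the abstract; the paper may use different ones.
    """
    n = len(seq)
    elem_prob = {a: c / n for a, c in Counter(seq).items()}
    pair_counts = defaultdict(Counter)
    for a, b in zip(seq, seq[1:]):
        pair_counts[a][b] += 1
    trans_prob = {
        a: {b: c / sum(nxt.values()) for b, c in nxt.items()}
        for a, nxt in pair_counts.items()
    }
    return elem_prob, trans_prob


def build_short_sequence(elem_prob, trans_prob, length, seed=0):
    """Sample a shorter surrogate sequence (length >= 1) from the statistics."""
    rng = random.Random(seed)
    symbols = sorted(elem_prob)
    weights = [elem_prob[s] for s in symbols]
    current = rng.choices(symbols, weights=weights)[0]
    out = [current]
    while len(out) < length:
        nxt = trans_prob.get(current)
        if nxt:
            keys = sorted(nxt)
            current = rng.choices(keys, weights=[nxt[k] for k in keys])[0]
        else:
            # No observed successor: fall back to the element distribution.
            current = rng.choices(symbols, weights=weights)[0]
        out.append(current)
    return "".join(out)


def frequent_subsequences(seq, max_len, min_support):
    """Count contiguous subsequences up to max_len; keep counts >= min_support."""
    counts = Counter(
        seq[i:i + k]
        for k in range(1, max_len + 1)
        for i in range(len(seq) - k + 1)
    )
    return {s: c for s, c in counts.items() if c >= min_support}


if __name__ == "__main__":
    main_seq = "ACGTACGTTACGACGT" * 50      # hypothetical main sequence
    elem_prob, trans_prob = estimate_statistics(main_seq)
    short_seq = build_short_sequence(elem_prob, trans_prob, length=80)
    # Decisions about main_seq are then based on patterns found in short_seq.
    print(frequent_subsequences(short_seq, max_len=3, min_support=5))
```

Mining the surrogate costs time proportional to its length rather than to the length of the main sequence, at the price of approximate results; how well the patterns carry over depends on how faithfully the chosen statistics capture the original.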



Author information

Authors and Affiliations

Institute of Mathematics and Informatics, Akademijos str. 4, 08663, Vilnius, Lithuania
Romanas Tumasonis & Gintautas Dzemyda


Editor information

Editors and Affiliations

Institute of Computer Sciences, Polish Academy of Sciences, ul. Ordona 21, 01-237, Warszawa, Poland
Mieczysław A. Kłopotek, Sławomir T. Wierzchoń & Krzysztof Trojanowski


About this paper

Cite this paper

Tumasonis, R., Dzemyda, G. (2005). Analysis of the Statistical Characteristics in Mining of Frequent Sequences. In: Kłopotek, M.A., Wierzchoń, S.T., Trojanowski, K. (eds) Intelligent Information Processing and Web Mining. Advances in Soft Computing, vol 31. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-32392-9_39


DOI: https://doi.org/10.1007/3-540-32392-9_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25056-2
Online ISBN: 978-3-540-32392-1
eBook Packages: Engineering, Engineering (R0)
