Abstract
The paper deals with searching for and analysing subsequences in very long sequences (texts, DNA sequences, etc.). A new algorithm, ProMFS, for mining frequent sequences is proposed and investigated. It is based on estimated probabilistic-statistical characteristics of the appearance of the sequence's elements and of their order. The algorithm builds a new, much shorter sequence and makes decisions about the main sequence according to the results of analysing the shorter one.
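The abstract describes ProMFS only in outline. The Python sketch below illustrates the general two-phase idea under our own assumptions, not the authors' exact procedure. We assume that the "characteristics" are single-element probabilities plus first-order transition probabilities, that the shorter sequence is sampled from that model, that frequent patterns are contiguous k-grams with a relative support threshold, and that each candidate found in the short sequence is re-counted in the main sequence before it is accepted. All function names and parameters (`estimate_statistics`, `build_model_sequence`, `mine_frequent`, `model_length`, `min_support`) are hypothetical.

```python
import random
from collections import Counter, defaultdict

# Hypothetical sketch of a ProMFS-style two-phase miner; not the authors' code.

def estimate_statistics(seq):
    """Assumed characteristics: P(a) for each element and P(b | a) for adjacent pairs."""
    counts = Counter(seq)
    probs = {a: c / len(seq) for a, c in counts.items()}
    pair_counts = defaultdict(Counter)
    for a, b in zip(seq, seq[1:]):
        pair_counts[a][b] += 1
    trans = {a: {b: c / sum(nxt.values()) for b, c in nxt.items()}
             for a, nxt in pair_counts.items()}
    return probs, trans

def build_model_sequence(probs, trans, length, seed=0):
    """Sample a much shorter sequence from the estimated first-order model."""
    rng = random.Random(seed)
    elems = list(probs)
    weights = [probs[a] for a in elems]
    cur = rng.choices(elems, weights=weights)[0]
    out = [cur]
    while len(out) < length:
        nxt = trans.get(cur)
        if not nxt:  # element seen only at the very end: restart from P(a)
            cur = rng.choices(elems, weights=weights)[0]
        else:
            succ = list(nxt)
            cur = rng.choices(succ, weights=[nxt[b] for b in succ])[0]
        out.append(cur)
    return out

def count_occurrences(seq, pattern):
    """Count possibly overlapping contiguous occurrences of a tuple pattern."""
    k = len(pattern)
    return sum(1 for i in range(len(seq) - k + 1) if tuple(seq[i:i + k]) == pattern)

def frequent_kgrams(seq, k, min_support):
    """Contiguous k-grams whose relative frequency is at least min_support."""
    windows = len(seq) - k + 1
    if windows <= 0:
        return set()
    grams = Counter(tuple(seq[i:i + k]) for i in range(windows))
    return {g for g, c in grams.items() if c / windows >= min_support}

def mine_frequent(seq, k, min_support, model_length, seed=0):
    """Find candidates in the short model sequence, then confirm them on the main one."""
    windows = len(seq) - k + 1
    if windows <= 0:
        return set(), set()
    probs, trans = estimate_statistics(seq)
    short = build_model_sequence(probs, trans, model_length, seed)
    candidates = frequent_kgrams(short, k, min_support)
    confirmed = {p for p in candidates
                 if count_occurrences(seq, p) / windows >= min_support}
    return candidates, confirmed
```

For example, `mine_frequent("ACGTACGTACGTTTACGA" * 50, 3, 0.05, 200)` returns the candidate 3-grams found in a 200-element model sequence and the subset of them that also meet the threshold in the 900-element original. Because every candidate is re-counted on the main sequence, this sketch never reports a pattern that is infrequent there, but it can miss frequent patterns that the sampled short sequence happens not to contain.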
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tumasonis, R., Dzemyda, G. (2005). Analysis of the Statistical Characteristics in Mining of Frequent Sequences. In: Kłopotek, M.A., Wierzchoń, S.T., Trojanowski, K. (eds) Intelligent Information Processing and Web Mining. Advances in Soft Computing, vol 31. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-32392-9_39
DOI: https://doi.org/10.1007/3-540-32392-9_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25056-2
Online ISBN: 978-3-540-32392-1
eBook Packages: Engineering, Engineering (R0)