Efficient Subsequence Matching Using the Longest Common Subsequence with a Dual Match Index

Han, Tae Sik; Ko, Seung-Kyu; Kang, Jaewoo

doi:10.1007/978-3-540-73499-4_44

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4571)

Included in the following conference series:

International Workshop on Machine Learning and Data Mining in Pattern Recognition

Abstract

The purpose of subsequence matching is to find occurrences of a query sequence within a long data sequence. Because the problem has many applications, many solutions have been proposed. Virtually all previous solutions use the Euclidean measure as the basis for measuring the distance between sequences. Recent studies, however, suggest that the Euclidean distance often fails to produce proper results because of irregularities in the data, which are not uncommon in this problem domain. To address this problem, some non-Euclidean measures, such as Dynamic Time Warping (DTW) and Longest Common Subsequence (LCS), have been proposed. However, most of the previous work in this direction has focused on the whole sequence matching problem, where the query and data sequences are of the same length. In this paper, we propose a novel subsequence matching framework that uses a non-Euclidean measure, specifically LCS, together with a new index query scheme. The proposed framework is based on the Dual Match framework, in which data sequences are divided into a series of disjoint equi-length subsequences and then indexed in an R-tree. We introduce a similarity bound for index matching with LCS. The proposed query matching scheme significantly reduces the number of false positives in the match result. Furthermore, we develop an algorithm that skips expensive LCS computations by observing the warping paths. We validate our framework through extensive experiments using 48 different time series datasets. The results of the experiments suggest that our approach significantly improves subsequence matching performance across various metrics.
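
To make the terms in the abstract concrete, the following Python sketch illustrates an LCS similarity for real-valued sequences, a split into disjoint equi-length windows, and a naive subsequence scan. It is not the authors' method: it omits the R-tree index, the similarity bound, and the LCS-skipping algorithm. The names `lcs_length`, `disjoint_windows`, and `naive_subsequence_match`, and the parameters `eps` (value tolerance), `delta` (allowed time shift), and `min_sim` (similarity threshold), are assumptions chosen for illustration, following the thresholded LCS formulation commonly used for time series.

```python
from typing import List, Sequence, Tuple


def lcs_length(a: Sequence[float], b: Sequence[float],
               eps: float, delta: int) -> int:
    """LCS length of two real-valued sequences.

    Assumption: elements a[i] and b[j] "match" when their values differ
    by at most eps and their positions differ by at most delta.
    """
    n, m = len(a), len(b)
    # dp[i][j] holds the LCS length of the prefixes a[:i] and b[:j].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if abs(a[i - 1] - b[j - 1]) <= eps and abs(i - j) <= delta:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[n][m]


def disjoint_windows(data: Sequence[float],
                     w: int) -> List[Tuple[int, Sequence[float]]]:
    """Split data into disjoint windows of length w, dropping any short tail.

    Each result is (start offset, window). In an indexed scheme these
    windows would be stored in a spatial index; here they are only listed.
    """
    return [(s, data[s:s + w]) for s in range(0, len(data) - w + 1, w)]


def naive_subsequence_match(data: Sequence[float], query: Sequence[float],
                            eps: float, delta: int,
                            min_sim: float) -> List[Tuple[int, float]]:
    """Brute-force baseline: run LCS at every offset of the data sequence.

    Similarity is the LCS length divided by the query length. This is the
    expensive scan that index-based schemes aim to avoid.
    """
    q = len(query)
    hits = []
    for s in range(len(data) - q + 1):
        sim = lcs_length(data[s:s + q], query, eps, delta) / q
        if sim >= min_sim:
            hits.append((s, sim))
    return hits


if __name__ == "__main__":
    data = [0.0, 0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0, 0.0, 0.0]
    query = [1.0, 2.0, 3.0, 2.0, 1.0]
    print(naive_subsequence_match(data, query, eps=0.1, delta=1, min_sim=1.0))
    print(disjoint_windows(data, 4))
```

The brute-force scan runs one quadratic-time LCS computation per offset, which is why pruning candidates before running LCS matters for long data sequences.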

Author information

Authors and Affiliations

Dept. of Computer Science, North Carolina State University, Raleigh, NC 27569, USA
Tae Sik Han & Seung-Kyu Ko
Dept. of Computer Science and Engineering, Korea University, Seoul 136-705, Korea
Jaewoo Kang

Editor information

Petra Perner

Cite this paper

Han, T.S., Ko, S.-K., Kang, J. (2007). Efficient Subsequence Matching Using the Longest Common Subsequence with a Dual Match Index. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2007. Lecture Notes in Computer Science, vol 4571. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73499-4_44

DOI: https://doi.org/10.1007/978-3-540-73499-4_44
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73498-7
Online ISBN: 978-3-540-73499-4
eBook Packages: Computer Science, Computer Science (R0)
