Abstract
Given two sequences X and Y that are strings over some alphabet, we consider the distance d(X, Y) between them, defined as the minimum number of character replacements and block (substring) reversals needed to transform X into Y (or vice versa). This is the “simplest” sequence comparison problem we know of that allows natural block edit operations. Block reversals arise naturally in genomic sequence comparison; they are also of interest in matching music data. We present an improved algorithm for exactly computing the distance d(X, Y); it takes time $O(|X| \log^2 |X|)$ and hence is near-linear. The trivial approach takes quadratic time, and the best previously known algorithm for this problem takes time $\Omega(|X| \log^3 |X|)$.
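To make the definition concrete, the following is a minimal sketch of a quadratic-time dynamic program of the kind the abstract calls the trivial approach. It is not the near-linear algorithm of the paper. It rests on assumptions that the abstract does not spell out: X and Y have equal length, reversed blocks do not overlap, and each position takes part in at most one operation (a replacement or a reversal). The function name `block_reversal_distance` and the table `rev_ok` are hypothetical names chosen for illustration.

```python
def block_reversal_distance(x: str, y: str) -> int:
    """Quadratic-time sketch of d(X, Y) under the assumptions stated above.

    Assumed model (not confirmed by the abstract): equal-length inputs,
    non-overlapping reversed blocks, and no replacements inside a
    reversed block.
    """
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    n = len(x)

    # rev_ok[j][i] is True when reversing the block x[j:i] yields y[j:i].
    rev_ok = [[False] * (n + 1) for _ in range(n + 1)]
    for j in range(n + 1):
        rev_ok[j][j] = True                      # empty block
    for j in range(n):
        rev_ok[j][j + 1] = x[j] == y[j]          # single character
    for length in range(2, n + 1):
        for j in range(n - length + 1):
            i = j + length
            # The outer characters must swap, and the inner block must
            # itself match after reversal.
            rev_ok[j][i] = (
                x[j] == y[i - 1]
                and x[i - 1] == y[j]
                and rev_ok[j + 1][i - 1]
            )

    # d[i] is the cost of turning the prefix x[:i] into y[:i].
    d = [0] * (n + 1)
    for i in range(1, n + 1):
        # Option 1: keep or replace the single character at position i - 1.
        d[i] = d[i - 1] + (x[i - 1] != y[i - 1])
        # Option 2: end a reversed block of length at least 2 at position i.
        for j in range(i - 1):
            if rev_ok[j][i]:
                d[i] = min(d[i], d[j] + 1)
    return d[n]
```

For example, under these assumptions `block_reversal_distance("abcd", "dcba")` costs one reversal of the whole string, whereas position-by-position replacement would cost four. The table `rev_ok` uses quadratic memory; it is kept here for readability rather than efficiency.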
Supported in part by a grant from the Charles B. Wang Foundation.
© 2002 Springer-Verlag Berlin Heidelberg
Cite this paper
Muthukrishnan, S., Sahinalp, S.C. (2002). An Improved Algorithm for Sequence Comparison with Block Reversals. In: Rajsbaum, S. (eds) LATIN 2002: Theoretical Informatics. LATIN 2002. Lecture Notes in Computer Science, vol 2286. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45995-2_30
DOI: https://doi.org/10.1007/3-540-45995-2_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43400-9
Online ISBN: 978-3-540-45995-8