Approximate Matching in Weighted Sequences

  • Conference paper
Combinatorial Pattern Matching (CPM 2006)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4009)

Abstract

Weighted sequences have recently been introduced as a tool for handling a set of sequences that are not identical but share many local similarities. The weighted sequence is a “statistical image” of this set, in which the probability of every symbol’s occurrence at every text location is given.
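As a rough illustration of such a “statistical image”, the sketch below builds a weighted sequence from a small set of sequences by taking per-location symbol frequencies. It assumes the input sequences are already aligned and of equal length, and that probabilities are estimated as plain relative frequencies; the function name `build_weighted_sequence` and the sample data are hypothetical and are not taken from the paper.

```python
from collections import Counter


def build_weighted_sequence(sequences):
    """Return one {symbol: probability} dict per text location.

    Assumption: all sequences are aligned and have the same length, and
    each probability is the relative frequency of the symbol in its column.
    """
    if not sequences:
        return []
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same length")

    weighted = []
    for column in zip(*sequences):  # one column per text location
        counts = Counter(column)
        weighted.append(
            {symbol: count / len(sequences) for symbol, count in counts.items()}
        )
    return weighted


# Four similar (but not identical) sequences over the DNA alphabet.
ws = build_weighted_sequence(["ACGT", "ACGA", "TCGT", "ACCT"])
for location, distribution in enumerate(ws):
    print(location, distribution)
```

Each location stores only the symbols that actually occur there, so the total number of stored entries is the number of text symbols with non-zero probability.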

We address the problem of approximately matching a pattern in such a weighted sequence. The pattern is a given string and we seek all locations in the set where the pattern occurs with a high enough probability. We define the notion of Hamming distance and edit distance in weighted sequences and give efficient algorithms for computing them. We compute two versions of the Hamming distance in time \(O(n \sqrt{m\log m})\), where \(n\) is the length of the weighted text and \(m\) is the pattern length. The edit distance is computed in time \(O(nm)\) and \(O(nm^2)\), depending on the edit distance definition used. Unfortunately, due to space considerations, the edit distance details are left to the journal version.
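To make “occurs with a high enough probability” concrete, here is a minimal sketch of a naive \(O(nm)\) check. It assumes, as is common for weighted sequences, that the occurrence probability of the pattern at a location is the product of the per-location probabilities of its symbols. This is only a baseline for the exact-matching notion, not the Hamming or edit distance algorithms of the paper, and the names `occurrence_probability`, `weighted_matches`, and `threshold` are hypothetical.

```python
def occurrence_probability(weighted_text, pattern, start):
    """Probability that `pattern` occurs at `start`.

    Assumption: locations are independent, so the probability is the
    product of the per-location probabilities of the pattern symbols.
    """
    prob = 1.0
    for offset, symbol in enumerate(pattern):
        prob *= weighted_text[start + offset].get(symbol, 0.0)
        if prob == 0.0:  # a symbol with zero probability rules out a match
            break
    return prob


def weighted_matches(weighted_text, pattern, threshold):
    """Naive O(nm) scan: all locations whose probability is >= threshold."""
    n, m = len(weighted_text), len(pattern)
    return [
        i
        for i in range(n - m + 1)
        if occurrence_probability(weighted_text, pattern, i) >= threshold
    ]


# A weighted text of length 4; each location maps symbols to probabilities.
text = [
    {"A": 0.75, "T": 0.25},
    {"C": 1.0},
    {"G": 0.75, "C": 0.25},
    {"T": 0.75, "A": 0.25},
]

print(occurrence_probability(text, "ACG", 0))  # 0.75 * 1.0 * 0.75 = 0.5625
print(weighted_matches(text, "CG", 0.5))       # [1]
```

The threshold plays the role of “high enough”: raising it towards 1 keeps only locations where the pattern is nearly certain to occur.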

We also define the notion of weighted matching in infinite alphabets and show that exact weighted matching can be computed in time \(O(s\log^2 s)\), where \(s\) is the number of text symbols having non-zero probability. The weighted Hamming distance over infinite alphabets can be computed in time \(\min(O(kn\sqrt{s}+s^{3/2}\log^2 s), O(s^{4/3}m^{1/3}\log s))\).

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Amir, A., Iliopoulos, C., Kapah, O., Porat, E. (2006). Approximate Matching in Weighted Sequences. In: Lewenstein, M., Valiente, G. (eds) Combinatorial Pattern Matching. CPM 2006. Lecture Notes in Computer Science, vol 4009. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11780441_33

  • DOI: https://doi.org/10.1007/11780441_33

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-35455-0

  • Online ISBN: 978-3-540-35461-1

  • eBook Packages: Computer Science, Computer Science (R0)
