Skip to main content

On the k-Closest Substring and k-Consensus Pattern Problems

  • Conference paper
Combinatorial Pattern Matching (CPM 2004)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 3109))

Included in the following conference series:

Abstract

Given a set S ={s 1,s 2,...,s n } of strings each of length m, and an integer L, we study the following two problems.

k -Closest Substring problem: find k center strings c 1,c 2 ,...,c k of length L minimizing d such that for each s j S, there is a length-L substring t j (closest substring) of s j with min 1 ≤ i ≤ k d(c i ,t j ) ≤ d. We give a PTAS for this problem, for k=O(1).

k -Consensus Pattern problem: find k median strings c 1,c 2 ,...,c k of length L and a substring t j (consensus pattern) of length L from each s j minimizing the total cost \(w= \sum \limits_{j=1}^n \min \limits _{1 \le i \le k} d(c_i,t_j)\). We give a PTAS for this problem, for k=O(1).

Our results improve recent results of [10] and [16] both of which depended on the random linear transformation technique in [16]. As for general k case, we give an alternative and direct proof of the NP-hardness of (2-ε)-approximation of the Hamming radius k -clustering problem, a special case of the k -Closest Substring problem restricted to L=m.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Blum, T., Jiang, M., Li, J.: Linear Approximation of Shortest Superstrings. Journal of the ACM 41(4), 630–647 (1994)

    Article  MATH  MathSciNet  Google Scholar 

  2. Buhler, J.: Efficient large-scale sequence comparison by locality-sensitive hashing. Bioinformatics 17, 419–428 (2001)

    Article  Google Scholar 

  3. Feder, T., Greene, D.H.: Optimal algorithms for approximate clustering. In: Proceedings of the 20th Annual ACM Symposium on Theory of Computing, pp. 434–444 (1988)

    Google Scholar 

  4. Fowler, R.J., Paterson, M.S., Tanimoto, S.L.: Optimal packing and covering in the plane are NP-complete. Inf. Proc. Letters 12(3), 133–137 (1981)

    Article  MATH  MathSciNet  Google Scholar 

  5. Garey, M.R., Johnson, D.S.: The rectilinear Steiner tree problem is NPcomplete. SIAM J. Appl. Math. 32, 826–834 (1977)

    Article  MATH  MathSciNet  Google Scholar 

  6. Gasieniec, L., Jansson, J., Lingas, A.: Efficient approximation algorithms for the Hamming center problem. In: Proc. 10th ACM-SIAM Symp. on Discrete Algorithms (1999),pp.S905–S906 (1999)

    Google Scholar 

  7. Gasieniec, L., Jansson, J., Lingas, A.: Approximation Algorithms for Hamming Clustering Problems. In: Proc. 11th Symp. CPM (2000), pp. 108–118 (2000)

    Google Scholar 

  8. Gonzalez, T.F.: Clustering to minimize the maximum intercluster distance. Theoretical Computer Science 38, 293–306 (1985)

    Article  MATH  MathSciNet  Google Scholar 

  9. Indyk, P., Motwani, R.: Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality. In: Proc. 30th Annual ACM Symp. Theory Comput., pp. 604–613 (1998)

    Google Scholar 

  10. Jansson, J.: Consensus Algorithms for Trees and Strings. Doctoral dissertation (2003)

    Google Scholar 

  11. Lanctot, K., Li, M., Ma, B., Wang, L., Zhang, L.: Distinguishing string selection problems. In: Proc. 10th ACM-SIAM Symp. Discrete Algorithms, pp. 633–642 (1999)

    Google Scholar 

  12. Li, M., Ma, B., Wang, L.: Finding similar regions in many strings. In: Proceedings of the 31st Annual ACM Symposium on Theory of Computing (1999), pp. 473–482 (1999)

    Google Scholar 

  13. Li, M., Ma, B., Wang, L.: On the Closest String and Substring Problems. Journal of ACM 49(2), 157–171 (2002)

    Article  MathSciNet  Google Scholar 

  14. Ma, B.: A polynomial time approximation scheme for the Closest Substring problem. In: Proc. 11th Annual Symposium on Combinatorial Pattern Matching, pp. 99–107 (2000)

    Google Scholar 

  15. Motwani, R., Raghavan, P.: Randomized Algorithms. Cambridge University Press, Cambridge (1995)

    MATH  Google Scholar 

  16. Ostrovsky, R., Rabani, Y.: Polynomial-Time Approximation Schemes for Geometric Min-Sum Median Clustering. Journal of ACM 49(2), 139–156 (2002)

    Article  MathSciNet  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jiao, Y., Xu, J., Li, M. (2004). On the k-Closest Substring and k-Consensus Pattern Problems. In: Sahinalp, S.C., Muthukrishnan, S., Dogrusoz, U. (eds) Combinatorial Pattern Matching. CPM 2004. Lecture Notes in Computer Science, vol 3109. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27801-6_10

Download citation

  • DOI: https://doi.org/10.1007/978-3-540-27801-6_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22341-2

  • Online ISBN: 978-3-540-27801-6

  • eBook Packages: Springer Book Archive

Publish with us

Policies and ethics