Optimal In-Place Suffix Sorting

  • Conference paper
String Processing and Information Retrieval (SPIRE 2018)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11147)

Abstract

The suffix array is a fundamental data structure for many applications that involve string searching and data compression. Designing time- and space-efficient suffix array construction algorithms has attracted significant attention, and considerable advances have been made over the past 20 years. We obtain the first in-place linear-time suffix array construction algorithms that are optimal in both time and space for (read-only) integer alphabets. Our algorithm settles the open problem posed by Franceschini and Muthukrishnan at ICALP 2007, which asked for in-place algorithms running in \(o(n\log n)\) time and, ultimately, in \(O(n)\) time for (read-only) integer alphabets with \(|\varSigma | \le n\). Our result is in fact slightly stronger, since we allow \(|\varSigma |=O(n)\). In addition, we provide an optimal in-place \(O(n\log n)\)-time suffix sorting algorithm for read-only general alphabets (i.e., only comparisons are allowed), recovering the result of Franceschini and Muthukrishnan, which resolved an open problem posed by Manzini and Ferragina at ESA 2002.
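For readers unfamiliar with the object being constructed, the following is a minimal sketch of what a suffix array is. It is not the algorithm of this paper: it sorts suffixes by direct comparison, which is neither linear-time nor in-place. The function name `naive_suffix_array` is hypothetical and chosen only for illustration.

```python
def naive_suffix_array(text):
    """Return the start positions of all suffixes of `text`,
    listed in lexicographic order of the suffixes.

    Illustrative only: each comparison may scan O(n) symbols and
    every slice copies a suffix, so this is far from the in-place,
    linear-time bounds discussed in the abstract.
    """
    # Works for strings and for lists of integers (an integer alphabet),
    # because Python compares both kinds of sequence lexicographically.
    return sorted(range(len(text)), key=lambda i: text[i:])


if __name__ == "__main__":
    # Suffixes of "banana": a(5) < ana(3) < anana(1) < banana(0) < na(4) < nana(2)
    print(naive_suffix_array("banana"))      # [5, 3, 1, 0, 4, 2]
    # The same idea over an integer alphabet.
    print(naive_suffix_array([2, 1, 2, 1]))  # [3, 1, 2, 0]
```

The output array has exactly n entries; an in-place algorithm in the sense of the abstract must produce it using only O(1) words of workspace beyond the read-only input and this output array.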

Notes

  1. Some previous algorithms state their space usage in bits. We convert these bounds into words.

  2. The definitions of the bucket array and the type array can be found in Sect. 2.

  3. Some previous papers use $ to denote the sentinel.

  4. If one is concerned about the \(O(\log n)\) workspace used by the recursion, one can store it in the highest bits of \(\mathsf {SA}\) (i.e., n bits), since the size of the reduced sub-problem is no larger than n/2.

  5. We use at most five special symbols in this paper. They serve only to simplify the argument, and accommodating them requires no additional assumption (including in the read-only general alphabets case). These special symbols can be handled using an extra O(1) workspace. The details can be found in our full version [24].
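The idea behind note 4 can be illustrated with a small sketch. It rests on an assumption that is not spelled out in the note: each entry of the array occupies a word just wide enough to hold any index in [0, n), and every value currently stored is below n/2, so the highest bit of each word is unused and can carry one bit of side information. The helper names below are hypothetical, and the sketch does not reproduce how the paper actually lays out its recursion state.

```python
def word_bits(n):
    """Number of bits needed to store any index in [0, n)."""
    return max(1, (n - 1).bit_length())


def set_flag(sa, i, n, flag):
    """Store one extra bit in the highest bit of the word sa[i]."""
    top = 1 << (word_bits(n) - 1)
    sa[i] = (sa[i] | top) if flag else (sa[i] & ~top)


def get_flag(sa, i, n):
    """Read back the extra bit stored in sa[i]."""
    top = 1 << (word_bits(n) - 1)
    return bool(sa[i] & top)


def get_value(sa, i, n):
    """Read the original value of sa[i], ignoring the extra bit."""
    top = 1 << (word_bits(n) - 1)
    return sa[i] & (top - 1)


if __name__ == "__main__":
    n = 16
    sa = [3, 0, 7, 5]           # every value is below n/2 = 8
    set_flag(sa, 2, n, True)    # hide one bit inside entry 2
    print(get_value(sa, 2, n), get_flag(sa, 2, n), get_flag(sa, 0, n))
```

Because a value v < n/2 satisfies 2v <= n - 1, it needs at least one bit fewer than n - 1 does, so the flagged entry still fits in the same word and the stored value is recovered by masking.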

References

  1. Abouelhoda, M.I., Kurtz, S., Ohlebusch, E.: The enhanced suffix array and its applications to genome analysis. In: Guigó, R., Gusfield, D. (eds.) WABI 2002. LNCS, vol. 2452, pp. 449–463. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-45784-4_35

  2. Abouelhoda, M.I., Kurtz, S., Ohlebusch, E.: Replacing suffix trees with enhanced suffix arrays. J. Discret. Algorithms 2(1), 53–86 (2004)

  3. Baron, D., Bresler, Y.: Antisequential suffix sorting for BWT-based data compression. IEEE Trans. Comput. 54(4), 385–397 (2005)

  4. Burkhardt, S., Kärkkäinen, J.: Fast lightweight suffix array construction and checking. In: Baeza-Yates, R., Chávez, E., Crochemore, M. (eds.) CPM 2003. LNCS, vol. 2676, pp. 55–69. Springer, Heidelberg (2003). https://doi.org/10.1007/3-540-44888-8_5

  5. Burrows, M., Wheeler, D.J.: A block-sorting lossless data compression algorithm. Technical report 124 (1994)

  6. Clark, D.: Compact pat trees. Ph.D. thesis, University of Waterloo (1996)

  7. Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms. MIT Press, Cambridge (2001)

  8. Dhaliwal, J., Puglisi, S.J., Turpin, A.: Trends in suffix sorting: a survey of low memory algorithms. In: Proceedings of the Thirty-fifth Australasian Computer Science Conference, vol. 122, pp. 91–98. Australian Computer Society, Inc. (2012)

  9. Farach, M.: Optimal suffix tree construction with large alphabets. In: Proceedings of the 38th Annual Symposium on Foundations of Computer Science (FOCS), pp. 137–143. IEEE (1997)

  10. Ferragina, P., Manzini, G.: Opportunistic data structures with applications. In: Proceedings of the 41st Annual Symposium on Foundations of Computer Science (FOCS), pp. 390–398. IEEE (2000)

  11. Franceschini, G., Muthukrishnan, S.: In-place suffix sorting. In: Arge, L., Cachin, C., Jurdziński, T., Tarlecki, A. (eds.) ICALP 2007. LNCS, vol. 4596, pp. 533–545. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-73420-8_47

  12. Grossi, R., Vitter, J.S.: Compressed suffix arrays and suffix trees with applications to text indexing and string matching. SIAM J. Comput. 35(2), 378–407 (2005)

  13. Hon, W.K., Sadakane, K., Sung, W.K.: Breaking a time-and-space barrier in constructing full-text indices. In: Proceedings of the 44th Annual Symposium on Foundations of Computer Science (FOCS), pp. 251–260. IEEE (2003)

  14. Huo, H., Chen, L., Vitter, J.S., Nekrich, Y.: A practical implementation of compressed suffix arrays with applications to self-indexing. In: Data Compression Conference (DCC), pp. 292–301. IEEE (2014)

  15. Huo, H., et al.: CS2A: a compressed suffix array-based method for short read alignment. In: Data Compression Conference (DCC), pp. 271–278. IEEE (2016)

  16. Itoh, H., Tanaka, H.: An efficient method for in memory construction of suffix arrays. In: String Processing and Information Retrieval Symposium, 1999 and International Workshop on Groupware, pp. 81–88. IEEE (1999)

  17. Jacobson, G.: Space-efficient static trees and graphs. In: Proceedings of the 30th Annual Symposium on Foundations of Computer Science (FOCS), pp. 549–554. IEEE (1989)

  18. Kärkkäinen, J., Sanders, P.: Simple linear work suffix array construction. In: Baeten, J.C.M., Lenstra, J.K., Parrow, J., Woeginger, G.J. (eds.) ICALP 2003. LNCS, vol. 2719, pp. 943–955. Springer, Heidelberg (2003). https://doi.org/10.1007/3-540-45061-0_73

  19. Kärkkäinen, J., Sanders, P., Burkhardt, S.: Linear work suffix array construction. J. ACM (JACM) 53(6), 918–936 (2006)

  20. Kim, D.K., Jo, J., Park, H.: A fast algorithm for constructing suffix arrays for fixed-size alphabets. In: Ribeiro, C.C., Martins, S.L. (eds.) WEA 2004. LNCS, vol. 3059, pp. 301–314. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-24838-5_23

  21. Kim, D.K., Sim, J.S., Park, H., Park, K.: Linear-time construction of suffix arrays. In: Baeza-Yates, R., Chávez, E., Crochemore, M. (eds.) CPM 2003. LNCS, vol. 2676, pp. 186–199. Springer, Heidelberg (2003). https://doi.org/10.1007/3-540-44888-8_14

  22. Ko, P., Aluru, S.: Space efficient linear time construction of suffix arrays. In: Baeza-Yates, R., Chávez, E., Crochemore, M. (eds.) CPM 2003. LNCS, vol. 2676, pp. 200–210. Springer, Heidelberg (2003). https://doi.org/10.1007/3-540-44888-8_15

  23. Larsson, N.J., Sadakane, K.: Faster suffix sorting. Theor. Comput. Sci. 387(3), 258–272 (2007)

  24. Li, Z., Li, J., Huo, H.: Optimal in-place suffix sorting. arXiv preprint arXiv:1610.08305 (2016)

  25. Manber, U., Myers, G.: Suffix arrays: a new method for on-line string searches. In: Proceedings of the First Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 319–327. Society for Industrial and Applied Mathematics (1990)

  26. Maniscalco, M.A., Puglisi, S.J.: Faster lightweight suffix array construction. In: Proceedings of International Workshop On Combinatorial Algorithms (IWOCA), pp. 16–29. Citeseer (2006)

  27. Maniscalco, M.A., Puglisi, S.J.: An efficient, versatile approach to suffix sorting. J. Exp. Algorithmics (JEA) 12, 1–2 (2008)

  28. Manzini, G., Ferragina, P.: Engineering a lightweight suffix array construction algorithm. In: Möhring, R., Raman, R. (eds.) ESA 2002. LNCS, vol. 2461, pp. 698–710. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-45749-6_61

  29. McCreight, E.M.: A space-economical suffix tree construction algorithm. J. ACM (JACM) 23(2), 262–272 (1976)

  30. Navarro, G., Providel, E.: Fast, small, simple rank/select on bitmaps. In: Klasing, R. (ed.) SEA 2012. LNCS, vol. 7276, pp. 295–306. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-30850-5_26

  31. Nong, G.: Practical linear-time O(1)-workspace suffix sorting for constant alphabets. ACM Trans. Inf. Syst. (TOIS) 31(3), 15 (2013)

  32. Nong, G., Zhang, S.: Optimal lightweight construction of suffix arrays for constant alphabets. In: Dehne, F., Sack, J.-R., Zeh, N. (eds.) WADS 2007. LNCS, vol. 4619, pp. 613–624. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-73951-7_53

  33. Nong, G., Zhang, S., Chan, W.H.: Linear suffix array construction by almost pure induced-sorting. In: Data Compression Conference (DCC), pp. 193–202. IEEE (2009)

  34. Nong, G., Zhang, S., Chan, W.H.: Two efficient algorithms for linear time suffix array construction. IEEE Trans. Comput. 60(10), 1471–1484 (2011)

  35. Puglisi, S.J., Smyth, W.F., Turpin, A.H.: A taxonomy of suffix array construction algorithms. ACM Comput. Surv. (CSUR) 39(2), 4 (2007)

  36. Sadakane, K.: A fast algorithm for making suffix arrays and for Burrows-Wheeler transformation. In: Data Compression Conference (DCC), pp. 129–138. IEEE (1998)

  37. Salowe, J., Steiger, W.: Simplified stable merging tasks. J. Algorithms 8(4), 557–571 (1987)

  38. Schürmann, K.B., Stoye, J.: An incomplex algorithm for fast suffix array construction. Softw.: Pract. Exp. 37(3), 309–329 (2007)

  39. Ziv, J., Lempel, A.: Compression of individual sequences via variable-rate coding. IEEE Trans. Inf. Theory 24(5), 530–536 (1978)

Acknowledgments

This research is supported in part by the National Basic Research Program of China Grant 2015CB358700, the National Natural Science Foundation of China Grants 61772297, 61632016, and 61761146003, and a grant from Microsoft Research Asia. The authors would like to thank Ge Nong for his help with our experiments, and Gonzalo Navarro for helpful suggestions.

Author information

Corresponding authors

Correspondence to Zhize Li or Jian Li.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Li, Z., Li, J., Huo, H. (2018). Optimal In-Place Suffix Sorting. In: Gagie, T., Moffat, A., Navarro, G., Cuadros-Vargas, E. (eds) String Processing and Information Retrieval. SPIRE 2018. Lecture Notes in Computer Science, vol 11147. Springer, Cham. https://doi.org/10.1007/978-3-030-00479-8_22

  • DOI: https://doi.org/10.1007/978-3-030-00479-8_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-00478-1

  • Online ISBN: 978-3-030-00479-8

  • eBook Packages: Computer Science (R0)
