Efficient and Compact Representations of Some Non-canonical Prefix-Free Codes

  • Conference paper
String Processing and Information Retrieval (SPIRE 2016)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9954)

Abstract

For many kinds of prefix-free codes there are efficient and compact alternatives to the traditional tree-based representation. Since these put the codes into canonical form, however, they can only be used when we can choose the order in which codewords are assigned to characters. In this paper we first show how, given a probability distribution over an alphabet of \(\sigma\) characters, we can store a nearly optimal alphabetic prefix-free code in \(o(\sigma)\) bits such that we can encode and decode any character in constant time. We then consider a kind of code introduced recently to reduce the space usage of wavelet matrices (Claude, Navarro, and Ordóñez, Information Systems, 2015). They showed how to build an optimal prefix-free code such that the codewords’ lengths are non-decreasing when they are arranged such that their reverses are in lexicographic order. We show how to store such a code in \(\mathcal{O}(\sigma \log L + 2^{\epsilon L})\) bits, where \(L\) is the maximum codeword length and \(\epsilon\) is any positive constant, such that we can encode and decode any character in constant time under reasonable assumptions. Otherwise, we can always encode and decode a codeword of \(\ell\) bits in time \(\mathcal{O}(\ell)\) using \(\mathcal{O}(\sigma \log L)\) bits of space.
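
The ordering property described in the abstract can be made concrete with a small check. The following Python sketch is illustrative only and is not the paper's data structure: it represents codewords as strings of '0' and '1', and the function names and the two example codes are assumptions introduced here. It tests whether a set of codewords is prefix-free and whether the codeword lengths are non-decreasing once the codewords are sorted by their reverses in lexicographic order.

```python
from typing import Iterable, List


def is_prefix_free(codewords: Iterable[str]) -> bool:
    """Return True if no codeword is a prefix of (or equal to) another."""
    ordered = sorted(codewords)
    # After lexicographic sorting, a codeword that is a prefix of any other
    # codeword is also a prefix of its immediate successor.
    return all(not b.startswith(a) for a, b in zip(ordered, ordered[1:]))


def lengths_in_reverse_lex_order(codewords: Iterable[str]) -> List[int]:
    """List codeword lengths after sorting the codewords by their reverses."""
    by_reverse = sorted(codewords, key=lambda w: w[::-1])
    return [len(w) for w in by_reverse]


def has_nondecreasing_lengths(codewords: Iterable[str]) -> bool:
    """Check the property from the abstract: lengths never decrease when
    the codewords are arranged so that their reverses are in lexicographic
    order."""
    lengths = lengths_in_reverse_lex_order(codewords)
    return all(x <= y for x, y in zip(lengths, lengths[1:]))


if __name__ == "__main__":
    # Hypothetical example codes, both prefix-free with lengths 1, 2, 3, 3.
    satisfies = ["0", "10", "110", "111"]
    violates = ["1", "01", "000", "001"]
    for code in (satisfies, violates):
        print(code, is_prefix_free(code), has_nondecreasing_lengths(code))
```

In the first example the reverses sort as 0, 01, 011, 111, giving lengths 1, 2, 3, 3. In the second the reverses sort as 000, 1, 10, 100, giving lengths 3, 1, 2, 3, so that code is prefix-free but does not have the property.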

Funded in part by the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 690941 (project BIRDS). The first author was supported by: MINECO (PGE and FEDER) grants TIN2013-47090-C3-3-P and TIN2015-69951-R; MINECO and CDTI grant ITC-20151305; ICT COST Action IC1302; and Xunta de Galicia (co-funded with FEDER) grant GRC2013/053. The second author was supported by Academy of Finland grants 268324 and 250345 (CoECGR). The fourth author was supported by Millennium Nucleus Information and Coordination in Networks ICM/FIC P10-024F, Chile.

Notes

  1. Since the code tree has height L and \(\sigma\) leaves, it follows that \(L < \sigma\).

  2. This descent is conceptual; we do not have a concrete node v at each level, but we do know \(r_v\).

References

  1. Claude, F., Navarro, G., Ordóñez, A.: The wavelet matrix: an efficient wavelet tree for large alphabets. Inf. Syst. 47, 15–32 (2015)

  2. Evans, W., Kirkpatrick, D.G.: Restructuring ordered binary trees. J. Algorithms 50, 168–193 (2004)

  3. Ferragina, P., Manzini, G., Mäkinen, V., Navarro, G.: Compressed representations of sequences and full-text indexes. ACM Trans. Algorithms 3(2), 20 (2007)

  4. Gagie, T., He, M., Munro, J.I., Nicholson, P.K.: Finding frequent elements in compressed 2D arrays and strings. In: Grossi, R., Sebastiani, F., Silvestri, F. (eds.) SPIRE 2011. LNCS, vol. 7024, pp. 295–300. Springer, Heidelberg (2011)

  5. Gagie, T., Navarro, G., Nekrich, Y., Ordóñez, A.: Efficient and compact representations of prefix codes. IEEE Trans. Inf. Theory 61(9), 4999–5011 (2015)

  6. Grossi, R., Gupta, A., Vitter, J.S.: High-order entropy-compressed text indexes. In: Proceedings SODA, pp. 841–850 (2003)

  7. Itai, A.: Optimal alphabetic trees. SIAM J. Comp. 5, 9–18 (1976)

  8. Kraft, L.G.: A device for quantizing, grouping, and coding amplitude modulated pulses. M.Sc. thesis, EE Dept., MIT (1949)

  9. Munro, J.I., Raman, V.: Succinct representation of balanced parentheses and static trees. SIAM J. Comp. 31(3), 762–776 (2001)

  10. Navarro, G.: Wavelet trees for all. J. Discr. Algorithms 25, 2–20 (2014)

  11. Navarro, G., Providel, E.: Fast, small, simple rank/select on bitmaps. In: Klasing, R. (ed.) SEA 2012. LNCS, vol. 7276, pp. 295–306. Springer, Heidelberg (2012)

  12. Pǎtraşcu, M.: Succincter. In: Proceedings FOCS, pp. 305–313 (2008)

  13. Schwartz, E.S., Kallick, B.: Generating a canonical prefix encoding. Commun. ACM 7, 166–169 (1964)

  14. Wessner, R.L.: Optimal alphabetic search trees with restricted maximal height. Inf. Proc. Lett. 4, 90–94 (1976)

Acknowledgements

This research was carried out in part at University of A Coruña, Spain, while the second author was visiting and the fifth author was a PhD student there. It started at a StringMasters workshop at the Research Center on Information and Communication Technologies (CITIC) of the university. The workshop was partly funded by EU RISE project BIRDS (Bioinformatics and Information Retrieval Data Structures). The authors thank Nieves Brisaboa and Susana Ladra.

Author information

Corresponding author

Correspondence to Travis Gagie.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Fariña, A., Gagie, T., Manzini, G., Navarro, G., Ordóñez, A. (2016). Efficient and Compact Representations of Some Non-canonical Prefix-Free Codes. In: Inenaga, S., Sadakane, K., Sakai, T. (eds) String Processing and Information Retrieval. SPIRE 2016. Lecture Notes in Computer Science, vol 9954. Springer, Cham. https://doi.org/10.1007/978-3-319-46049-9_5

  • DOI: https://doi.org/10.1007/978-3-319-46049-9_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-46048-2

  • Online ISBN: 978-3-319-46049-9

  • eBook Packages: Computer Science, Computer Science (R0)
