On the Value of Multiple Read/Write Streams for Data Compression

  • Chapter
Information Theory, Combinatorics, and Search Theory

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7777)

Abstract

We study whether, when restricted to using polylogarithmic memory and polylogarithmic passes, we can achieve qualitatively better data compression with multiple read/write streams than we can with only one. We first show how we can achieve universal compression using only one pass over one stream. We then show that one stream is not sufficient for us to achieve good grammar-based compression. Finally, we show that two streams are necessary and sufficient for us to achieve entropy-only bounds.
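Entropy bounds on compressed size are usually stated in terms of the empirical entropy of the input string. As a minimal sketch, assuming the standard definition of k-th order empirical entropy (the chapter itself may rely on a modified variant, and the function names `h0` and `hk` are illustrative, not taken from the chapter), the quantity can be computed as follows:

```python
from collections import Counter, defaultdict
from math import log2


def h0(symbols):
    """Zeroth-order empirical entropy, in bits per symbol."""
    n = len(symbols)
    if n == 0:
        return 0.0
    counts = Counter(symbols)
    # Sum over distinct symbols of (n_c / n) * log2(n / n_c).
    return sum(c / n * log2(n / c) for c in counts.values())


def hk(s, k):
    """k-th order empirical entropy of s (standard definition, assumed here)."""
    if not s:
        return 0.0
    if k == 0:
        return h0(s)
    # Group each symbol by the k symbols that precede it.
    followers = defaultdict(list)
    for i in range(len(s) - k):
        followers[s[i:i + k]].append(s[i + k])
    # Weighted average of the zeroth-order entropies of each context's followers.
    return sum(len(f) * h0(f) for f in followers.values()) / len(s)
```

For instance, `hk("abababab", 1)` is zero because each symbol is fully determined by its predecessor, while `h0("abab")` is one bit per symbol.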

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Gagie, T. (2013). On the Value of Multiple Read/Write Streams for Data Compression. In: Aydinian, H., Cicalese, F., Deppe, C. (eds) Information Theory, Combinatorics, and Search Theory. Lecture Notes in Computer Science, vol 7777. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36899-8_12

  • DOI: https://doi.org/10.1007/978-3-642-36899-8_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-36898-1

  • Online ISBN: 978-3-642-36899-8

  • eBook Packages: Computer Science, Computer Science (R0)
