Searchable Compression of Office Documents by XML Schema Subtraction

  • Conference paper
Database and XML Technologies (XSym 2010)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6309)

Abstract

Starting with Microsoft Office 2007, the Office Open XML file formats have become the default file formats of Microsoft Office. Because large numbers of office documents are stored and transferred every day, reducing document size benefits both storage and transfer. We present a compressed format for XML-based office documents that omits the data in an office document that is already defined by the Office Open XML format. Our evaluation shows that our compressed format reduces office documents, which are already compressed, down to 41% of the original document size. Furthermore, for the search operations tested in our evaluation, searching is faster on our compressed office documents than on the original documents.
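The idea of omitting what the schema already defines can be illustrated with a minimal sketch. This is not the authors' format: the three-element content model, the `subtract` and `restore` functions, and the list-of-integers encoding are all hypothetical and far simpler than Office Open XML. The sketch covers element structure only and assumes text content would be stored separately.

```python
# Toy content model: ordered (child_name, occurrence) pairs, where occurrence
# is "1" (exactly once), "?" (optional) or "*" (zero or more).
# Hypothetical schema for illustration; not taken from Office Open XML.
SCHEMA = [("title", "1"), ("author", "?"), ("para", "*")]


def subtract(children, schema=SCHEMA):
    """Encode child element names as only the choices the schema leaves open."""
    decisions = []
    pos = 0
    for name, occ in schema:
        # Count the consecutive children that match this schema particle.
        count = 0
        while pos < len(children) and children[pos] == name:
            count += 1
            pos += 1
            if occ != "*":
                break  # "1" and "?" consume at most one child
        if occ == "1":
            if count != 1:
                raise ValueError(f"required element {name!r} missing")
            # A required element is implied by the schema: store nothing.
        else:
            # "?" stores 0 or 1 (present or not); "*" stores the repeat count.
            decisions.append(count)
    if pos != len(children):
        raise ValueError("children do not match the schema")
    return decisions


def restore(decisions, schema=SCHEMA):
    """Rebuild the child element names from the decisions plus the schema."""
    remaining = iter(decisions)
    children = []
    for name, occ in schema:
        count = 1 if occ == "1" else next(remaining)
        children.extend([name] * count)
    return children
```

Under this toy schema, a `title` followed by two `para` elements is stored as the two numbers `[0, 2]` (no `author`, two paragraphs). The element names themselves never appear in the encoding, because the schema supplies them on restoration.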

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Böttcher, S., Hartel, R., Messinger, C. (2010). Searchable Compression of Office Documents by XML Schema Subtraction. In: Lee, M.L., Yu, J.X., Bellahsène, Z., Unland, R. (eds) Database and XML Technologies. XSym 2010. Lecture Notes in Computer Science, vol 6309. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15684-7_9

  • DOI: https://doi.org/10.1007/978-3-642-15684-7_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-15683-0

  • Online ISBN: 978-3-642-15684-7

  • eBook Packages: Computer Science, Computer Science (R0)
