A Quantitative Summary of XML Structures

  • Conference paper
Conceptual Modeling - ER 2006 (ER 2006)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4215)

Abstract

Statistical summaries in relational databases mainly focus on the distribution of data values and have been found useful for various applications, such as query evaluation and data storage. As XML has become widely used, e.g. for online data exchange, the need for corresponding statistical summaries of XML has become evident. While relational techniques may be applicable to the data values in XML documents, novel techniques are required for summarizing the structures of XML documents. In this paper, we propose metrics for major structural properties of XML documents, in particular nestings of entities and one-to-many relationships. Our technique differs from existing ones in that we generate a quantitative summary of an XML structure. Using our approach, we illustrate that some popular real-world and synthetic XML benchmark datasets are indeed highly skewed, hardly hierarchical, and contain few recursions. We hope this preliminary finding sheds light on improving the design of XML benchmarking and experimentation.

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lin, Z., He, B., Choi, B. (2006). A Quantitative Summary of XML Structures. In: Embley, D.W., Olivé, A., Ram, S. (eds) Conceptual Modeling - ER 2006. ER 2006. Lecture Notes in Computer Science, vol 4215. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11901181_18

  • DOI: https://doi.org/10.1007/11901181_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-47224-7

  • Online ISBN: 978-3-540-47227-8

  • eBook Packages: Computer Science, Computer Science (R0)
