Approximate XML Query Processing

Guerrini, Giovanna

doi:10.1007/978-3-642-28323-9_6

Part of the book series: Intelligent Systems Reference Library (ISRL, volume 36)

Abstract

The standard XML query languages, XPath and XQuery, are built on the assumption of a regular structure with well-defined parent/child relationships between nodes and exact conditions on nodes. Full-text extensions to both languages allow Information Retrieval (IR) style queries over text-rich documents. However, in many important applications the purely textual information does not predominate, and documents do exhibit a structure, which is nevertheless not fully regular. For such collections, approaches have been proposed that relax both content and structure conditions in queries and rank results according to a similarity measure, together with processing techniques that evaluate these queries efficiently. This chapter discusses the various dimensions of query relaxation and alternative approaches to approximate query processing.
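As a minimal sketch of the two relaxation dimensions named in the abstract, the Python below evaluates a root-anchored parent/child path query so that each child edge may also be satisfied by an ancestor/descendant pair (a structural relaxation, at a score penalty), and replaces exact text equality with a keyword-overlap score; answers are ranked by the product of the two scores and the top k are returned. The penalty value, the scoring formula, and the names `top_k`, `content_score`, and `RELAXED_EDGE_PENALTY` are hypothetical illustrations, not the method described in the chapter.

```python
import heapq
import xml.etree.ElementTree as ET

# Hypothetical weight: every parent/child edge of the query that is only
# satisfied as an ancestor/descendant edge halves the structural score.
RELAXED_EDGE_PENALTY = 0.5


def _matches(node, steps, relaxed):
    """Yield (element, relaxed_edge_count) for each way of following the tag
    `steps` below `node`. A step is matched exactly by a child or, once the
    edge is relaxed (child axis generalized to descendant axis), by any
    deeper descendant."""
    if not steps:
        yield node, relaxed
        return
    tag, rest = steps[0], steps[1:]
    for child in node:
        if child.tag == tag:  # exact parent/child edge
            yield from _matches(child, rest, relaxed)
        for desc in child.iter(tag):  # child itself plus its descendants
            if desc is not child:  # at depth >= 2 below node: relaxed edge
                yield from _matches(desc, rest, relaxed + 1)


def content_score(element, terms):
    """Fraction of the query terms occurring in the element's text: a simple
    IR-style replacement for an exact equality condition on content."""
    words = set(" ".join(element.itertext()).lower().split())
    return sum(t.lower() in words for t in terms) / len(terms)


def top_k(root, path, terms, k):
    """Rank answers to the path query path[0]/path[1]/... (anchored at root)
    under structure and content relaxation; return the k best (score, text)."""
    if root.tag != path[0]:
        return []
    best = {}  # id(element) -> (best score over all derivations, element)
    for elem, relaxed in _matches(root, path[1:], 0):
        score = RELAXED_EDGE_PENALTY ** relaxed * content_score(elem, terms)
        if score > best.get(id(elem), (0.0, None))[0]:  # drops zero scores
            best[id(elem)] = (score, elem)
    ranked = heapq.nlargest(k, best.values(), key=lambda pair: pair[0])
    return [(score, "".join(elem.itertext()).strip()) for score, elem in ranked]


doc = ET.fromstring(
    "<library>"
    "<book><title>XML Query Processing</title></book>"
    "<book><info><title>Approximate XML Retrieval</title></info></book>"
    "<book><title>Relational Databases</title></book>"
    "</library>"
)
print(top_k(doc, ["library", "book", "title"], ["xml"], 2))
# → [(1.0, 'XML Query Processing'), (0.5, 'Approximate XML Retrieval')]
```

On this toy collection the exact query /library/book/title would miss the second title, which is nested inside an info element; the relaxed evaluation returns it at half the score of the exactly matching title, while the third title is filtered out by the content condition. The sketch enumerates every relaxation exhaustively, which is only reasonable for small inputs.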

Author information

Authors and Affiliations

Università di Genova, Genova, Italy
Giovanna Guerrini

Corresponding author

Correspondence to Giovanna Guerrini.

Editor information

Editors and Affiliations

Dipartimento di Informatica e Scienze dell'Informazione, Università di Genova, Via Dodecaneso 35, 16146 Genova, Italy
Barbara Catania
School of Electrical and Information Engineering, University of South Australia, Mawson Lakes Campus, Adelaide, SA 5095, Australia
Lakhmi C. Jain

About this chapter

Cite this chapter

Guerrini, G. (2013). Approximate XML Query Processing. In: Catania, B., Jain, L. (eds) Advanced Query Processing. Intelligent Systems Reference Library, vol 36. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28323-9_6

DOI: https://doi.org/10.1007/978-3-642-28323-9_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28322-2
Online ISBN: 978-3-642-28323-9
eBook Packages: Engineering (R0)