Abstract
We present a simple query language for XML that supports hierarchical, Boolean-connected query patterns. The interpretation of a query is founded on cost-based query transformations: the total cost of a sequence of transformations measures the similarity between the query and the data and is used to rank the results. We introduce two polynomial-time algorithms that efficiently find the best n answers to the query. The first algorithm finds all approximate results, sorts them by increasing cost, and prunes the result list after the nth entry. The second algorithm uses a structural summary of the database (the schema) to estimate the best k transformed queries, which in turn are executed against the database. We compare both approaches and show that the schema-based evaluation outperforms the pruning approach for small values of n. The pruning strategy is the better choice if n is close to the total number of approximate results for the query.
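The contrast between the two evaluation strategies can be illustrated with a deliberately simplified sketch. It is not the paper's algorithm: queries and data are modeled as flat label paths rather than trees, the only transformation is label renaming, and the cost table, the sample database, and all function names are hypothetical and chosen only for illustration.

```python
# Toy model (assumption): a query and every data node are label paths,
# and the only transformation is renaming a label at a fixed cost.
RENAME_COST = {("cd", "mc"): 2, ("title", "name"): 1}  # hypothetical costs

# Hypothetical database: (node_id, label path) pairs.
DATABASE = [
    (1, ("cd", "title")),
    (2, ("cd", "name")),
    (3, ("mc", "title")),
    (4, ("cd", "title")),
    (5, ("mc", "name")),
    (6, ("cd", "composer")),
]
QUERY = ("cd", "title")


def transform_cost(query, path):
    """Total cost of renaming query labels into path, or None if impossible."""
    if len(query) != len(path):
        return None
    total = 0
    for q_label, p_label in zip(query, path):
        if q_label == p_label:
            continue
        step = RENAME_COST.get((q_label, p_label))
        if step is None:
            return None
        total += step
    return total


def best_n_by_pruning(query, database, n):
    """Find all approximate results, sort by cost, cut after the nth entry."""
    results = []
    for node_id, path in database:
        cost = transform_cost(query, path)
        if cost is not None:
            results.append((cost, node_id))
    results.sort()
    return results[:n]


def best_n_by_schema(query, database, n):
    """Rank transformed queries on the schema, then execute the cheapest first."""
    # The "schema" here is just the set of distinct paths, with an index
    # from each path to the nodes that carry it.
    index = {}
    for node_id, path in database:
        index.setdefault(path, []).append(node_id)

    # Cost each distinct path once instead of once per data node.
    candidates = []
    for path in index:
        cost = transform_cost(query, path)
        if cost is not None:
            candidates.append((cost, path))
    candidates.sort()

    # Execute transformed queries in order of cost until n results are found.
    results = []
    for cost, path in candidates:
        if len(results) >= n:
            break
        results.extend((cost, node_id) for node_id in sorted(index[path]))
    return results[:n]


print(best_n_by_pruning(QUERY, DATABASE, 3))
print(best_n_by_schema(QUERY, DATABASE, 3))
```

In this toy setting both functions return the same ranked answers. The schema-based variant stops as soon as n results are collected, which is where a saving for small n would come from, while the pruning variant always touches every data node.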
This research was supported by the German Research Society, Berlin-Brandenburg Graduate School in Distributed Information Systems (DFG grant no. GRK 316).
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Schlieder, T. (2002). Schema-Driven Evaluation of Approximate Tree-Pattern Queries. In: Jensen, C.S., et al. Advances in Database Technology — EDBT 2002. EDBT 2002. Lecture Notes in Computer Science, vol 2287. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45876-X_33
DOI: https://doi.org/10.1007/3-540-45876-X_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43324-8
Online ISBN: 978-3-540-45876-0
eBook Packages: Springer Book Archive