Efficient Keyword Search over Data-Centric XML Documents

Li, Guoliang; Feng, Jianhua; Ta, Na; Zhou, Lizhu

doi:10.1007/978-3-540-72524-4_51

Guoliang Li¹,
Jianhua Feng¹,
Na Ta¹ &
…
Lizhu Zhou¹

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 4505))

Included in the following conference series:

1134 Accesses
2 Citations

Abstract

We in this paper investigate keyword search over data-centric XML documents. We first present a novel method to divide an XML document into self-integrated subtrees, which are connected subtrees and can capture different structural information of the XML document. We then propose the meaningful self-integrated trees, which contain all the keywords and describe how the keywords are interrelated, to answer keyword search over XML documents. In addition, we introduce the B ⁺-tree index to accelerate the retrieval of those meaningful self-integrated trees. Moreover, to further enhance the performance of keyword search, we present Bloom Filter to improve the efficiency of generating those meaningful self-integrated trees. Finally, we conducted extensive experiments to evaluate the performance of our method, and the experimental results demonstrate that our method achieves high efficiency and outperforms the existing approaches significantly.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 84.99; Price excludes VAT (USA)

Softcover Book: USD 109.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

http://dblp.uni-trier.de/xml/
http://inex.is.informatik.uni-duisburg.de/2006/index.html
Agrawal, S., Chaudhuri, S., Das, G.: Dbxplorer: A system for keyword-based search over relational databases. In: ICDE, pp. 5–16 (2002)
Google Scholar
Amer-Yahia, S., Botev, C., Dorre, J., Shanmugasundaram, J.: Xquery full-text extensions explained. IBM Systems Journal 45(2), 335–352 (2006)
Article Google Scholar
Amer-Yahia, S., Curtmola, E., Deutsch, A.: Flexible and efficient xml search with complex full-text predicates. In: SIGMOD, pp. 575–586 (2006)
Google Scholar
Amer-Yahia, S., Koudas, N., Marian, A., Srivastava, D., Toman, D.: Structure and content scoring for xml. In: VLDB, pp. 361–372 (2005)
Google Scholar
Bhalotia, G., Hulgeri, A., Nakhe, C., Chakrabarti, S., Sudarshan, S.: Keyword searching and browsing in databases using banks. In: ICDE, pp. 431–440 (2002)
Google Scholar
Bloom, B.: Space/time trade-offs in hash coding with allowable errors. Communications of ACM 13(7), 422–426 (1970)
Article MATH Google Scholar
Botev, C., Amer-Yahia, S., Shanmugasundaram, J.: Expressiveness and Performance of Full-Text Search Languages. In: Ioannidis, Y., Scholl, M.H., Schmidt, J.W., Matthes, F., Hatzopoulos, M., Böhm, K., Kemper, A., Grust, T., Böhm, C. (eds.) EDBT 2006. LNCS, vol. 3896, pp. 349–367. Springer, Heidelberg (2006)
Chapter Google Scholar
Chaudhuri, S., Das, G., Hristidis, V., Weikum, G.: Probabilistic ranking of database query results. In: VLDB, pp. 888–899 (2004)
Google Scholar
Cohen, S., Kanza, Y., Kimelfeld, B., Sagiv, Y.: Interconnection semantics for keyword search in xml. In: CIKM, pp. 389–396 (2005)
Google Scholar
Cohen, S., Mamou, J., Kanza, Y., Sagiv, Y.: Xsearch: A semantic search engine for xml. In: VLDB, pp. 45–56 (2003)
Google Scholar
Guo, L., Shao, F., Botev, C., Shanmugasundaram, J.: Xrank: Ranked keyword search over xml documents. In: SIGMOD, pp. 16–27 (2003)
Google Scholar
Harel, D., Tarjan, R.E.: Fast algorithms for finding nearest common ancestors. SIAM J. Comput. 13(2), 338–355 (1984)
Article MATH MathSciNet Google Scholar
Hristidis, V., Gravano, L., Papakonstantinou, Y.: Efficient ir-style keyword search over relational databases. In: VLDB, pp. 850–861 (2003)
Google Scholar
Hristidis, V., Koudas, N., Papakonstantinou, Y., Srivastava, D.: Keyword proximity search in xml trees. IEEE Trans. Knowl. Data Eng. 18(4) (2006)
Google Scholar
Hristidis, V., Papakonstantinou, Y.: Discover: Keyword search in relational databases. In: VLDB, pp. 670–681 (2002)
Google Scholar
Hristidis, V., Papakonstantinou, Y., Balmin, A.: Keyword proximity search on xml graphs. In: ICDE, pp. 367–378 (2003)
Google Scholar
Kimelfeld, B., Sagiv, Y.: Finding and approximating top-k answers in keyword proximity search. In: PODS, pp. 173–182 (2006)
Google Scholar
Li, Y., Yang, H., Jagadish, H.V.: Nalix: an interactive natural language interface for querying xml. In: SIGMOD, pp. 900–902 (2005)
Google Scholar
Li, Y., Yang, H., Jagadish, H.V.: Constructing a Generic Natural Language Interface for an XML Database. In: Ioannidis, Y., Scholl, M.H., Schmidt, J.W., Matthes, F., Hatzopoulos, M., Böhm, K., Kemper, A., Grust, T., Böhm, C. (eds.) EDBT 2006. LNCS, vol. 3896, pp. 737–754. Springer, Heidelberg (2006)
Chapter Google Scholar
Li, Y., Yu, C., Jagadish, H.V.: Schema-free xquery. In: VLDB, pp. 72–84 (2004)
Google Scholar
Liu, F., Yu, C., Meng, W., Chowdhury, A.: Effective keyword search in relational databases. In: SIGMOD, pp. 563–574 (2006)
Google Scholar
Marais, J., Bharat, K.: Supporting cooperative and personal surfing with a desktop assistant. In: ACM UIST, ACM Press, New York (1997)
Google Scholar
Marian, A., Amer-Yahia, S., Koudas, N., Srivastava, D.: Adaptive processing of top-k queries in xml. In: ICDE, pp. 162–173 (2005)
Google Scholar
Pradhan, S.: An algebraic query model for effective and efficient retrieval of xml fragments. In: VLDB, pp. 295–306 (2006)
Google Scholar
Schieber, B., Vishkin, U.: On finding lowest common ancestors: Simplification and parallelization. SIAM J. Comput. 17(6), 1253–1262 (1988)
Article MATH MathSciNet Google Scholar
Xu, Y., Papakonstantinou, Y.: Efficient keyword search for smallest lcas in xml databases. In: SIGMOD, pp. 527–538 (2005)
Google Scholar

Download references

Author information

Authors and Affiliations

Department of Computer Science and Technology, Tsinghua University, Beijing 100084, P.R. China
Guoliang Li, Jianhua Feng, Na Ta & Lizhu Zhou

Authors

Guoliang Li
View author publications
You can also search for this author in PubMed Google Scholar
Jianhua Feng
View author publications
You can also search for this author in PubMed Google Scholar
Na Ta
View author publications
You can also search for this author in PubMed Google Scholar
Lizhu Zhou
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Guozhu Dong Xuemin Lin Wei Wang Yun Yang Jeffrey Xu Yu

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Li, G., Feng, J., Ta, N., Zhou, L. (2007). Efficient Keyword Search over Data-Centric XML Documents. In: Dong, G., Lin, X., Wang, W., Yang, Y., Yu, J.X. (eds) Advances in Data and Web Management. APWeb WAIM 2007 2007. Lecture Notes in Computer Science, vol 4505. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72524-4_51

Download citation

DOI: https://doi.org/10.1007/978-3-540-72524-4_51
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-72483-4
Online ISBN: 978-3-540-72524-4
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics