Efficient Keyword Search over Data-Centric XML Documents

  • Conference paper
Advances in Data and Web Management (APWeb 2007, WAIM 2007)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4505)

Abstract

In this paper, we investigate keyword search over data-centric XML documents. We first present a novel method for dividing an XML document into self-integrated subtrees, which are connected subtrees that capture different structural information of the XML document. We then propose meaningful self-integrated trees, which contain all the query keywords and describe how the keywords are interrelated, as answers to keyword searches over XML documents. In addition, we introduce a B+-tree index to accelerate the retrieval of these meaningful self-integrated trees. To further enhance the performance of keyword search, we employ a Bloom filter to improve the efficiency of generating these trees. Finally, we conduct extensive experiments to evaluate the performance of our method; the experimental results demonstrate that it achieves high efficiency and significantly outperforms existing approaches.
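
The abstract does not say how the Bloom filter is applied, so the following is only a minimal sketch under an assumption: each subtree keeps a Bloom filter over the keywords it contains, and a query discards subtrees whose filter rules out at least one keyword before any expensive tree generation. The names `BloomFilter`, `candidate_subtrees`, and the parameters `num_bits` and `num_hashes` are hypothetical and not taken from the paper.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter; may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def candidate_subtrees(subtree_keywords, query):
    """Return ids of subtrees that may contain every query keyword.

    subtree_keywords maps a (hypothetical) subtree id to the keywords
    occurring in that subtree. Survivors still need an exact check,
    because a Bloom filter can yield false positives.
    """
    filters = {}
    for subtree_id, words in subtree_keywords.items():
        bloom = BloomFilter()
        for word in words:
            bloom.add(word)
        filters[subtree_id] = bloom
    return [
        subtree_id
        for subtree_id, bloom in filters.items()
        if all(bloom.might_contain(keyword) for keyword in query)
    ]


if __name__ == "__main__":
    subtrees = {
        "t1": ["xml", "keyword", "search"],
        "t2": ["bloom", "filter"],
    }
    print(candidate_subtrees(subtrees, ["xml", "search"]))
```

The design choice illustrated here is the usual Bloom filter trade-off: a small, fixed amount of memory per subtree buys a cheap negative test, at the cost of occasional false positives that an exact check must then eliminate.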

Editor information

Guozhu Dong, Xuemin Lin, Wei Wang, Yun Yang, Jeffrey Xu Yu

Copyright information

© 2007 Springer Berlin Heidelberg

About this paper

Cite this paper

Li, G., Feng, J., Ta, N., Zhou, L. (2007). Efficient Keyword Search over Data-Centric XML Documents. In: Dong, G., Lin, X., Wang, W., Yang, Y., Yu, J.X. (eds) Advances in Data and Web Management. APWeb WAIM 2007 2007. Lecture Notes in Computer Science, vol 4505. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72524-4_51

  • DOI: https://doi.org/10.1007/978-3-540-72524-4_51

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-72483-4

  • Online ISBN: 978-3-540-72524-4

  • eBook Packages: Computer Science, Computer Science (R0)
