Building a Scalable Web Query System

  • Conference paper
Databases in Networked Information Systems (DNIS 2007)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4777)

Abstract

Nowadays, the dominant way to find information on the web is through search. General search engines are very effective, but search phrases and results are unstructured, which limits a user's ability to further automate the processing of the search results. In recent years, we have seen efforts to build systems that support more precise queries on the web for certain content verticals. We describe the general problems of building an extensible web query system and report some of our work in this area.
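To make the abstract's contrast between unstructured search results and precise vertical queries concrete, here is a minimal Python sketch, assuming a hypothetical electronic-component vertical. Keyword search returns pages whose free text must still be parsed before a program can use it, while a structured query over already-extracted records can filter and sort on typed fields directly. The URLs, part names, `Component` fields, and the `keyword_search` and `query` helpers are all hypothetical illustrations, not the system described in the paper.

```python
from dataclasses import dataclass

# Hypothetical search-engine view: (url, free-text snippet) pairs.
pages = [
    ("http://example.com/reg-a", "REG-A voltage regulator, up to 1.5 A, input up to 37 V"),
    ("http://example.com/reg-b", "REG-B low-dropout voltage regulator rated 0.5 A, 16 V max"),
    ("http://example.com/reg-c", "REG-C voltage regulator, 3 A output, 28 V maximum input"),
]


@dataclass(frozen=True)
class Component:
    """Hypothetical structured record an extraction stage might produce."""
    part: str
    category: str
    max_current_a: float
    max_voltage_v: float


# Hypothetical records for the same three (fictional) parts.
catalog = [
    Component("REG-A", "voltage regulator", 1.5, 37.0),
    Component("REG-B", "voltage regulator", 0.5, 16.0),
    Component("REG-C", "voltage regulator", 3.0, 28.0),
]


def keyword_search(pages, terms):
    """Return URLs whose text contains every term (case-insensitive).

    The caller only gets URLs back; numeric constraints such as
    "at least 1 A" cannot be applied without parsing the text.
    """
    return [
        url for url, text in pages
        if all(term.lower() in text.lower() for term in terms)
    ]


def query(records, category, min_current_a):
    """Structured query: filter on typed fields, order by max voltage (descending)."""
    hits = [
        r for r in records
        if r.category == category and r.max_current_a >= min_current_a
    ]
    return sorted(hits, key=lambda r: r.max_voltage_v, reverse=True)


if __name__ == "__main__":
    print(keyword_search(pages, ["voltage regulator"]))  # all three URLs
    print([r.part for r in query(catalog, "voltage regulator", 1.0)])
```

The keyword call matches all three pages because each snippet mentions "voltage regulator", whereas the structured call returns only `REG-A` and `REG-C`, already ranked, in a form another program can consume without further text processing.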

References

  1. Castellanos, M., Chen, Q., Dayal, U., Hsu, M., Lemon, M., Siegel, P., Stinger, J.: Component Advisor: A tool for automatically extracting electronic component data from Web datasheets. In: Proceedings of the Workshop on Reuse of Web-based Information, 7th International World Wide Web Conference (WWW7), Brisbane, Australia (1998)

  2. Nie, Z., Wen, J., Ma, W.: Object-level Vertical Search. In: Proceedings of Conf. on Innovative Data Systems Research, Pacific Grove, California (2007)

  3. Weikum, G.: DB&IR: both sides now. In: Proceedings of the 2007 ACM SIGMOD international conference on Management of data, Beijing, China, pp. 25–30 (2007)

  4. Chakrabarti, S., van den Berg, M., Dom, B.: Focused Crawling: A New Approach to Topic-Specific Web Resource Discovery. Computer Networks 31(11-16), 1623–1640 (1999)

  5. Diligenti, M., Coetzee, F., Lawrence, S., Giles, C.L., Gori, M.: Focused Crawling Using Context Graphs. In: Proceedings of 26th Int. Conf. on Very Large Databases (VLDB), Cairo, Egypt, pp. 527–534 (2000)

  6. Kan, M., Thi, H.: Fast Webpage Classification Using URL Features. In: Proceedings of the 14th Int. Conf., Bremen, Germany (2005)

  7. Dumais, S., Chen, H.: Hierarchical Classification of Web Content. In: Proceedings of the 23rd ACM Int. Conf. on Research and Development in Information Retrieval (SIGIR-2000), Athens, Greece (2000)

  8. Calado, P., Cristo, M., Moura, E., Ziviani, N., Ribeiro-Neto, B., Gonçalves, M.: Combining Link-Based and Content-Based Methods for Web Document Classification. In: CIKM 2003. Proceedings of the 12th Int. Conf. on Information and Knowledge Management, New Orleans, Louisiana (2003)

  9. McCallum, A.: Information Extraction: Distilling Structured Data from Unstructured Text. In: ACM QUEUE, pp. 49–57 (November 2005)

  10. Arasu, A., Garcia-Molina, H.: Extracting Structured Data from Web Pages. In: Proceedings of the 2003 ACM SIGMOD Int. Conf., San Diego, California (2003)

  11. Yin, P., Zhang, M., Deng, Z., Yang, D.: Metadata Extraction from Bibliographies Using Bigram HMM. In: Chen, Z., Chen, H., Miao, Q., Fu, Y., Fox, E., Lim, E.-p. (eds.) ICADL 2004. LNCS, vol. 3334, pp. 310–319. Springer, Heidelberg (2004)

  12. Lafferty, J., McCallum, A., Pereira, F.: Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. In: Proceedings of the 18th Int. Conf. on Machine Learning, pp. 282–289. Morgan Kaufmann, San Francisco, CA (2001)

Editor information

Subhash Bhalla

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hsu, M., Xiong, Y. (2007). Building a Scalable Web Query System. In: Bhalla, S. (eds) Databases in Networked Information Systems. DNIS 2007. Lecture Notes in Computer Science, vol 4777. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75512-8_23

  • DOI: https://doi.org/10.1007/978-3-540-75512-8_23

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-75511-1

  • Online ISBN: 978-3-540-75512-8

  • eBook Packages: Computer Science, Computer Science (R0)