Abstract
Nowadays, the dominant way to find information on the web is through search. General search engines are very effective, but search phrases and results are unstructured, which limits a user’s ability to automate further processing of the search results. In recent years, we have seen efforts to build systems that support more precise querying of the web for certain content verticals. We describe the general problems in building an extensible web query system and report some of our work in this area.
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hsu, M., Xiong, Y. (2007). Building a Scalable Web Query System. In: Bhalla, S. (eds) Databases in Networked Information Systems. DNIS 2007. Lecture Notes in Computer Science, vol 4777. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75512-8_23
DOI: https://doi.org/10.1007/978-3-540-75512-8_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-75511-1
Online ISBN: 978-3-540-75512-8
eBook Packages: Computer Science (R0)