Domain-Independent Classification for Deep Web Interfaces

  • Conference paper
Web-Age Information Management (WAIM 2010)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6184)

Abstract

The data sources of the Deep Web provide a tremendous amount of high-quality structured data. However, because the domains of the very large Deep Web are hard to predefine, its query interfaces need to be classified in a domain-independent way. In this paper, we propose a novel three-stage approach to this problem. First, we extract both the text and the structure of a query interface by applying the FIE algorithm that we propose. Then we construct frequent itemsets using a frequent pattern mining algorithm. Finally, we apply the AP clustering algorithm to cluster the frequent itemsets according to the similarity measure FGSTD presented in this paper. Experiments demonstrate that our approach clusters interfaces well in a domain-independent manner.
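As a rough illustration of the pipeline's shape, the sketch below walks through the three stages on toy data. It is not the authors' method: the abstract does not define FIE or FGSTD, so `extract_terms` (plain word splitting) and `jaccard` (Jaccard similarity) are stand-ins chosen for illustration, the frequent itemsets are found by brute-force enumeration rather than any particular mining algorithm, and the interface names and labels are hypothetical.

```python
from itertools import combinations

# Hypothetical toy data: field labels found on four query interfaces.
interfaces = {
    "books_a": ["title", "author", "isbn"],
    "books_b": ["title", "author", "publisher"],
    "flights_a": ["from", "to", "depart date"],
    "flights_b": ["from", "to", "return date"],
}


def extract_terms(labels):
    """Stage 1 stand-in (NOT the paper's FIE): the set of lowercase words."""
    return frozenset(w for label in labels for w in label.lower().split())


def frequent_itemsets(transactions, min_support=2, max_size=2):
    """Stage 2 stand-in: enumerate itemsets up to max_size and keep those
    contained in at least min_support transactions."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, max_size + 1):
        for candidate in combinations(items, size):
            cset = frozenset(candidate)
            support = sum(1 for t in transactions if cset <= t)
            if support >= min_support:
                result[cset] = support
    return result


def jaccard(a, b):
    """Stage 3 stand-in (NOT the paper's FGSTD): Jaccard similarity of two
    non-empty sets."""
    return len(a & b) / len(a | b)


transactions = [extract_terms(labels) for labels in interfaces.values()]
itemsets = frequent_itemsets(transactions)

# Pairwise similarity matrix over the frequent itemsets, in a fixed order.
keys = sorted(itemsets, key=lambda s: (len(s), sorted(s)))
similarity = [[jaccard(a, b) for b in keys] for a in keys]

for itemset in keys:
    print(sorted(itemset), "support =", itemsets[itemset])
```

A matrix like `similarity` is the kind of input a clustering step would consume. If AP here means affinity propagation, as seems likely, one option would be scikit-learn's `AffinityPropagation` with `affinity="precomputed"`, though the paper's actual clustering setup may differ.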

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Li, Y., Wang, S., Shen, D., Nie, T., Yu, G. (2010). Domain-Independent Classification for Deep Web Interfaces. In: Chen, L., Tang, C., Yang, J., Gao, Y. (eds) Web-Age Information Management. WAIM 2010. Lecture Notes in Computer Science, vol 6184. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14246-8_44

  • DOI: https://doi.org/10.1007/978-3-642-14246-8_44

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-14245-1

  • Online ISBN: 978-3-642-14246-8

  • eBook Packages: Computer Science, Computer Science (R0)
