Domain-Independent Classification for Deep Web Interfaces

Li, Yingjun; Wang, Siwei; Shen, Derong; Nie, Tiezheng; Yu, Ge

doi:10.1007/978-3-642-14246-8_44

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6184)

Included in the following conference series:

International Conference on Web-Age Information Management

Abstract

The data sources of the Deep Web provide a tremendous amount of high-quality structured data. However, because the domains of the Deep Web are too numerous to predefine, its query interfaces need to be classified in a domain-independent way. In this paper, we propose a novel three-stage approach to this problem. First, we extract both the text and the structure of a query interface by applying our proposed FIE algorithm. Then we construct frequent itemsets using a frequent pattern mining algorithm. Finally, we apply the AP clustering algorithm to cluster the frequent itemsets according to the similarity measure FGSTD presented in this paper. Experiments demonstrate that our approach clusters interfaces well without relying on predefined domains.
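
The second stage of the pipeline described in the abstract, frequent itemset construction, can be illustrated with a minimal sketch. It rests on several assumptions that are not from the paper: each query interface is reduced to a hypothetical set of field labels (standing in for the output of FIE, which the abstract does not specify), frequent itemsets are mined with a plain Apriori-style search, and Jaccard overlap is used as a stand-in similarity because the abstract does not define FGSTD. The AP clustering stage is not shown.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Apriori-style search: return {itemset: count} for itemsets
    that occur in at least `min_support` transactions."""
    transactions = [frozenset(t) for t in transactions]

    # Count single items first.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(current)

    k = 2
    while current:
        # Join frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a, b in combinations(list(current), 2)
                      if len(a | b) == k}
        # Prune candidates that have an infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        current = {}
        for c in candidates:
            n = sum(1 for t in transactions if c <= t)
            if n >= min_support:
                current[c] = n
        result.update(current)
        k += 1
    return result


def interface_features(interface, itemsets):
    """Frequent itemsets fully contained in one interface's label set."""
    labels = frozenset(interface)
    return {s for s in itemsets if s <= labels}


def jaccard(a, b):
    """Jaccard overlap; a stand-in similarity, NOT the paper's FGSTD."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical label sets for four query interfaces (illustrative only).
interfaces = [
    {"title", "author", "isbn"},
    {"title", "author", "publisher"},
    {"make", "model", "price"},
    {"make", "model", "year"},
]

itemsets = frequent_itemsets(interfaces, min_support=2)
features = [interface_features(i, itemsets) for i in interfaces]
same_kind = jaccard(features[0], features[1])
different_kind = jaccard(features[0], features[2])
```

In this toy input the two book-like interfaces share all of their frequent itemsets and the book-like and car-like interfaces share none, so a pairwise similarity matrix built this way could be passed to a clustering algorithm without any predefined domain labels.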



Author information

Authors and Affiliations

College of Information Science and Engineering, Northeastern University, 110004, Shenyang, China
Yingjun Li, Siwei Wang, Derong Shen, Tiezheng Nie & Ge Yu

Editor information

Editors and Affiliations

Department of Computer Science, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China
Lei Chen
Computer Department, Sichuan University, 610064, Chengdu, China
Changjie Tang
Department of Computer Science, Duke University, Box 90129, NC 27708-0129, Durham, USA
Jun Yang
College of Computer Science, Zhejiang University, 388 Yuhangtang Road, 310058, Hangzhou, China
Yunjun Gao

About this paper

Cite this paper

Li, Y., Wang, S., Shen, D., Nie, T., Yu, G. (2010). Domain-Independent Classification for Deep Web Interfaces. In: Chen, L., Tang, C., Yang, J., Gao, Y. (eds) Web-Age Information Management. WAIM 2010. Lecture Notes in Computer Science, vol 6184. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14246-8_44

DOI: https://doi.org/10.1007/978-3-642-14246-8_44
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14245-1
Online ISBN: 978-3-642-14246-8
eBook Packages: Computer Science, Computer Science (R0)
