Abstract
Deep Web data sources provide large amounts of high-quality structured data. However, because the domains of the large-scale Deep Web are hard to predefine, its query interfaces need to be classified in a domain-independent way. In this paper, we propose a novel three-stage approach to this problem. First, we extract both the text and the structure of a query interface by applying the FIE algorithm we propose. Then, we construct frequent itemsets using a frequent pattern mining algorithm. Finally, we apply the AP clustering algorithm to cluster the frequent itemsets according to the similarity measure FGSTD presented in this paper. Experiments demonstrate that our approach clusters interfaces well in a domain-independent manner.
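To make the middle stage concrete, here is a minimal sketch of frequent itemset construction over query interfaces, followed by a pairwise similarity computation. It is an illustration under stated assumptions, not the authors' method: the abstract does not define FIE or FGSTD, so the interfaces are assumed to be already reduced to sets of attribute labels, the mining is a brute-force enumeration rather than any specific frequent pattern algorithm, and plain Jaccard similarity stands in for FGSTD. The sample data and the names `interfaces`, `frequent_itemsets`, and `jaccard` are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: each query interface reduced to a set of attribute labels.
interfaces = {
    "books_a": {"title", "author", "isbn"},
    "books_b": {"title", "author", "publisher"},
    "cars_a": {"make", "model", "price"},
    "cars_b": {"make", "model", "year"},
}


def frequent_itemsets(transactions, min_support=2, max_size=2):
    """Brute-force mining: count every itemset of up to max_size labels and
    keep those occurring in at least min_support transactions."""
    counts = Counter()
    for items in transactions:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(items), size):
                counts[frozenset(combo)] += 1
    return {itemset for itemset, n in counts.items() if n >= min_support}


def jaccard(a, b):
    """Jaccard similarity of two sets (a stand-in for FGSTD, which is not
    specified in the abstract)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


frequent = frequent_itemsets(interfaces.values())

# Describe each interface by the frequent itemsets it contains.
features = {
    name: {itemset for itemset in frequent if itemset <= labels}
    for name, labels in interfaces.items()
}

# Pairwise similarities that a clustering step could take as input.
similarity = {
    (a, b): jaccard(features[a], features[b])
    for a, b in combinations(sorted(features), 2)
}

for pair, score in sorted(similarity.items()):
    print(pair, round(score, 2))
```

In this toy data the two book interfaces share all of their frequent itemsets, as do the two car interfaces, while book and car interfaces share none. A clustering algorithm that accepts a precomputed similarity matrix could then group them without any predefined domain labels.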
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Li, Y., Wang, S., Shen, D., Nie, T., Yu, G. (2010). Domain-Independent Classification for Deep Web Interfaces. In: Chen, L., Tang, C., Yang, J., Gao, Y. (eds) Web-Age Information Management. WAIM 2010. Lecture Notes in Computer Science, vol 6184. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14246-8_44
DOI: https://doi.org/10.1007/978-3-642-14246-8_44
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14245-1
Online ISBN: 978-3-642-14246-8
eBook Packages: Computer Science, Computer Science (R0)