Abstract
Compared with plain text, hypertext carries additional information, such as hyperlinks and metadata, that is useful for classifying hypertext documents. After analyzing different rules for using hypertext information, we present a new hypertext classification algorithm based on co-weighting multiple kinds of information. The different kinds of hypertext information are extracted and then combined by co-weighting them. Experimental results on two different data sets show that the new algorithm performs better than using any single kind of hypertext information individually.
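The abstract does not give the weighting formula, so the following is only a minimal sketch of what combining several hypertext information sources by weighting could look like, not the authors' algorithm. It assumes each document and each class profile is a mapping from a source name (here the hypothetical names `body` and `anchor`) to text, scores each source by cosine similarity of raw term counts, and sums the per-source scores using hand-chosen weights. All function names, source names, and weight values are illustrative assumptions.

```python
import math
from collections import Counter


def term_frequencies(text):
    """Bag-of-words term counts for a whitespace-tokenized, lowercased string."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def co_weighted_scores(doc, class_profiles, weights):
    """Score each class as a weighted sum of per-source similarities.

    doc:            {source_name: text}            (assumed layout)
    class_profiles: {label: {source_name: text}}   (assumed layout)
    weights:        {source_name: float}           (assumed, hand-chosen)
    """
    scores = {}
    for label, profile in class_profiles.items():
        total = 0.0
        for source, weight in weights.items():
            doc_vec = term_frequencies(doc.get(source, ""))
            cls_vec = term_frequencies(profile.get(source, ""))
            total += weight * cosine(doc_vec, cls_vec)
        scores[label] = total
    return scores


def classify(doc, class_profiles, weights):
    """Return the label with the highest co-weighted score."""
    scores = co_weighted_scores(doc, class_profiles, weights)
    return max(scores, key=scores.get)
```

In this sketch, setting one weight to zero reduces the classifier to a single information source, which is the kind of baseline the abstract compares against. How the weights are actually chosen in the paper is not stated in the abstract.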
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Peng, Y., Lin, Yp., Chen, Zp. (2004). Hypertext Classification Algorithm Based on Co-weighting Multi-information. In: Li, Q., Wang, G., Feng, L. (eds) Advances in Web-Age Information Management. WAIM 2004. Lecture Notes in Computer Science, vol 3129. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27772-9_72
DOI: https://doi.org/10.1007/978-3-540-27772-9_72
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22418-1
Online ISBN: 978-3-540-27772-9
eBook Packages: Springer Book Archive