This chapter surveys existing techniques for Web mining, which is moving the World Wide Web toward a more useful environment in which users can quickly and easily find the information they need. In particular, it introduces methods of Web data mining developed by our laboratory for uncovering patterns in Web content (semantic processing, classification, clustering), Web structure (retrieval, classical link analysis methods), and Web events (preprocessing for Web event mining, dynamic tracing of news events, multi-document summarization). The chapter is intended as a resource for students and researchers who are familiar with the basic principles of data mining and want to learn more about applying data mining to their own Web mining problems.
Copyright information
© 2009 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Shi, Z., Ma, H., He, Q. (2009). Web Mining: Extracting Knowledge from the World Wide Web. In: Cao, L., Yu, P.S., Zhang, C., Zhang, H. (eds) Data Mining for Business Applications. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-79420-4_14
DOI: https://doi.org/10.1007/978-0-387-79420-4_14
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-79419-8
Online ISBN: 978-0-387-79420-4
eBook Packages: Computer Science, Computer Science (R0)