
Web Mining: Extracting Knowledge from the World Wide Web

Chapter in Data Mining for Business Applications

This chapter addresses existing techniques for Web mining, which is helping to turn the World Wide Web into a more useful environment in which users can quickly and easily find the information they need. In particular, it introduces Web data mining methods developed by our laboratory for uncovering patterns in three areas: Web content (semantic processing, classification, and clustering), Web structure (retrieval and classical link analysis methods), and Web events (preprocessing for Web event mining, dynamic tracking of news, and multi-document summarization). The chapter is intended as a resource for students and researchers who are familiar with the basic principles of data mining and want to apply them to problems in Web mining.
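The abstract lists classical link analysis among the Web structure mining methods but does not describe one. As an illustration only (the preview does not show which link analysis variant the chapter itself uses), here is a minimal sketch of PageRank, one well-known classical link analysis method, computed by power iteration. The graph representation (a dict mapping each page to the pages it links to), the function name `pagerank`, and the default damping factor, tolerance, and iteration cap are all assumptions made for this example.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Illustrative power-iteration PageRank (hypothetical helper, not the chapter's code).

    links: dict mapping a page to the list of pages it links to.
    Returns a dict mapping every page to its score; scores sum to 1.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start from a uniform distribution.
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Pages with no out-links ("dangling" pages) spread their score uniformly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {p: base for p in pages}

        # Each page passes an equal share of its damped score to each link target.
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share

        # Stop once the L1 change between iterations is below the tolerance.
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
    for page, score in sorted(pagerank(web).items(), key=lambda kv: -kv[1]):
        print(page, round(score, 4))
```

On the small graph in the usage example, page C, which receives links from all three other pages, gets the highest score, while page D, which has no incoming links, gets only the minimum teleportation share (1 - 0.85) / 4.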




Author information


Correspondence to Zhongzhi Shi.


Copyright information

© 2009 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Shi, Z., Ma, H., He, Q. (2009). Web Mining: Extracting Knowledge from the World Wide Web. In: Cao, L., Yu, P.S., Zhang, C., Zhang, H. (eds) Data Mining for Business Applications. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-79420-4_14


  • DOI: https://doi.org/10.1007/978-0-387-79420-4_14

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-0-387-79419-8

  • Online ISBN: 978-0-387-79420-4

  • eBook Packages: Computer Science, Computer Science (R0)
