Block Based Web Page Feature Selection with Neural Network

  • Conference paper
Advances in Computer Science, Environment, Ecoinformatics, and Education (CSEE 2011)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 215)

Abstract

Accurately extracting the features of a web page is one of the basic problems in Web data mining. Taking the structure of the web page into account, this article introduces a block-based feature selection method. A neural network is used to recognize the priorities of the different web page blocks, and a VPDom tree is then built. Experiments show that block-based feature selection can filter out noise and emphasize the main content of the web page.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jin, Y., Liu, R., He, X., Huang, Y. (2011). Block Based Web Page Feature Selection with Neural Network. In: Lin, S., Huang, X. (eds) Advances in Computer Science, Environment, Ecoinformatics, and Education. CSEE 2011. Communications in Computer and Information Science, vol 215. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23324-1_36

  • DOI: https://doi.org/10.1007/978-3-642-23324-1_36

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23323-4

  • Online ISBN: 978-3-642-23324-1

  • eBook Packages: Computer Science, Computer Science (R0)