Information Extraction from Heterogeneous Web Sites Using Clue Complement Process Based on a User’s Instantiated Example

  • Conference paper
Distributed Computing and Artificial Intelligence

Part of the book series: Advances in Intelligent and Soft Computing (AINSC, volume 79)

Abstract

With the growth of the Internet, the World Wide Web has become a significant infrastructure in fields such as business, commerce, and education. Accordingly, users gather information via the Internet. However, as the number of Web pages increases, it becomes difficult for a user to collect the desired information. Advanced Web search engines may provide a partial solution, but it is still up to the user to summarize or extract meaningful information from the retrieval results. From this viewpoint, this paper addresses a method for generating table-style data from heterogeneous Web pages that reflects a user’s intention. To achieve this, the method uses a user’s instantiated example in a table in addition to the table’s column labels. Based on the user’s instantiated example, meaningful information is extracted using pattern matching and an N-gram method. We applied this method to 57 pages from 27 travel agencies to evaluate its effectiveness. As a result, the precision was 88% and the recall was 68%.
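The abstract names an N-gram method and reports precision and recall but gives no implementation details. The following is a minimal sketch of how such pieces are commonly built, not the authors' method. It assumes character bigrams and the Dice coefficient as the similarity measure, and the function names (`char_ngrams`, `dice_similarity`, `best_match`, `precision_recall`) are hypothetical.

```python
def char_ngrams(text, n=2):
    """Return the set of character n-grams of a lowercased, stripped string."""
    text = text.strip().lower()
    if len(text) < n:
        # Strings shorter than n are treated as a single gram (assumption).
        return {text} if text else set()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def dice_similarity(a, b, n=2):
    """Dice coefficient over character n-gram sets (an assumed measure)."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


def best_match(example, candidates, n=2):
    """Pick the candidate string most similar to the user's example value."""
    return max(candidates, key=lambda c: dice_similarity(example, c, n))


def precision_recall(extracted, relevant):
    """Precision = hits / extracted items; recall = hits / relevant items."""
    extracted, relevant = set(extracted), set(relevant)
    hits = len(extracted & relevant)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

Under these assumptions, a user's example cell such as "Kyoto 3-day tour" would be compared against text fragments from another agency's page, and the fragment with the highest bigram overlap would be taken as the corresponding cell. The extracted cells could then be scored against a hand-labeled set with `precision_recall`.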

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Shimada, J., Oka, H., Akiyoshi, M., Komoda, N. (2010). Information Extraction from Heterogeneous Web Sites Using Clue Complement Process Based on a User’s Instantiated Example. In: de Leon F. de Carvalho, A.P., Rodríguez-González, S., De Paz Santana, J.F., Rodríguez, J.M.C. (eds) Distributed Computing and Artificial Intelligence. Advances in Intelligent and Soft Computing, vol 79. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14883-5_74

  • DOI: https://doi.org/10.1007/978-3-642-14883-5_74

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-14882-8

  • Online ISBN: 978-3-642-14883-5

  • eBook Packages: Engineering, Engineering (R0)
