Information Extraction from Heterogeneous Web Sites Using Clue Complement Process Based on a User’s Instantiated Example

Shimada, Junya; Oka, Hironori; Akiyoshi, Masanori; Komoda, Norihisa

doi:10.1007/978-3-642-14883-5_74

Part of the book series: Advances in Intelligent and Soft Computing (AINSC, volume 79)

Abstract

With the growth of the Internet, the World Wide Web has become a significant infrastructure in various fields such as business, commerce, and education. Accordingly, users gather information via the Internet. However, due to the increasing number of Web pages, it has become difficult for users to collect the information they want. Although advanced Web search engines may provide a solution to some extent, it is still up to the user to summarize or extract meaningful information from the retrieval results. From this viewpoint, this paper addresses a method for generating table-style data from heterogeneous Web pages that reflects a user’s intention. To achieve this, the method uses a user’s instantiated example in the table in addition to the table’s column labels. Based on the user’s instantiated example, meaningful information is extracted using pattern matching and an N-gram method. To evaluate whether the proposed method is effective, we applied it to 57 pages from 27 travel agencies. As a result, the precision rate was 88% and the recall rate was 68%.
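The abstract names the two extraction ingredients, pattern matching against the user’s instantiated example and an N-gram method, but not how they are combined. The Python sketch below is one illustrative reading under our own assumptions, not the authors’ algorithm. The hypothetical `generalize_example` builds a regular expression by turning each digit run in the example into `\d+`. `extract_candidates` collects page substrings that match that pattern and ranks them by a character-bigram Dice coefficient against the example (a swapped-in similarity measure). `precision_recall` applies the standard definitions behind precision and recall rates such as the reported 88% and 68%. The page text and fares are invented.

```python
from __future__ import annotations

import re
from collections import Counter


def generalize_example(example: str) -> re.Pattern:
    """Build a regex from a user's example by generalizing each digit run.

    Hypothetical rule for illustration: digit runs become one-or-more digits,
    every other character must match literally.
    """
    parts = re.split(r"(\d+)", example)  # capturing group keeps digit runs at odd indices
    return re.compile("".join(r"\d+" if i % 2 else re.escape(p) for i, p in enumerate(parts)))


def char_ngrams(text: str, n: int = 2) -> Counter:
    """Multiset of overlapping character n-grams (whole string if shorter than n)."""
    if len(text) < n:
        return Counter([text]) if text else Counter()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def ngram_similarity(a: str, b: str, n: int = 2) -> float:
    """Dice coefficient 2|A & B| / (|A| + |B|) over character n-gram multisets."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    total = sum(ga.values()) + sum(gb.values())
    if total == 0:
        return 0.0
    return 2 * sum((ga & gb).values()) / total


def extract_candidates(page_text: str, example: str, n: int = 2) -> list[tuple[str, float]]:
    """Collect distinct substrings shaped like the example, best n-gram match first."""
    pattern = generalize_example(example)
    found = {m.group(0) for m in pattern.finditer(page_text)}
    scored = [(c, ngram_similarity(c, example, n)) for c in found]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))  # ties broken alphabetically


def precision_recall(extracted: set[str], gold: set[str]) -> tuple[float, float]:
    """Precision = correct / extracted; recall = correct / all correct values."""
    correct = len(extracted & gold)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy page text and example value, invented for illustration.
    page = ("Hawaii 5 days: 59,800 yen. Guam 4 days: 39,800 yen. "
            "Europe 8 days: 120,000 yen. Call 06-1234-5678.")
    candidates = extract_candidates(page, "49,800 yen")
    for value, score in candidates:
        print(f"{value}\t{score:.3f}")
    print(precision_recall({v for v, _ in candidates}, {"39,800 yen", "59,800 yen"}))
```

On the toy page, the phone number does not fit the generalized pattern. The two five-digit fares each score 16/18 (about 0.889) and the 120,000 yen fare scores 10/19 (about 0.526). Against a gold set containing only the two five-digit fares, precision is 2/3 and recall is 1.0. Generalizing only digits keeps the sketch short. Values such as dates or place names would need richer patterns, and the column labels could serve as an additional clue.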

Author information

Authors and Affiliations

Osaka University, 2-1 Yamadaoka, Suita, Osaka, Japan
Junya Shimada, Masanori Akiyoshi & Norihisa Komoda
Codetoys, 2-6-8 Nishitenma, Kita, Osaka, Japan
Hironori Oka

Editor information

Editors and Affiliations

Department of Computer Science, University of Sao Paulo at Sao Carlos, Sao Carlos, SP, Brazil
Andre Ponce de Leon F. de Carvalho
Department of Computing, Science and Control, Faculty of Science, University of Salamanca, Plaza de la Merced S/N, 37008, Salamanca, Spain
Sara Rodríguez-González, Juan F. De Paz Santana & Juan M. Corchado Rodríguez

About this paper

Cite this paper

Shimada, J., Oka, H., Akiyoshi, M., Komoda, N. (2010). Information Extraction from Heterogeneous Web Sites Using Clue Complement Process Based on a User’s Instantiated Example. In: de Leon F. de Carvalho, A.P., Rodríguez-González, S., De Paz Santana, J.F., Rodríguez, J.M.C. (eds) Distributed Computing and Artificial Intelligence. Advances in Intelligent and Soft Computing, vol 79. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14883-5_74

DOI: https://doi.org/10.1007/978-3-642-14883-5_74
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14882-8
Online ISBN: 978-3-642-14883-5
eBook Packages: Engineering, Engineering (R0)
