
A Semantic Crawler Based on an Extended CBR Algorithm

  • Conference paper
On the Move to Meaningful Internet Systems: OTM 2008 Workshops (OTM 2008)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5333)

Abstract

A semantic (web) crawler is one of a class of web crawlers designed for harvesting semantic web content. This paper presents the framework of a semantic crawler that abstracts metadata from online webpages and clusters the metadata by associating them with ontological concepts. The clustering is based on a case-based reasoning (CBR) algorithm, an approach used in the field of problem solving. We describe the ontological concept and metadata formats and the extended CBR algorithm. We then provide the system implementation and evaluation details, followed by our conclusion and future work.
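The abstract does not give the details of the extended CBR algorithm, so the following is only a minimal sketch of the general idea of associating metadata with ontological concepts by similarity. Everything in it is an assumption made for illustration: the function names (`tokenize`, `jaccard`, `associate`), the use of Jaccard token overlap as the similarity measure, the `threshold` value, and the sample concepts and metadata. It should not be read as the authors' method.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 if either set is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def associate(metadata, concepts, threshold=0.2):
    """Map each metadata id to the concepts whose description is similar enough.

    metadata: dict of metadata id -> description text (hypothetical format)
    concepts: dict of concept name -> description text (hypothetical format)
    threshold: assumed cut-off, not a value taken from the paper
    """
    concept_tokens = {name: tokenize(desc) for name, desc in concepts.items()}
    result = {}
    for meta_id, description in metadata.items():
        tokens = tokenize(description)
        matches = [
            name
            for name, ctoks in concept_tokens.items()
            if jaccard(tokens, ctoks) >= threshold
        ]
        result[meta_id] = sorted(matches)
    return result


# Hypothetical concepts and metadata, invented for this sketch.
concepts = {
    "AirTransport": "air freight cargo flight airline transport",
    "RoadTransport": "road truck freight haulage transport",
}
metadata = {
    "page1": "International airline cargo and air freight services",
    "page2": "Garden furniture and outdoor lighting",
}

clusters = associate(metadata, concepts)
print(clusters)
```

In this sketch a piece of metadata may be associated with several concepts or with none, which is one possible reading of "clustering by association"; the paper itself may define the association differently.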




Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Dong, H., Hussain, F.K., Chang, E. (2008). A Semantic Crawler Based on an Extended CBR Algorithm. In: Meersman, R., Tari, Z., Herrero, P. (eds) On the Move to Meaningful Internet Systems: OTM 2008 Workshops. OTM 2008. Lecture Notes in Computer Science, vol 5333. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88875-8_135


  • DOI: https://doi.org/10.1007/978-3-540-88875-8_135

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-88874-1

  • Online ISBN: 978-3-540-88875-8

  • eBook Packages: Computer Science, Computer Science (R0)
