Abstract
This paper presents the results of crawling the CEUR Workshop Proceedings web site (http://ceur-ws.org) and converting its content into a Linked Open Data (LOD) dataset, carried out within the ESWC 2014 Semantic Publishing Challenge (http://2014.eswc-conferences.org/semantic-publishing-challenge). Our approach uses an extensible template-dependent crawler, with DBpedia used to link extracted entities such as the names of universities and countries.
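The idea of a template-dependent crawler can be illustrated with a minimal sketch, assuming that each template is a small function that either extracts fields from a page or reports that the markup does not fit. The template functions, the HTML fragments, and the `extract` helper below are hypothetical and are not taken from the authors' source code; the sketch only shows how new templates can be added when the markup of a site changes between pages.

```python
import re
from typing import Callable, List, Optional

# A template takes raw HTML and returns the extracted fields,
# or None when the page markup does not fit this template.
Template = Callable[[str], Optional[dict]]


def template_span_title(html: str) -> Optional[dict]:
    """Hypothetical template: title wrapped in <span class="title">."""
    m = re.search(r'<span class="title">(.*?)</span>', html, re.S)
    return {"title": m.group(1).strip()} if m else None


def template_heading_title(html: str) -> Optional[dict]:
    """Hypothetical template: title given as a plain <h1> heading."""
    m = re.search(r"<h1>(.*?)</h1>", html, re.S)
    return {"title": m.group(1).strip()} if m else None


# Extending the crawler means appending another template to this list.
TEMPLATES: List[Template] = [template_span_title, template_heading_title]


def extract(html: str, templates: List[Template] = TEMPLATES) -> Optional[dict]:
    """Try each template in order and return the first successful result."""
    for template in templates:
        result = template(html)
        if result is not None:
            return result
    return None  # no known template fits this page
```

Under these assumptions, a page whose markup matches none of the templates yields `None`, which signals that a further template needs to be written for it.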
Notes
1. ESWC 2014 Semantic Publishing Challenge, URL: http://2014.eswc-conferences.org/semantic-publishing-challenge
2. CEUR Workshop proceedings web site, URL: http://ceur-ws.org
3. The source code and instructions, URL: https://github.com/ailabitmo/sempub challenge2014-task1
4. Grab framework, URL: http://grablib.org/
5. Semantic Web Conference Ontology, URL: http://data.semanticweb.org/ns/swc/ontology
6. Semantic Web for Research Communities, URL: http://ontoware.org/swrc/
7. The Bibliographic Ontology, URL: http://purl.org/ontology/bibo/
8. The Timeline Ontology, URL: http://purl.org/NET/c4dm/timeline.owl#
9. The Friend of a Friend (FOAF), URL: http://www.foaf-project.org/
10. Dublin Core, URL: http://purl.org/dc/elements/1.1/
11. DBpedia Ontology, URL: http://dbpedia.org/ontology/
12. RDF Schema, URL: http://www.w3.org/2000/01/rdf-schema#
13. PDFMiner, URL: http://www.unixuser.org/~euske/python/pdfminer/
14. DBLP, URL: http://www.informatik.uni-trier.de/~ley/db/
15. Semantic Web Dog Food, URL: http://data.semanticweb.org/
References
Lehmann, J., Isele, R., Jakob, M., Jentzsch, A., Kontokostas, D., Mendes, P.N., Hellmann, S., Morsey, M., van Kleef, P., Auer, S., Bizer, C.: DBpedia - a large-scale, multilingual knowledge base extracted from Wikipedia. Semant. Web J. (2014). http://www.semantic-web-journal.net/content/dbpedia-large-scale-multilingual-knowledge-base-extracted-wikipedia-0
Ratcliff, J.W., Metzener, D.E.: Pattern matching: the gestalt approach. Dr. Dobb's J. (DDJ) 13(7), 1–46 (1988)
Acknowledgments
This work was partially supported financially by the Government of the Russian Federation, Grant #074-U01.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Kolchin, M., Kozlov, F. (2014). A Template-Based Information Extraction from Web Sites with Unstable Markup. In: Presutti, V., et al. Semantic Web Evaluation Challenge. SemWebEval 2014. Communications in Computer and Information Science, vol 475. Springer, Cham. https://doi.org/10.1007/978-3-319-12024-9_11
DOI: https://doi.org/10.1007/978-3-319-12024-9_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12023-2
Online ISBN: 978-3-319-12024-9
eBook Packages: Computer Science, Computer Science (R0)