Semantic Recognition of Digital Documents

Betliński, Paweł; Gora, Paweł; Herba, Kamil; Nguyen, Trung Tuan; Stawicki, Sebastian

doi:10.1007/978-3-642-24809-2_7

Part of the book series: Studies in Computational Intelligence (SCI, volume 390)

Abstract

The paper presents methods developed by the Methods of Semantic Recognition of Scientific Documents group within the SYNAT project. It describes a document representation format together with a proof-of-concept system that converts scientific articles in PDF format into this representation. It also presents an experiment in clustering documents by style.

The authors are supported by grant N N516 077837 from the Ministry of Science and Higher Education of the Republic of Poland and by the National Centre for Research and Development (NCBiR) under Grant No. SP/I/1/77065/10 within the strategic scientific research and experimental development program “Interdisciplinary System for Interactive Scientific and Scientific-Technical Information”.

Author information

Authors and Affiliations

Faculty of Mathematics, Computer Science and Mechanics, University of Warsaw, Banacha 2, 02-097, Warsaw, Poland
Paweł Betliński, Paweł Gora, Kamil Herba, Trung Tuan Nguyen & Sebastian Stawicki

Corresponding author

Correspondence to Paweł Betliński.

Editor information

Editors and Affiliations

Institute of Computer Science, Faculty of Electronics and Information, Warsaw University of Technology, Ul. Nowowiejska 15/19, Warsaw, 00-665, Poland
Robert Bembenik
Institute of Computer Science, Faculty of Electronics and Information, Warsaw University of Technology, Ul. Nowowiejska 15/19, Warsaw, 00-665, Poland
Lukasz Skonieczny
Institute of Computer Science, Faculty of Computer Science and, Warsaw University of Technology, ul. Zolnierska 49, Warsaw, 00-665, Poland
Henryk Rybiński
Interdisciplinary Centre for Mathematica, University of Warsaw, Ul. Pawińskiego 5a, Warsaw, 02-106, Poland
Marek Niezgodka

About this chapter

Cite this chapter

Betliński, P., Gora, P., Herba, K., Nguyen, T.T., Stawicki, S. (2012). Semantic Recognition of Digital Documents. In: Bembenik, R., Skonieczny, L., Rybiński, H., Niezgodka, M. (eds) Intelligent Tools for Building a Scientific Information Platform. Studies in Computational Intelligence, vol 390. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24809-2_7

DOI: https://doi.org/10.1007/978-3-642-24809-2_7
Published: 24 January 2012
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24808-5
Online ISBN: 978-3-642-24809-2
eBook Packages: Engineering, Engineering (R0)
