Disclosing Citation Meanings for Augmented Research Retrieval and Exploration

Ferrod, Roger; Schifanella, Claudio; Di Caro, Luigi; Cataldi, Mario

doi:10.1007/978-3-030-21348-0_7

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11503)

Included in the following conference series:

European Semantic Web Conference

Abstract

In recent years, new digital technologies have been used to support the navigation and analysis of scientific publications, motivated by the increasing number of articles published every year. As a result, experts rely on online systems to browse thousands of articles in search of relevant information. In this paper, we present a new method that automatically assigns meanings to references on the basis of the citation text, through a Natural Language Processing pipeline and a slightly-supervised clustering process. The resulting network of semantically linked articles allows an informed exploration of the research panorama through semantic paths. The proposed approach has been validated on the ACL Anthology Dataset, which contains several thousand papers in the field of Computational Linguistics. A manual evaluation of the extracted citation meanings showed very high levels of accuracy. Finally, a freely available web-based application has been developed and published online.
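
The idea of attaching a meaning to each citation and linking papers through labeled edges can be illustrated with a minimal sketch. It is not the authors' pipeline: the seed vocabularies, the label names other than the catch-all related-to, the `label_citation` and `build_graph` helpers, and the sample snippets are all assumptions made for illustration, and simple keyword overlap stands in for the NLP pipeline and clustering step described above.

```python
import re
from collections import Counter

# Hypothetical seed vocabularies (assumption): the paper's actual
# categories and features are not given in this abstract.
SEEDS = {
    "uses": {"use", "used", "using", "adopt", "adopted", "employ"},
    "extends": {"extend", "extends", "extended", "build", "builds"},
    "compares-with": {"compare", "compared", "outperform", "outperforms", "baseline"},
}
# Catch-all label for citations that match no seed vocabulary.
FALLBACK = "related-to"


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def label_citation(snippet):
    """Pick the label whose seed words occur most often in the snippet."""
    counts = Counter(tokenize(snippet))
    scores = {label: sum(counts[w] for w in words) for label, words in SEEDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else FALLBACK


def build_graph(citations):
    """Turn (citing, cited, snippet) triples into labeled edges."""
    return [(src, label_citation(text), dst) for src, dst, text in citations]


# Made-up citation snippets, for illustration only.
citations = [
    ("P1", "P2", "We use the parser of [2] to extract dependencies."),
    ("P1", "P3", "Our model extends the approach proposed in [3]."),
    ("P4", "P1", "Our system outperforms the baseline of [1]."),
    ("P4", "P5", "See also [5]."),
]
edges = build_graph(citations)
for src, label, dst in edges:
    print(f"{src} -[{label}]-> {dst}")
```

Each resulting triple is a labeled edge between two papers, so a chain of such edges gives one possible reading of a semantic path through the literature.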

Notes

1. https://acl-arc.comp.nus.edu.sg.
2. http://scikit-learn.org/stable/.
3. nsubj, csubj, nmod, advcl, dobj.
4. The Porter Stemmer has been adopted.
5. https://neo4j.com.
6. We excluded from the evaluation the 10th cluster related-to since it included the remaining citations having a very broad scope.
7. Since we did not have a complete labeled corpus with positive and negative examples, we could not compute standard Precision/Recall/F-measures.
8. Both documentation and source code of the pipeline, as well as the complete set of citation snippets per category and the graph, are available at https://github.com/rogerferrod/citexp.

Author information

Authors and Affiliations

Department of Computer Science, University of Turin, Turin, Italy
Roger Ferrod, Claudio Schifanella & Luigi Di Caro
Department of Computer Science, University of Paris 8, Saint-Denis, France
Mario Cataldi

Corresponding author

Correspondence to Claudio Schifanella.

Editor information

Editors and Affiliations

Wright State University, Dayton, OH, USA
Pascal Hitzler
KMi, The Open University, Milton Keynes, UK
Miriam Fernández
University of California, Santa Barbara, CA, USA
Krzysztof Janowicz
Maastricht University, Maastricht, The Netherlands
Amrapali Zaveri
Heriot-Watt University, Edinburgh, UK
Alasdair J.G. Gray
IBM Research, Dublin, Ireland
Vanessa Lopez
The Australian National University, Canberra, ACT, Australia
Armin Haller
Jönköping University, Jönköping, Sweden
Karl Hammar

About this paper

Cite this paper

Ferrod, R., Schifanella, C., Di Caro, L., Cataldi, M. (2019). Disclosing Citation Meanings for Augmented Research Retrieval and Exploration. In: Hitzler, P., et al. The Semantic Web. ESWC 2019. Lecture Notes in Computer Science, vol 11503. Springer, Cham. https://doi.org/10.1007/978-3-030-21348-0_7

DOI: https://doi.org/10.1007/978-3-030-21348-0_7
Published: 25 May 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-21347-3
Online ISBN: 978-3-030-21348-0
eBook Packages: Computer Science, Computer Science (R0)
