Disclosing Citation Meanings for Augmented Research Retrieval and Exploration

  • Conference paper
The Semantic Web (ESWC 2019)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11503)

Abstract

In recent years, new digital technologies have been used to support the navigation and analysis of scientific publications, a need driven by the increasing number of articles published every year. Experts therefore rely on online systems to browse thousands of articles in search of relevant information. In this paper, we present a new method that automatically assigns meanings to references on the basis of the citation text, using a Natural Language Processing pipeline and a slightly-supervised clustering process. The resulting network of semantically linked articles allows an informed exploration of the research panorama through semantic paths. The proposed approach has been validated on the ACL Anthology Dataset, which contains several thousand papers in the field of Computational Linguistics. A manual evaluation of the extracted citation meanings showed very high levels of accuracy. Finally, a freely available web-based application has been developed and published online.
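
To make the idea of exploring a network of semantically linked articles more concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each citation has already been assigned a meaning label and stores the result as labeled edges; the paper identifiers and the labels "uses" and "extends" are hypothetical, and the function names build_graph and semantic_paths are invented for illustration. A semantic path is taken here to be a chain of citations whose labels all belong to a chosen set.

```python
from collections import deque

# Hypothetical labeled citation edges: (citing paper, meaning label, cited paper).
# Paper ids and the labels "uses" and "extends" are illustrative assumptions.
EDGES = [
    ("P1", "uses", "P2"),
    ("P2", "extends", "P3"),
    ("P1", "related-to", "P4"),
    ("P2", "uses", "P4"),
    ("P4", "uses", "P5"),
]


def build_graph(edges):
    """Group outgoing (label, cited paper) pairs by citing paper."""
    graph = {}
    for src, label, dst in edges:
        graph.setdefault(src, []).append((label, dst))
    return graph


def semantic_paths(graph, start, allowed_labels, max_hops=3):
    """Breadth-first search for simple paths that follow only allowed labels."""
    paths = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        if len(path) - 1 >= max_hops:
            continue  # hop budget exhausted, do not extend this path
        for label, dst in graph.get(path[-1], []):
            # Follow the edge only if its meaning is allowed and it adds no cycle.
            if label in allowed_labels and dst not in path:
                new_path = path + [dst]
                paths.append(new_path)
                queue.append(new_path)
    return paths


GRAPH = build_graph(EDGES)
USES_PATHS = semantic_paths(GRAPH, "P1", {"uses"})

if __name__ == "__main__":
    for p in USES_PATHS:
        print(" -> ".join(p))
```

Restricting the traversal to one label set is only one possible design; a graph database query with a filter on the relationship type would express the same exploration.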

Notes

  1. https://acl-arc.comp.nus.edu.sg.

  2. http://scikit-learn.org/stable/.

  3. nsubj, csubj, nmod, advcl, dobj.

  4. The Porter Stemmer has been adopted.

  5. https://neo4j.com.

  6. We excluded the 10th cluster, related-to, from the evaluation, since it collected the remaining citations, which have a very broad scope.

  7. Since we did not have a complete labeled corpus with positive and negative examples, we could not compute standard Precision/Recall/F-measures.

  8. Both documentation and source code of the pipeline, as well as the complete set of citation snippets per category and the graph, are available at https://github.com/rogerferrod/citexp.

Author information

Corresponding author

Correspondence to Claudio Schifanella.

Copyright information

© 2019 Springer Nature Switzerland AG

Cite this paper

Ferrod, R., Schifanella, C., Di Caro, L., Cataldi, M. (2019). Disclosing Citation Meanings for Augmented Research Retrieval and Exploration. In: Hitzler, P., et al. The Semantic Web. ESWC 2019. Lecture Notes in Computer Science, vol 11503. Springer, Cham. https://doi.org/10.1007/978-3-030-21348-0_7

  • DOI: https://doi.org/10.1007/978-3-030-21348-0_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-21347-3

  • Online ISBN: 978-3-030-21348-0

  • eBook Packages: Computer Science, Computer Science (R0)
