PMSC-UGR: A Test Collection for Expert Recommendation Based on PubMed and Scopus

Albusac, César; de Campos, Luis M.; Fernández-Luna, Juan M.; Huete, Juan F.

doi:10.1007/978-3-030-00374-6_4

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11160)

Included in the following conference series:

Conference of the Spanish Association for Artificial Intelligence

Abstract

This paper presents PMSC-UGR, a new test document collection. It has been built from a large subset of MEDLINE/PubMed scientific articles, which have been subjected to a disambiguation process (using ORCID) to identify their authors unequivocally. The collection has also been completed with the citations to these articles that are available through the Scopus/Elsevier API. Although this test collection can be used for different purposes, we focus here on its use for expert recommendation and document filtering, and report some preliminary experiments and their results.

Notes

1. Profile extracted, for example, from documents authored by this individual.
2. Although PubMed does not contain complete articles but references to articles, called citations, we use the term "articles" to refer to these citations and reserve the name "citations" for other articles that cite a given article in their bibliographic references.
3. ftp://ftp.ncbi.nlm.nih.gov/pubmed/baseline/.
4. https://www.nlm.nih.gov/bsd/licensee/elements_descriptions.html.
5. In fact, 26,661,157 articles in PubMed do not have any ORCID.
6. The data was downloaded from the Scopus API between July 3 and September 27, 2017 via http://api.elsevier.com and http://www.scopus.com.
7. The reason is that for these authors we cannot obtain citations to their articles, so this secondary collection is larger but contains less information.
8. https://lucene.apache.org.
9. We do not require a perfect match, allowing an edit distance of 5 for title and 3 for author.
10. This may happen, for example, when the articles (probably only one) in PubMed of an author (having ORCID and ScopusID) do not appear in the list of papers in Scopus written by this author.
11. http://trec.nist.gov/trec_eval/.
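
Note 9 describes approximate matching of titles and author names with edit-distance thresholds of 5 and 3. The following is a minimal sketch of how such a check could look, not the authors' implementation. It assumes a character-level Levenshtein distance, a simple normalization (lowercasing and whitespace collapsing), and that both thresholds must hold at the same time; none of these details are stated in the source. The names edit_distance, normalize and records_match are hypothetical.

```python
# Thresholds taken from note 9; everything else here is an assumption.
TITLE_MAX_DISTANCE = 5
AUTHOR_MAX_DISTANCE = 3


def edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance (two-row dynamic program)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,                      # deletion
                    current[j - 1] + 1,                   # insertion
                    previous[j - 1] + substitution_cost,  # substitution
                )
            )
        previous = current
    return previous[-1]


def normalize(text: str) -> str:
    """Assumed normalization: lowercase and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def records_match(title_a: str, author_a: str, title_b: str, author_b: str) -> bool:
    """Return True when both title and author fall within the thresholds."""
    title_ok = edit_distance(normalize(title_a), normalize(title_b)) <= TITLE_MAX_DISTANCE
    author_ok = edit_distance(normalize(author_a), normalize(author_b)) <= AUTHOR_MAX_DISTANCE
    return title_ok and author_ok
```

Under these assumptions, records_match("PMSC-UGR: A Test Collection", "Huete JF", "PMSC UGR: a test collection", "Huete J.F.") accepts the pair, because small punctuation and case differences stay within both thresholds, while two records with unrelated titles are rejected even when the author strings are identical.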

Acknowledgment

This work has been funded by the Spanish “Ministerio de Economía y Competitividad” under project TIN2016-77902-C3-2-P, and the European Regional Development Fund (ERDF-FEDER).

Author information

Authors and Affiliations

Departamento de Ciencias de la Computación e Inteligencia Artificial, ETSI Informática y de Telecomunicación, CITIC-UGR, Universidad de Granada, 18071, Granada, Spain
César Albusac, Luis M. de Campos, Juan M. Fernández-Luna & Juan F. Huete

Corresponding author

Correspondence to Juan M. Fernández-Luna.

Editor information

Editors and Affiliations

Andalusian Research Institute on Data Science and Computational Intelligence (DaSCI), University of Granada, Granada, Spain
Francisco Herrera
Andalusian Research Institute on Data Science and Computational Intelligence (DaSCI), University of Granada, Granada, Spain
Sergio Damas
Andalusian Research Institute on Data Science and Computational Intelligence (DaSCI), University of Granada, Granada, Spain
Rosana Montes
Andalusian Research Institute on Data Science and Computational Intelligence (DaSCI), University of Granada, Granada, Spain
Sergio Alonso
Andalusian Research Institute on Data Science and Computational Intelligence (DaSCI), University of Granada, Granada, Spain
Óscar Cordón
Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
Antonio González
School of Engineering, Pablo de Olavide University, Seville, Spain
Alicia Troncoso

About this paper

Cite this paper

Albusac, C., de Campos, L.M., Fernández-Luna, J.M., Huete, J.F. (2018). PMSC-UGR: A Test Collection for Expert Recommendation Based on PubMed and Scopus. In: Herrera, F., et al. Advances in Artificial Intelligence. CAEPIA 2018. Lecture Notes in Computer Science, vol 11160. Springer, Cham. https://doi.org/10.1007/978-3-030-00374-6_4

DOI: https://doi.org/10.1007/978-3-030-00374-6_4
Published: 27 September 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00373-9
Online ISBN: 978-3-030-00374-6
eBook Packages: Computer Science, Computer Science (R0)
