Abstract
Given a biomedical article a, identifying the articles whose core contents (research goals, backgrounds, and conclusions) are similar to those of a is essential for surveying and cross-validating the highly related biomedical evidence presented in a. We therefore present CCSE (Core Content Similarity Estimation), a technique that retrieves these highly related articles by estimating and integrating three kinds of inter-article similarity: goal similarity, background similarity, and conclusion similarity. CCSE works on the titles and abstracts of biomedical articles, which are publicly available. Experimental results show that CCSE performs better than PubMed (a popular biomedical search engine) and typical techniques in identifying the scholarly articles that biomedical experts judge to have core contents focused on the same gene-disease associations. The contribution is essential for the retrieval, clustering, mining, and validation of biomedical evidence in the literature.
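The idea of integrating goal, background, and conclusion similarity can be illustrated with a minimal sketch. It is not the CCSE algorithm itself: the abstract does not say how each similarity is estimated or how the three are combined, so the sketch assumes that each article has already been segmented into three hypothetical text fields (goal, background, conclusion), substitutes a plain cosine similarity over term counts for each component, and assumes an equal-weight linear combination.

```python
import math
import re
from collections import Counter

# Hypothetical field names; the segmentation of a title and abstract into
# these three parts is assumed to have been done beforehand.
PARTS = ("goal", "background", "conclusion")


def tokens(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(text_a, text_b):
    """Cosine similarity of raw term counts (a stand-in measure, not CCSE's)."""
    va, vb = Counter(tokens(text_a)), Counter(tokens(text_b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def core_similarity(article_x, article_y, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of the three part-wise similarities (weights are assumed)."""
    return sum(
        w * cosine(article_x[p], article_y[p]) for w, p in zip(weights, PARTS)
    )


def rank_related(target, candidates):
    """Return candidates sorted by decreasing similarity to the target."""
    return sorted(
        candidates, key=lambda c: core_similarity(target, c), reverse=True
    )
```

Keeping the three components separate, rather than comparing whole abstracts at once, means that two articles which share only background vocabulary score lower than two that also agree on goal and conclusion.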
Notes
1. Google Scholar is available at https://scholar.google.com.
2. PubMed is available at http://www.ncbi.nlm.nih.gov/pubmed.
3. DisGeNET is available at http://www.disgenet.org/web/DisGeNET/menu/home.
4. GAD is available at http://geneticassociationdb.nih.gov.
5. CTD is available at http://ctdbase.org.
6. PMC is available at http://www.ncbi.nlm.nih.gov/pmc.
Acknowledgment
This research was supported by the Ministry of Science and Technology (grant ID: MOST 105-2221-E-320-004) and Tzu Chi University (grant IDs: TCRPP103020 and TCRPP104010), Taiwan.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Liu, RL. (2017). Identification of Biomedical Articles with Highly Related Core Contents. In: Nguyen, N., Tojo, S., Nguyen, L., Trawiński, B. (eds) Intelligent Information and Database Systems. ACIIDS 2017. Lecture Notes in Computer Science, vol 10191. Springer, Cham. https://doi.org/10.1007/978-3-319-54472-4_21
DOI: https://doi.org/10.1007/978-3-319-54472-4_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-54471-7
Online ISBN: 978-3-319-54472-4
eBook Packages: Computer Science (R0)