Automatic Indexing of Journal Abstracts with Latent Semantic Analysis

  • Conference paper
Experimental IR Meets Multilinguality, Multimodality, and Interaction (CLEF 2015)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9283)

Abstract

The BioASQ “Task on Large-Scale Online Biomedical Semantic Indexing” charges participants with assigning semantic tags to biomedical journal abstracts. We present a system that takes as input a biomedical abstract and uses latent semantic analysis to identify similar documents in the MEDLINE database. The system then uses a novel ranking scheme to select a list of MeSH tags from candidates drawn from the most similar documents. Our approach achieved better than baseline performance in both precision and recall. We suggest several possible strategies to improve the system’s performance.
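
The pipeline described in the abstract (project documents into a latent semantic space, find the abstracts most similar to the input, then rank the MeSH tags attached to those neighbours) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the four-document toy corpus and its tags are invented, the function names `build_lsa`, `fold_in`, and `suggest_tags` are hypothetical, and the plain similarity-weighted vote is only a stand-in, since the abstract does not describe the authors' ranking scheme.

```python
import numpy as np

# Toy stand-in for MEDLINE: (abstract text, MeSH tags). All entries are invented.
DOCS = [
    ("insulin therapy diabetes", ["Diabetes Mellitus", "Insulin"]),
    ("glucose insulin resistance diabetes", ["Diabetes Mellitus", "Blood Glucose"]),
    ("tumor growth chemotherapy response", ["Neoplasms", "Antineoplastic Agents"]),
    ("chemotherapy outcomes tumor survival", ["Neoplasms", "Antineoplastic Agents"]),
]


def build_lsa(texts, k):
    """Build a TF-IDF term-document matrix and keep its top-k singular triplets."""
    vocab = sorted({w for t in texts for w in t.split()})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(texts)))
    for j, text in enumerate(texts):
        for w in text.split():
            counts[index[w], j] += 1
    df = (counts > 0).sum(axis=1)            # document frequency per term
    idf = np.log(len(texts) / df) + 1.0
    u, s, vt = np.linalg.svd(counts * idf[:, None], full_matrices=False)
    # Rows of V_k are the document vectors in the latent space.
    return index, idf, u[:, :k], s[:k], vt[:k].T


def fold_in(text, index, idf, u_k, s_k):
    """Project an unseen abstract into the latent space: q^T U_k S_k^{-1}."""
    q = np.zeros(len(index))
    for w in text.split():
        if w in index:                        # out-of-vocabulary words are ignored
            q[index[w]] += 1
    return (q * idf) @ u_k / s_k


def suggest_tags(text, docs, k=2, neighbors=2, top=3):
    """Rank tags from the nearest documents by summed cosine similarity.

    The weighted vote is an illustrative placeholder, not the paper's scheme.
    """
    index, idf, u_k, s_k, doc_vecs = build_lsa([t for t, _ in docs], k)
    q = fold_in(text, index, idf, u_k, s_k)
    norms = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q) + 1e-12
    sims = doc_vecs @ q / norms
    scores = {}
    for j in np.argsort(-sims)[:neighbors]:
        for tag in docs[j][1]:
            scores[tag] = scores.get(tag, 0.0) + max(sims[j], 0.0)
    return sorted(scores, key=lambda tag: -scores[tag])[:top]


if __name__ == "__main__":
    print(suggest_tags("insulin resistance diabetes", DOCS))
```

In this sketch a tag shared by several close neighbours accumulates more weight than a tag seen only once, which is one simple way to turn neighbour similarity into a ranked candidate list. A realistic system would use a far larger corpus, a sparse truncated SVD, and a tuned number of latent dimensions.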

Corresponding author

Correspondence to Joel Robert Adams.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Adams, J.R., Bedrick, S. (2015). Automatic Indexing of Journal Abstracts with Latent Semantic Analysis. In: Mothe, J., et al. Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2015. Lecture Notes in Computer Science, vol 9283. Springer, Cham. https://doi.org/10.1007/978-3-319-24027-5_17

  • DOI: https://doi.org/10.1007/978-3-319-24027-5_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-24026-8

  • Online ISBN: 978-3-319-24027-5

  • eBook Packages: Computer Science (R0)
