Latent Argumentative Pruning for Compact MEDLINE Indexing

  • Conference paper
Artificial Intelligence in Medicine (AIME 2005)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3581)

Abstract

PURPOSE: We evaluate how the argumentative structure of scientific articles can support an original index pruning strategy that significantly reduces the size of a search engine's indexes while having only a limited impact on retrieval effectiveness. METHODS: A Bayesian classifier trained on explicitly structured MEDLINE abstracts assigns argumentative categories to the content of abstracts. These categories are used to build four different argumentative indexes. A fifth index contains the complete abstract, together with the title and the list of Medical Subject Headings (MeSH) terms. This last index serves as the baseline against which we compare the results obtained when only a single argumentative index is queried. RESULTS and CONCLUSION: When titles and MeSH terms are also stored in the respective indexes, querying the PURPOSE and CONCLUSION indexes achieves 78.4% and 74.3% of the baseline, respectively, while the index size is halved. We conclude that argumentation can be a powerful index pruning strategy that complements more traditional approaches.
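The indexing setup described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several parts are assumptions chosen only for illustration: the multinomial naive Bayes classifier with Laplace smoothing, the regex tokenizer, the four category names (taken from the structured-abstract headings), the helper names (`NaiveBayes`, `build_index`, `postings`, `header`, `baseline_index`, `argumentative_indexes`), the toy data, and the use of posting counts as the size measure. The sketch classifies each abstract sentence and builds one inverted index per category, each still holding the title and MeSH terms. It also builds a full baseline index so that index sizes can be compared.

```python
import math
import re
from collections import Counter, defaultdict

CATEGORIES = ["PURPOSE", "METHODS", "RESULTS", "CONCLUSION"]  # assumed label set


def tokenize(text):
    """Lowercase alphanumeric tokenizer (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with Laplace smoothing (hypothetical stand-in)."""

    def fit(self, labelled_sentences):
        self.class_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for sentence, label in labelled_sentences:
            self.class_counts[label] += 1
            for tok in tokenize(sentence):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, sentence):
        total = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            score = math.log(n / total)  # log prior of the category
            for tok in tokenize(sentence):  # add smoothed log likelihoods
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best, best_score = label, score
        return best


def build_index(docs):
    """Inverted index mapping term -> set of doc ids; docs maps id -> text."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for tok in tokenize(text):
            index[tok].add(doc_id)
    return index


def postings(index):
    """Index size, measured here as the total number of postings."""
    return sum(len(ids) for ids in index.values())


def header(rec):
    """Title plus MeSH terms, which every index keeps."""
    return rec["title"] + " " + " ".join(rec["mesh"])


def baseline_index(records):
    """Full index: title, MeSH terms and the complete abstract."""
    return build_index({d: header(r) + " " + " ".join(r["sentences"])
                        for d, r in records.items()})


def argumentative_indexes(records, clf):
    """One pruned index per category, each still holding title and MeSH."""
    per_cat = {c: {} for c in CATEGORIES}
    for doc_id, rec in records.items():
        kept = defaultdict(list)
        for sent in rec["sentences"]:
            kept[clf.predict(sent)].append(sent)
        for c in CATEGORIES:
            per_cat[c][doc_id] = header(rec) + " " + " ".join(kept[c])
    return {c: build_index(docs) for c, docs in per_cat.items()}


# Toy data, invented purely for illustration.
train = [
    ("we aim to evaluate index pruning", "PURPOSE"),
    ("the objective is to assess retrieval", "PURPOSE"),
    ("a classifier was trained on abstracts", "METHODS"),
    ("we built four indexes", "METHODS"),
    ("precision reached high values", "RESULTS"),
    ("the index size was halved", "RESULTS"),
    ("we conclude that pruning is effective", "CONCLUSION"),
    ("in conclusion argumentation helps", "CONCLUSION"),
]
records = {"d1": {
    "title": "Index pruning", "mesh": ["Information Storage and Retrieval"],
    "sentences": ["we aim to evaluate pruning", "a classifier was trained",
                  "the index size was halved", "we conclude that pruning helps"]}}
clf = NaiveBayes().fit(train)
full_idx = baseline_index(records)
arg_idx = argumentative_indexes(records, clf)
for c in CATEGORIES:
    print(f"{c}: {postings(arg_idx[c])} of {postings(full_idx)} postings")
```

Each category index keeps the title and MeSH terms. As a result, its terms for any document are a subset of the baseline's, and it can never be larger than the baseline. Retrieval effectiveness relative to the baseline, which is what the paper reports, would have to be measured separately on a collection with relevance judgments. This toy example does not do that.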


Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ruch, P., Baud, R., Marty, J., Geissbühler, A., Tbahriti, I., Veuthey, AL. (2005). Latent Argumentative Pruning for Compact MEDLINE Indexing. In: Miksch, S., Hunter, J., Keravnou, E.T. (eds) Artificial Intelligence in Medicine. AIME 2005. Lecture Notes in Computer Science, vol 3581. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11527770_36

  • DOI: https://doi.org/10.1007/11527770_36

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-27831-3

  • Online ISBN: 978-3-540-31884-2

  • eBook Packages: Computer Science, Computer Science (R0)