Two Applications of Statistical Modelling to Natural Language Processing

DuMouchel, William; Friedman, Carol; Hripcsak, George; Johnson, Stephen B.; Clayton, Paul D.

doi:10.1007/978-1-4612-2404-4_39

Part of the book series: Lecture Notes in Statistics (LNS, volume 112)

Abstract

Each week the Columbia-Presbyterian Medical Center collects several megabytes of English text transcribed from radiologists’ dictation and notes of their interpretations of medical diagnostic x-rays. The goal is to automate the extraction of diagnoses from these natural language reports. This paper reports on two aspects of the project that require advanced statistical methods. First, pairs of words and phrases that tend to appear together (collocate) are identified with a hierarchical Bayesian model that adjusts to different word and word-pair distributions in different bodies of text. Second, we analyze data from experiments that compare the performance of the computer diagnostic program with that of a panel of physician and lay readers of randomly sampled texts. We define a measure of inter-subject distance with respect to the diagnoses for which estimated variances and covariances are easily computed. This allows statistical conclusions about the similarities and dissimilarities among the diagnoses made by the various programs and experts.
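
The abstract names the two techniques but gives neither the model nor the distance formula, so the following is only a rough sketch under stated assumptions, not the chapter's method. For collocation, a simple shrunken observed/expected ratio stands in for the hierarchical Bayesian model; the pseudo-count `prior` is a hypothetical tuning parameter. For inter-subject distance, the sketch assumes each reader's output is a set of findings per report and counts disagreements; the function names and the toy reports are invented for illustration.

```python
from collections import Counter
from itertools import combinations


def collocation_scores(docs, prior=1.0):
    """Shrunken observed/expected ratio for every word pair sharing a document.

    ASSUMPTION: `prior` is a hypothetical pseudo-count that pulls the ratio
    toward 1 for rare pairs; it is a stand-in for, not a copy of, the
    hierarchical Bayesian model mentioned in the abstract.
    """
    n_docs = len(docs)
    word_df = Counter()  # number of documents containing each word
    pair_df = Counter()  # number of documents containing each unordered pair
    for doc in docs:
        words = sorted(set(doc.lower().split()))
        word_df.update(words)
        pair_df.update(combinations(words, 2))
    scores = {}
    for (a, b), observed in pair_df.items():
        # Co-occurrence count expected if the two words appeared independently.
        expected = word_df[a] * word_df[b] / n_docs
        scores[(a, b)] = (observed + prior) / (expected + prior)
    return scores


def reader_distance(x, y):
    """Mean number of findings per report on which two readers disagree.

    ASSUMPTION: each reader is a list of sets of findings, one set per report,
    aligned by position. This definition is illustrative only.
    """
    if len(x) != len(y):
        raise ValueError("both readers must cover the same reports")
    return sum(len(set(a) ^ set(b)) for a, b in zip(x, y)) / len(x)


# Toy data, invented for this example.
reports = [
    "pleural effusion noted",
    "small pleural effusion",
    "no acute disease",
    "pleural thickening noted",
]
scores = collocation_scores(reports)

program = [{"effusion"}, {"pneumonia"}, set()]
physician = [{"effusion"}, set(), {"cardiomegaly"}]
distance = reader_distance(program, physician)
```

Pairs are keyed in alphabetical order, so the score for "pleural effusion" is stored under `("effusion", "pleural")`. With a larger `prior`, pairs seen only once move closer to a ratio of 1, which is the qualitative effect one would want from any shrinkage scheme.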

Author information

Authors and Affiliations

Department of Medical Informatics, Columbia University, 161 Fort Washington Avenue, New York, NY, 10032, USA
William DuMouchel, George Hripcsak, Stephen B. Johnson & Paul D. Clayton
Department of Computer Science, Queens College, CUNY, Flushing, NY, 11367, USA
Carol Friedman

Editor information

Editors and Affiliations

Department of Computer Science, Vanderbilt University, Box 1679, Station B, Nashville, Tennessee, 37235, USA
Doug Fisher
Department of Economics, Institute of Statistics and Econometrics, Free University of Berlin, Garystr. 21, 14185 Berlin, Germany
Hans-J. Lenz

About this chapter

Cite this chapter

DuMouchel, W., Friedman, C., Hripcsak, G., Johnson, S.B., Clayton, P.D. (1996). Two Applications of Statistical Modelling to Natural Language Processing. In: Fisher, D., Lenz, HJ. (eds) Learning from Data. Lecture Notes in Statistics, vol 112. Springer, New York, NY. https://doi.org/10.1007/978-1-4612-2404-4_39

DOI: https://doi.org/10.1007/978-1-4612-2404-4_39
Publisher Name: Springer, New York, NY
Print ISBN: 978-0-387-94736-5
Online ISBN: 978-1-4612-2404-4
eBook Packages: Springer Book Archive
