Abstract
Each week the Columbia-Presbyterian Medical Center collects several megabytes of English text transcribed from radiologists’ dictation and notes of their interpretations of medical diagnostic x-rays. The goal of this project is to automate the extraction of diagnoses from these natural language reports. This paper reports on two aspects of the project that require advanced statistical methods. First, pairs of words and phrases that tend to appear together (collocate) are identified using a hierarchical Bayesian model that adjusts to different word and word-pair distributions in different bodies of text. Second, we analyze data from experiments comparing the performance of the computer diagnostic program with that of a panel of physician and lay readers on randomly sampled texts. We define a measure of inter-subject distance with respect to the diagnoses for which estimated variances and covariances are easily computed. This measure supports statistical conclusions about the similarities and dissimilarities among the diagnoses made by the various programs and experts.
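To make the idea of collocation concrete, the following is a minimal Python sketch that scores adjacent word pairs with the log-likelihood ratio statistic (G²) for a 2×2 contingency table. This is a simpler, swapped-in technique and not the hierarchical Bayesian model described in the abstract, which additionally adapts to differences between bodies of text. The function names `g2_score` and `collocation_scores`, the restriction to adjacent pairs, and the sample sentence are all assumptions made for illustration.

```python
import math
from collections import Counter


def g2_score(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 table of co-occurrence counts.

    k11: pair (a, b) observed; k12: a followed by something else;
    k21: b preceded by something else; k22: neither.
    """
    def xlogx_sum(*counts):
        # Sum of k * ln(k), treating 0 * ln(0) as 0.
        return sum(k * math.log(k) for k in counts if k > 0)

    n = k11 + k12 + k21 + k22
    cells = xlogx_sum(k11, k12, k21, k22)
    rows = xlogx_sum(k11 + k12, k21 + k22)
    cols = xlogx_sum(k11 + k21, k12 + k22)
    # Equivalent to 2 * sum(O * ln(O / E)) with E = row * col / n.
    return 2.0 * (cells - rows - cols + xlogx_sum(n))


def collocation_scores(tokens):
    """Score every adjacent word pair in a token list (illustrative only)."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    first = Counter(tokens[:-1])   # how often each word starts a pair
    second = Counter(tokens[1:])   # how often each word ends a pair
    n = sum(bigrams.values())
    scores = {}
    for (a, b), k11 in bigrams.items():
        k12 = first[a] - k11
        k21 = second[b] - k11
        k22 = n - k11 - k12 - k21
        scores[(a, b)] = g2_score(k11, k12, k21, k22)
    return scores


# Hypothetical sample text, not taken from the reports in the study.
sample = "chest x ray shows no acute disease chest x ray normal".split()
scores = collocation_scores(sample)
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1])[:3]:
    print(pair, round(score, 2))
```

Note that G² measures departure from independence in either direction, so in practice it is usually combined with a check that the observed pair count exceeds its expected count.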
Copyright information
© 1996 Springer-Verlag New York, Inc.
About this chapter
Cite this chapter
DuMouchel, W., Friedman, C., Hripcsak, G., Johnson, S.B., Clayton, P.D. (1996). Two Applications of Statistical Modelling to Natural Language Processing. In: Fisher, D., Lenz, HJ. (eds) Learning from Data. Lecture Notes in Statistics, vol 112. Springer, New York, NY. https://doi.org/10.1007/978-1-4612-2404-4_39
DOI: https://doi.org/10.1007/978-1-4612-2404-4_39
Publisher Name: Springer, New York, NY
Print ISBN: 978-0-387-94736-5
Online ISBN: 978-1-4612-2404-4
eBook Packages: Springer Book Archive