Abstract
Key variables recorded as text in colonoscopy and pathology reports have been extracted using natural language processing (NLP) tools that were not easily adaptable to new settings. We aimed to develop a reliable NLP tool with broad adaptability. During 1996–2016, Kaiser Permanente Northern California performed 401,566 colonoscopies with linked pathology. We randomly sampled 1000 linked reports into a Training Set and developed an NLP tool using SAS® Perl regular expressions. The NLP tool captured five colonoscopy and pathology variables: type, size, and location of polyps; extent of procedure; and quality of bowel preparation. We used a Validation Set (N = 3000) to confirm the variables’ classifications, with manual chart review as the reference. Performance of the NLP tool was assessed using sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and Cohen’s κ. Cohen’s κ ranged from 93% to 99%. The sensitivity and specificity ranged from 95% to 100% across all categories. For categories with prevalence exceeding 10%, the PPV ranged from 97% to 100% except for adequate quality of preparation (prevalence 92%), for which the PPV was 65%. For categories with prevalence below 10%, the PPVs ranged from 62% to 100%. NPVs ranged from 94% to 100% except for the “complete” extent of procedure, for which the NPV was 73%. Using information from a large community-based population, we developed a transparent and adaptable NLP tool for extracting five colonoscopy and pathology variables. The tool can be readily tested in other healthcare settings.
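The performance measures named above can all be derived from a 2×2 table comparing the NLP classification with manual chart review. The following is a minimal Python sketch of those standard definitions, not the authors' SAS implementation. The `Confusion` class, the `performance` function, and the counts in the example are hypothetical and chosen only for illustration; they are not results from the study.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """2x2 table: NLP result versus manual chart review (the reference)."""
    tp: int  # NLP positive, chart review positive
    fp: int  # NLP positive, chart review negative
    fn: int  # NLP negative, chart review positive
    tn: int  # NLP negative, chart review negative


def performance(c: Confusion) -> dict:
    """Return sensitivity, specificity, PPV, NPV, and Cohen's kappa."""
    n = c.tp + c.fp + c.fn + c.tn
    observed = (c.tp + c.tn) / n  # observed agreement
    # Agreement expected by chance, from the marginal totals
    expected = (
        (c.tp + c.fp) * (c.tp + c.fn) + (c.fn + c.tn) * (c.fp + c.tn)
    ) / n**2
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
        "kappa": (observed - expected) / (1 - expected),
    }


# Hypothetical counts (NOT study data): 3000 reports, 2% prevalence,
# 95% sensitivity, and roughly 99% specificity.
rare_category = Confusion(tp=57, fp=30, fn=3, tn=2910)
results = performance(rare_category)
for name, value in results.items():
    print(f"{name}: {value:.3f}")
```

With these illustrative counts, sensitivity and specificity are both at least 95%, yet the PPV is only about 66%, because the few false positives are large relative to the small number of true positives in a rare category. This is one reason PPV can vary widely among low-prevalence categories even when sensitivity and specificity are uniformly high.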
Abbreviations
- NLP: natural language processing
- CRC: colorectal cancer
- SP: serrated polyp
- SSA: sessile serrated adenoma
- SSP: sessile serrated polyp
- HP: hyperplastic polyp
- TSA: traditional serrated adenoma
- NPV: negative predictive value
- PPV: positive predictive value
- CI: confidence interval
Ethics declarations
Financial support
This research was supported by Kaiser Permanente Northern California Division of Research Physician Researcher Program Funding.
Conflict of interest
None.
Ethical approval
All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.
Ethics
Institutional Review Board approval was obtained from the Kaiser Permanente Northern California Institutional Review Board.
Guarantor of the article
Drs. Dan Li and Lisa Herrinton take full responsibility for the conduct of the study, had access to the data, and had control of the decision to publish.
Additional information
This article is part of the Topical Collection on Education & Training
Cite this article
Fevrier, H.B., Liu, L., Herrinton, L.J. et al. A Transparent and Adaptable Method to Extract Colonoscopy and Pathology Data Using Natural Language Processing. J Med Syst 44, 151 (2020). https://doi.org/10.1007/s10916-020-01604-8