Faculty Publications

Natural language processing and machine learning to identify alcohol misuse from the electronic health record in trauma patients: development and internal validation

Majid Afshar, Stritch School of Medicine
Andrew Phillips, Loyola University Chicago
Niranjan Karnik, Rush University Medical Center
Jeanne Mueller, Loyola University Medical Center
Daniel To, Stritch School of Medicine
Richard Gonzalez, Loyola University Medical Center
Ron Price, Loyola University Chicago
Richard Cooper, Loyola University Chicago
Cara Joyce, Loyola University Chicago
Dmitriy Dligach, Loyola University Chicago

Document Type

Article

Publication Date

3-1-2019

Abstract

Objective: Alcohol misuse is present in over a quarter of trauma patients. Information in the clinical notes of the electronic health record of trauma patients may be used for phenotyping tasks with natural language processing (NLP) and supervised machine learning. The objective of this study is to train and validate an NLP classifier for identifying patients with alcohol misuse. Materials and Methods: An observational cohort of 1422 adult patients admitted to a trauma center between April 2013 and November 2016 was studied. Linguistic processing of clinical notes was performed using the clinical Text Analysis and Knowledge Extraction System. The primary analysis was the binary classification of alcohol misuse. The Alcohol Use Disorders Identification Test served as the reference standard. Results: The data corpus comprised 91 045 electronic health record notes and 16 091 features. In the final machine learning classifier, 16 features were selected from the first 24 hours of notes for identifying alcohol misuse. In the validation cohort, the classifier had an area under the receiver-operating characteristic curve of 0.78 (95% confidence interval [CI], 0.72 to 0.85). Sensitivity and specificity were 56.0% (95% CI, 44.1% to 68.0%) and 88.9% (95% CI, 84.4% to 92.8%), respectively. The Hosmer-Lemeshow goodness-of-fit test indicated that the classifier fit the data well (P = .17). Compared with the NLP classifier, a simpler rule-based keyword approach had lower sensitivity (18.2% vs 56.0%). Conclusions: The NLP classifier has adequate predictive validity for identifying alcohol misuse in trauma centers. External validation is needed before its application to augment screening.
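
The abstract compares a rule-based keyword approach with the NLP classifier in terms of sensitivity and specificity. The following is a minimal sketch of how such a keyword baseline could be scored against a reference standard. The keyword list, the function names, and the four toy notes are hypothetical and chosen only for illustration; they are not the study's keywords, data, or method.

```python
# Illustrative sketch only: keywords, notes, and labels are hypothetical,
# not taken from the study.

KEYWORDS = ("alcohol", "etoh", "intoxicated")  # assumed keyword list


def keyword_flag(note: str) -> int:
    """Return 1 if any keyword appears in the note (case-insensitive), else 0."""
    text = note.lower()
    return int(any(keyword in text for keyword in KEYWORDS))


def sensitivity_specificity(y_true, y_pred):
    """Compute (sensitivity, specificity) for binary labels coded 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy notes with reference-standard labels (1 = misuse, 0 = no misuse).
notes = [
    "Patient intoxicated on arrival, EtOH positive.",
    "Denies alcohol use.",                  # keyword hit on a negated mention
    "Found down after drinking at a bar.",  # misuse with no keyword present
    "Fall from ladder, no substance use reported.",
]
labels = [1, 0, 1, 0]

predictions = [keyword_flag(note) for note in notes]
sens, spec = sensitivity_specificity(labels, predictions)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

The second and third toy notes show two ways plain keyword matching can fail: a negated mention is flagged, and a description of drinking that uses none of the listed words is missed.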

Publication Source (Journal or Book title)

Journal of the American Medical Informatics Association

First Page

254

Last Page

261

Recommended Citation

Afshar, M., Phillips, A., Karnik, N., Mueller, J., To, D., Gonzalez, R., Price, R., Cooper, R., Joyce, C., & Dligach, D. (2019). Natural language processing and machine learning to identify alcohol misuse from the electronic health record in trauma patients: development and internal validation. Journal of the American Medical Informatics Association, 26(3), 254-261. https://doi.org/10.1093/jamia/ocy166
