Automatic classification of scanned electronic health record documents.
Clicks: 222
ID: 204868
2020
Article Quality & Performance Metrics
Overall Quality
Improving Quality
0.0
/100
Combines engagement data with AI-assessed academic quality
Reader Engagement
Emerging Content
6.9
/100
23 views
23 readers
Trending
AI Quality Assessment
Not analyzed
Abstract
Electronic Health Records (EHRs) contain scanned documents from a variety of sources such as identification cards, radiology reports, clinical correspondence, and many other document types. We describe the distribution of scanned documents at one health institution and describe the design and evaluation of a system to categorize documents into clinically relevant and non-clinically relevant categories as well as further sub-classifications. Our objective is to demonstrate that text classification systems can accurately classify scanned documents.We extracted text using Optical Character Recognition (OCR). We then created and evaluated multiple text classification machine learning models, including both "bag of words" and deep learning approaches. We evaluated the system on three different levels of classification using both the entire document as input, as well as the individual pages of the document. Finally, we compared the effects of different text processing methods.A deep learning model using ClinicalBERT performed best. This model distinguished between clinically-relevant documents and not clinically-relevant documents with an accuracy of 0.973; between intermediate sub-classifications with an accuracy of 0.949; and between individual classes with an accuracy of 0.913.Within the EHR, some document categories such as "external medical records" may contain hundreds of scanned pages without clear document boundaries. Without further sub-classification, clinicians must view every page or risk missing clinically-relevant information. Machine learning can automatically classify these scanned documents to reduce clinician burden.Using machine learning applied to OCR-extracted text has the potential to accurately identify clinically-relevant scanned content within EHRs.
| Reference Key |
goodrum2020automaticinternational
Use this key to autocite in the manuscript while using
SciMatic Manuscript Manager or Thesis Manager
|
|---|---|
| Authors | Goodrum, Heath;Roberts, Kirk;Bernstam, Elmer V; |
| Journal | International journal of medical informatics |
| Year | 2020 |
| DOI |
S1386-5056(20)30997-7
|
| URL | |
| Keywords |
Citations
No citations found. To add a citation, contact the admin at info@scimatic.org
Comments
No comments yet. Be the first to comment on this article.