Automatic classification of scanned electronic health record documents.

Goodrum, Heath; Roberts, Kirk; Bernstam, Elmer V.

doi:S1386-5056(20)30997-7

2020

Abstract

Electronic Health Records (EHRs) contain scanned documents from a variety of sources, such as identification cards, radiology reports, clinical correspondence, and many other document types. We describe the distribution of scanned documents at one health institution, and the design and evaluation of a system that categorizes documents into clinically relevant and non-clinically relevant categories as well as further sub-classifications. Our objective is to demonstrate that text classification systems can accurately classify scanned documents.

We extracted text using Optical Character Recognition (OCR). We then created and evaluated multiple text classification machine learning models, including both "bag of words" and deep learning approaches. We evaluated the system on three different levels of classification, using both the entire document and the individual pages of the document as input. Finally, we compared the effects of different text processing methods.

A deep learning model using ClinicalBERT performed best. This model distinguished between clinically relevant and non-clinically relevant documents with an accuracy of 0.973; between intermediate sub-classifications with an accuracy of 0.949; and between individual classes with an accuracy of 0.913.

Within the EHR, some document categories such as "external medical records" may contain hundreds of scanned pages without clear document boundaries. Without further sub-classification, clinicians must view every page or risk missing clinically relevant information. Machine learning can automatically classify these scanned documents to reduce clinician burden.

Machine learning applied to OCR-extracted text has the potential to accurately identify clinically relevant scanned content within EHRs.
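To make the "bag of words" idea concrete, the following is a minimal sketch of a binary text classifier over OCR-extracted text. It is not the authors' system: the abstract does not specify which bag-of-words model was used, and the best-performing model was based on ClinicalBERT. The sketch assumes a multinomial Naive Bayes classifier with add-one smoothing, written with the Python standard library only. The class name `NaiveBayesBoW`, the label names `clinical` and `non_clinical`, and the toy training strings are all hypothetical stand-ins for real OCR output.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesBoW:
    """Multinomial Naive Bayes over word counts (hypothetical helper class)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)        # documents per class
        self.word_counts = defaultdict(Counter)    # word counts per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            # Add-one smoothing: every vocabulary word gets one extra count.
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, texts, labels):
    """Fraction of texts whose predicted label matches the true label."""
    correct = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return correct / len(labels)


# Hypothetical toy data standing in for OCR-extracted page text.
train_texts = [
    "chest x-ray impression no acute cardiopulmonary disease",
    "discharge summary patient admitted with pneumonia treated with antibiotics",
    "mri brain findings no evidence of infarct",
    "driver license date of birth address expires",
    "insurance card member id group number copay",
    "consent form signature of patient or guardian date",
]
train_labels = ["clinical"] * 3 + ["non_clinical"] * 3

model = NaiveBayesBoW().fit(train_texts, train_labels)
print(model.predict("radiology report findings no acute disease"))    # clinical
print(model.predict("member id and group number on insurance card"))  # non_clinical
```

In this sketch a multi-page document could be classified either by concatenating the text of its pages before calling `predict`, or by calling `predict` on each page separately, which loosely mirrors the document-level and page-level inputs described in the abstract. The `accuracy` helper computes the same kind of metric the abstract reports, though the toy data here says nothing about real-world performance.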

Reference Key	goodrum2020automaticinternational
Authors	Goodrum, Heath; Roberts, Kirk; Bernstam, Elmer V.
Journal	International Journal of Medical Informatics
Year	2020
DOI	S1386-5056(20)30997-7
URL	https://doi.org/S1386-5056(20)30997-7
Keywords	electronic health records; machine learning; patient safety; classification; optical character recognition; scanned documents
