Comparing Diagnostic Accuracy of ChatGPT to Clinical Diagnosis in General Surgery Consults: A Quantitative Analysis of Disease Diagnosis.
Abstract
This study addressed the challenge of providing accurate and timely medical diagnostics in military health care settings with limited access to advanced diagnostic tools, such as those encountered in austere environments, remote locations, or during large-scale combat operations. The primary objective was to evaluate the utility of ChatGPT, an artificial intelligence (AI) language model, as a support tool for health care providers in clinical decision-making and early diagnosis.

The research used an observational cross-sectional cohort design and exploratory predictive techniques. Data were collected and analyzed from clinical scenarios based on three common general surgery diagnoses: acute appendicitis, acute cholecystitis, and diverticulitis. These scenarios incorporated age, gender, symptoms, vital signs, physical exam findings, laboratory values, medical and surgical histories, and current medication regimens as data inputs. All collected data were entered into a table for each diagnosis, and these tables were used to write scenarios reflecting typical patient presentations for each condition. Each scenario was then entered individually into ChatGPT (version 3.5), which was asked to provide the leading diagnosis based on the information supplied. ChatGPT's output was compared to the expected diagnosis to assess accuracy.

A statistically significant difference between ChatGPT's diagnostic outcomes and clinical diagnoses was observed for acute cholecystitis and diverticulitis, with ChatGPT demonstrating inferior accuracy in controlled test scenarios. A secondary outcome analysis examined the relationship between specific symptoms and diagnosis; the presence of certain symptoms in incorrectly diagnosed scenarios suggests that they may adversely affect ChatGPT's diagnostic decision-making, increasing the likelihood of misdiagnosis. These results highlight AI's potential as a diagnostic support tool but underscore the importance of continued research evaluating its performance in more complex and varied clinical scenarios.

In summary, this study evaluated the diagnostic accuracy of ChatGPT in identifying three common surgical conditions (acute appendicitis, acute cholecystitis, and diverticulitis) using comprehensive patient data, including age, gender, medical history, medications, symptoms, vital signs, physical exam findings, and basic laboratory results. The hypothesis was that ChatGPT might display slightly lower accuracy rates than clinical diagnoses made by medical providers. Statistical analysis using Fisher's exact test revealed a significant difference between ChatGPT's diagnostic outcomes and clinical diagnoses, particularly in acute cholecystitis and diverticulitis cases; we therefore reject the null hypothesis, as ChatGPT's diagnostic accuracy significantly differs from clinical diagnosis in the presented scenarios. However, ChatGPT's overall high accuracy suggests that it can reliably support clinicians, especially in environments where diagnostic resources are limited, and can serve as a valuable tool in military medicine.
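The methodology described above reduces to a simple pipeline: submit each scenario to the model, record whether its leading diagnosis matches the expected one, and compare the resulting accuracy against clinical diagnosis with Fisher's exact test. The sketch below illustrates that pipeline in Python. The `leading_diagnosis` helper, the prompt wording, the substring match logic, and all counts are assumptions for illustration only: the study used the ChatGPT 3.5 web interface and does not publish its raw data. Only `scipy.stats.fisher_exact`, the standard implementation of the test named in the abstract, is real.

```python
"""Sketch of the scenario -> diagnosis -> Fisher's exact test pipeline.

All scenario texts, match logic, and counts are hypothetical; only the
overall shape follows the methodology described in the abstract.
"""
from scipy.stats import fisher_exact


def leading_diagnosis(scenario: str) -> str:
    """Return the model's single leading diagnosis for a scenario.

    Hypothetical helper: the study pasted scenarios into the ChatGPT
    (v3.5) web interface; an equivalent API call is assumed here.
    """
    from openai import OpenAI  # assumes the official `openai` client
    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "user",
            "content": ("Based on the following presentation, give the "
                        "single leading diagnosis:\n" + scenario),
        }],
    )
    return resp.choices[0].message.content.strip().lower()


def score(scenarios: list[str], expected: str) -> tuple[int, int]:
    """Count (correct, incorrect) model diagnoses for one condition,
    using a simplistic substring match as a stand-in for manual review."""
    hits = sum(expected.lower() in leading_diagnosis(s) for s in scenarios)
    return hits, len(scenarios) - hits


# Hypothetical 2x2 contingency table for one condition (e.g. diverticulitis):
# rows = diagnostic method (clinical vs. ChatGPT),
# columns = outcome (correct, incorrect). Counts are invented.
table = [[29, 1],   # clinical diagnosis (assumed counts)
         [22, 8]]   # ChatGPT diagnosis (assumed counts)
odds_ratio, p = fisher_exact(table, alternative="two-sided")
print(f"OR = {odds_ratio:.2f}, p = {p:.4f}")
# p < 0.05 would indicate a significant accuracy difference between the
# two methods, mirroring the paper's finding for acute cholecystitis
# and diverticulitis.
```

Fisher's exact test is a sensible choice here because the per-condition scenario counts are small, where a chi-squared approximation would be unreliable.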
| Field | Value |
|---|---|
| Reference Key | meier2025comparing |
| Authors | Meier, Heather; McMahon, Ryan; Hout, Brittany; Randles, Joshua; Aden, James; Rizzo, Julie A |
| Journal | Military Medicine |
| Year | 2025 |
| DOI | 10.1093/milmed/usaf168 |

Use the reference key to autocite this article in the SciMatic Manuscript Manager or Thesis Manager.
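For readers citing this article outside SciMatic, a BibTeX record assembled from the metadata above might look like the following; the field formatting is an assumption, and an exported record from the publisher may differ.

```bibtex
@article{meier2025comparing,
  author  = {Meier, Heather and McMahon, Ryan and Hout, Brittany
             and Randles, Joshua and Aden, James and Rizzo, Julie A.},
  title   = {Comparing Diagnostic Accuracy of {ChatGPT} to Clinical
             Diagnosis in General Surgery Consults: A Quantitative
             Analysis of Disease Diagnosis},
  journal = {Military Medicine},
  year    = {2025},
  doi     = {10.1093/milmed/usaf168},
}
```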