Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models

Abstract
We evaluated the performance of a large language model called ChatGPT on the United States Medical Licensing Exam (USMLE), which consists of three exams: Step 1, Step 2CK, and Step 3. ChatGPT performed at or near the passing threshold for all three exams without any specialized training or reinforcement. Additionally, ChatGPT demonstrated a high level of concordance and insight in its explanations. These results suggest that large language models may have the potential to assist with medical education, and potentially, clinical decision-making.

Author summary

Artificial intelligence (AI) systems hold great promise to improve medical care and health outcomes. As such, it is crucial to ensure that the development of clinical AI is guided by the principles of trust and explainability. Measuring AI medical knowledge in comparison to that of expert human clinicians is a critical first step in evaluating these qualities. To accomplish this, we evaluated the performance of ChatGPT, a language-based AI, on the United States Medical Licensing Exam (USMLE). The USMLE is a set of three standardized tests of expert-level knowledge, which are required for medical licensure in the United States. We found that ChatGPT performed at or near the passing threshold of 60% accuracy. ChatGPT is the first AI to reach this benchmark, marking a notable milestone in AI maturation. Impressively, ChatGPT achieved this result without specialized input from human trainers. Furthermore, ChatGPT displayed comprehensible reasoning and valid clinical insights, lending increased confidence to trust and explainability. Our study suggests that large language models such as ChatGPT may potentially assist human learners in a medical education setting, as a prelude to future integration into clinical decision-making.
Reference Key
kung2023performanceplos
Authors Kung, Tiffany H.; Cheatham, Morgan; Medenilla, Arielle; Sillos, Czarina; De Leon, Lorie; Elepaño, Camille; Madriaga, Maria; Aggabao, Rimel; Diaz-Candido, Giezel; Maningo, James; Tseng, Victor
Journal PLOS Digital Health
Year 2023