Performance and risks of ChatGPT used in drug information: an exploratory real-world analysis
  1. Benedict Morath,
  2. Ute Chiriac,
  3. Elena Jaszkowski,
  4. Carolin Deiß,
  5. Hannah Nürnberg,
  6. Katrin Hörth,
  7. Torsten Hoppe-Tichy,
  8. Kim Green
  1. Hospital Pharmacy, Heidelberg University Hospital, Heidelberg, Germany
  1. Correspondence to Benedict Morath, Hospital Pharmacy, Heidelberg University Hospital, Heidelberg, Germany; benedict.morath@med.uni-heidelberg.de

Abstract

Objectives To investigate the performance and risks associated with using Chat Generative Pre-trained Transformer (ChatGPT) to answer drug-related questions.

Methods A sample of 50 drug-related questions was consecutively collected and entered into the artificial intelligence software application ChatGPT. Answers were documented and rated in a standardised consensus process by six senior hospital pharmacists in the domains of content (correct, incomplete, false), patient management (possible, insufficient, not possible) and risk (no risk, low risk, high risk). As a reference, answers were researched in adherence to the German guideline on drug information and stratified into four categories according to the sources used. In addition, the reproducibility of ChatGPT’s answers was analysed by repeatedly entering three questions at different time points (day 1, day 2, week 2, week 3).
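
The study entered questions manually through the ChatGPT web interface. Purely as an illustration of how such a question-entry-and-documentation workflow could be automated, the following Python sketch submits questions to the OpenAI chat API and logs the answers verbatim for later expert rating. The model name, file names and client setup are assumptions for the sketch, not part of the study protocol.

```python
# Hypothetical sketch: submit drug-related questions to the OpenAI chat API
# and document each answer for later rating. The study itself used the
# ChatGPT web interface; model name and file paths are illustrative only.
import csv
from datetime import date

from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def ask(question: str, model: str = "gpt-3.5-turbo") -> str:
    """Send one drug-related question and return the raw answer text."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content


def log_answers(questions: list[str], outfile: str = "answers.csv") -> None:
    """Record date, question and answer so raters see the answer verbatim."""
    with open(outfile, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["date", "question", "answer"])
        for q in questions:
            writer.writerow([date.today().isoformat(), q, ask(q)])
```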

Results Overall, only 13 of 50 answers provided correct content and contained enough information to initiate management without risk of patient harm. The majority of answers were either false (38%, n=19) or only partly correct (36%, n=18), and no references were provided. A high risk of patient harm was likely in 26% (n=13) of cases and the risk was judged low in 28% (n=14) of cases. In all high-risk cases, actions could have been initiated based on the information provided. ChatGPT’s answers varied when the same questions were entered repeatedly over time, and only three out of 12 answers were identical, indicating no to low reproducibility.
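
Reproducibility was judged by comparing the answers returned for the same question across the four time points. A minimal sketch of such an exact-match comparison, with a hypothetical data structure standing in for the documented answers, could look like this:

```python
# Minimal sketch: count verbatim repeats among answers to the same question
# entered at day 1, day 2, week 2 and week 3. The mapping below is a
# hypothetical stand-in, not data from the study.
answers_by_question: dict[str, list[str]] = {
    "question 1": ["answer d1", "answer d2", "answer w2", "answer w3"],
    # ... one entry per repeated question
}


def count_identical(answers: list[str]) -> int:
    """Count answers that are verbatim repeats of an earlier answer."""
    seen: set[str] = set()
    identical = 0
    for a in (x.strip() for x in answers):
        if a in seen:
            identical += 1
        seen.add(a)
    return identical
```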

Conclusion In a real-world sample of 50 drug-related questions, ChatGPT answered the majority of questions incorrectly or only partly correctly. The use of artificial intelligence applications in drug information is not feasible as long as barriers such as incorrect content, missing references and a lack of reproducibility remain.

  • Evidence-Based Medicine
  • Pharmacy Service, Hospital
  • Health Services Administration
  • Medical Informatics
  • Journalism, Medical

Data availability statement

Data are available upon reasonable request.
