Evaluation of the reliability and readability of ChatGPT-4 responses regarding hypothyroidism during pregnancy

Bibliographic Details
Main Authors: C. E. Onder, G. Koc, P. Gokbulut, I. Taskaldiran, S. M. Kuskonmaz
Format: Article
Language: English
Published: Nature Portfolio, 2024-01-01
Series: Scientific Reports
Online Access:https://doi.org/10.1038/s41598-023-50884-w
Collection: DOAJ (Directory of Open Access Journals)
Abstract: Hypothyroidism is characterized by thyroid hormone deficiency and has adverse effects on both pregnancy and fetal health. Chat Generative Pre-trained Transformer (ChatGPT) is a large language model trained on a very large body of data from many sources. This study aimed to evaluate the reliability and readability of ChatGPT-4 answers about hypothyroidism in pregnancy. Nineteen questions were created in line with the recommendations of the latest American Thyroid Association (ATA) guideline on hypothyroidism in pregnancy and were posed to ChatGPT-4. Two independent researchers scored the reliability and quality of the responses using the Global Quality Scale (GQS) and the modified DISCERN (mDISCERN) tool. Readability was assessed using the Flesch Reading Ease (FRE) score, Flesch-Kincaid Grade Level (FKGL), Gunning Fog Index (GFI), Coleman-Liau Index (CLI), and Simple Measure of Gobbledygook (SMOG). No misleading information was found in any of the answers. The mean mDISCERN score of the responses was 30.26 ± 3.14, and the median GQS score was 4 (2–4). In terms of reliability, most answers showed moderate reliability (78.9%) and the remainder showed good reliability (21.1%). In the readability analysis, the median FRE was 32.20 (13.00–37.10). The years of education required to read the answers most often corresponded to the university level [9 (47.3%)]. ChatGPT-4 has significant potential and can be used as an auxiliary information source for counseling, acting as a bridge between patients and clinicians regarding hypothyroidism in pregnancy. However, efforts should be made to improve the reliability and readability of its responses.
ISSN: 2045-2322
Affiliation (all authors): Department of Endocrinology and Metabolic Diseases, Ankara Training and Research Hospital
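The five readability indices named in the abstract (FRE, FKGL, GFI, CLI, SMOG) are standard published formulas built from counts of sentences, words, letters, and syllables. The following is a minimal Python sketch of those formulas, not the tooling used in the study. It assumes a crude vowel-group syllable counter and regex-based word and sentence splitting, and it simplifies the Gunning Fog "complex word" rule to "any word of three or more syllables". The helper names `count_syllables`, `text_stats`, and `readability` are hypothetical.

```python
import math
import re
from dataclasses import dataclass


@dataclass
class TextStats:
    sentences: int
    words: int
    letters: int
    syllables: int
    polysyllables: int  # words with three or more syllables


def count_syllables(word: str) -> int:
    """Heuristic (assumed) syllable count: vowel groups, minus a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # A final 'e' is usually silent ("make"), except in "-le" endings ("table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def text_stats(text: str) -> TextStats:
    # A sentence ends at ., ! or ? followed by whitespace or end of text,
    # so decimals such as "30.26" are not split; at least one sentence is assumed.
    sentences = max(len(re.findall(r"[.!?]+(?=\s|$)", text)), 1)
    # Words are alphabetic runs (optionally with one apostrophe); numbers are skipped.
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    syllables = [count_syllables(w) for w in words]
    return TextStats(
        sentences=sentences,
        words=len(words),
        letters=sum(ch.isalpha() for w in words for ch in w),
        syllables=sum(syllables),
        polysyllables=sum(1 for n in syllables if n >= 3),
    )


def readability(text: str) -> dict[str, float]:
    s = text_stats(text)
    if s.words == 0:
        raise ValueError("text contains no words")
    wps = s.words / s.sentences  # average words per sentence
    spw = s.syllables / s.words  # average syllables per word
    letters_per_100 = 100 * s.letters / s.words
    sentences_per_100 = 100 * s.sentences / s.words
    return {
        "FRE": 206.835 - 1.015 * wps - 84.6 * spw,
        "FKGL": 0.39 * wps + 11.8 * spw - 15.59,
        # Simplified GFI: every word of 3+ syllables counts as "complex".
        "GFI": 0.4 * (wps + 100 * s.polysyllables / s.words),
        "CLI": 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8,
        "SMOG": 1.0430 * math.sqrt(s.polysyllables * 30 / s.sentences) + 3.1291,
    }


if __name__ == "__main__":
    sample = (
        "Hypothyroidism is characterized by thyroid hormone deficiency and has "
        "adverse effects on both pregnancy and fetal health."
    )
    for name, value in readability(sample).items():
        print(f"{name}: {value:.1f}")
```

Lower FRE scores indicate harder text. On the conventional Flesch scale, the reported median FRE of 32.20 falls in the 30–50 band usually labeled "difficult" (college level). Because published calculators differ in syllable counting and tokenization, the same text can receive somewhat different scores from different tools, so this sketch should not be expected to reproduce the study's values exactly.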