Evaluation of the reliability and readability of ChatGPT-4 responses regarding hypothyroidism during pregnancy

Abstract Hypothyroidism is characterized by thyroid hormone deficiency and has adverse effects on both pregnancy and fetal health. Chat Generative Pre-trained Transformer (ChatGPT) is a large language model trained on a very large database drawn from many sources. Our study aimed to evaluate the rel...

Bibliographic Details
Main Authors:	C. E. Onder, G. Koc, P. Gokbulut, I. Taskaldiran, S. M. Kuskonmaz
Format:	Article
Language:	English
Published:	Nature Portfolio 2024-01-01
Series:	Scientific Reports
Online Access:	https://doi.org/10.1038/s41598-023-50884-w

_version_	1827388509072130048
author	C. E. Onder, G. Koc, P. Gokbulut, I. Taskaldiran, S. M. Kuskonmaz
author_sort	C. E. Onder
collection	DOAJ
description	Abstract Hypothyroidism is characterized by thyroid hormone deficiency and has adverse effects on both pregnancy and fetal health. Chat Generative Pre-trained Transformer (ChatGPT) is a large language model trained on a very large database drawn from many sources. Our study aimed to evaluate the reliability and readability of ChatGPT-4 answers about hypothyroidism in pregnancy. A total of 19 questions were created in line with the recommendations in the latest guideline of the American Thyroid Association (ATA) on hypothyroidism in pregnancy and were posed to ChatGPT-4. The reliability and quality of the responses were scored by two independent researchers using the global quality scale (GQS) and modified DISCERN (mDISCERN) tools. The readability of the responses was assessed using the Flesch Reading Ease (FRE) Score, Flesch-Kincaid grade level (FKGL), Gunning Fog Index (GFI), Coleman-Liau Index (CLI), and Simple Measure of Gobbledygook (SMOG) tools. No misleading information was found in any of the answers. The mean mDISCERN score of the responses was 30.26 ± 3.14; the median GQS score was 4 (2–4). In terms of reliability, most of the answers showed moderate reliability (78.9%), followed by good reliability (21.1%). In the readability analysis, the median FRE was 32.20 (13.00–37.10). The education level required to read the answers was most often university level [9 (47.3%)]. ChatGPT-4 has significant potential and can be used as an auxiliary information source for counseling, creating a bridge between patients and clinicians on hypothyroidism in pregnancy. Efforts should be made to improve the reliability and readability of ChatGPT.
first_indexed	2024-03-08T16:20:48Z
format	Article
id	doaj.art-cf05fd3fad374273985631cfef83c1fb
institution	Directory of Open Access Journals
issn	2045-2322
language	English
last_indexed	2024-03-08T16:20:48Z
publishDate	2024-01-01
publisher	Nature Portfolio
record_format	Article
series	Scientific Reports
spelling	doaj.art-cf05fd3fad374273985631cfef83c1fb | 2024-01-07T12:24:51Z | eng | Nature Portfolio | Scientific Reports | 2045-2322 | 2024-01-01 | 14118 | 10.1038/s41598-023-50884-w | Evaluation of the reliability and readability of ChatGPT-4 responses regarding hypothyroidism during pregnancy | C. E. Onder, G. Koc, P. Gokbulut, I. Taskaldiran, S. M. Kuskonmaz (all: Department of Endocrinology and Metabolic Diseases, Ankara Training and Research Hospital) | https://doi.org/10.1038/s41598-023-50884-w
title	Evaluation of the reliability and readability of ChatGPT-4 responses regarding hypothyroidism during pregnancy
title_sort	evaluation of the reliability and readability of chatgpt 4 responses regarding hypothyroidism during pregnancy
url	https://doi.org/10.1038/s41598-023-50884-w

Similar Items