Evaluation of the reliability and readability of ChatGPT-4 responses regarding hypothyroidism during pregnancy
Abstract Hypothyroidism is characterized by thyroid hormone deficiency and has adverse effects on both pregnancy and fetal health. Chat Generative Pre-trained Transformer (ChatGPT) is a large language model trained on a very large dataset drawn from many sources. Our study aimed to evaluate the rel...
Main Authors: | C. E. Onder, G. Koc, P. Gokbulut, I. Taskaldiran, S. M. Kuskonmaz |
---|---|
Format: | Article |
Language: | English |
Published: | Nature Portfolio, 2024-01-01 |
Series: | Scientific Reports |
Online Access: | https://doi.org/10.1038/s41598-023-50884-w |
_version_ | 1827388509072130048 |
---|---|
author | C. E. Onder G. Koc P. Gokbulut I. Taskaldiran S. M. Kuskonmaz |
author_facet | C. E. Onder G. Koc P. Gokbulut I. Taskaldiran S. M. Kuskonmaz |
author_sort | C. E. Onder |
collection | DOAJ |
description | Abstract Hypothyroidism is characterized by thyroid hormone deficiency and has adverse effects on both pregnancy and fetal health. Chat Generative Pre-trained Transformer (ChatGPT) is a large language model trained on a very large dataset drawn from many sources. Our study aimed to evaluate the reliability and readability of ChatGPT-4 answers about hypothyroidism in pregnancy. A total of 19 questions were created in line with the recommendations in the latest guideline of the American Thyroid Association (ATA) on hypothyroidism in pregnancy and were posed to ChatGPT-4. The reliability and quality of the responses were scored by two independent researchers using the global quality scale (GQS) and modified DISCERN (mDISCERN) tools. The readability of the responses was assessed using the Flesch Reading Ease (FRE) Score, Flesch-Kincaid grade level (FKGL), Gunning Fog Index (GFI), Coleman-Liau Index (CLI), and Simple Measure of Gobbledygook (SMOG) tools. No misleading information was found in any of the answers. The mean mDISCERN score of the responses was 30.26 ± 3.14; the median GQS score was 4 (2–4). In terms of reliability, most answers showed moderate reliability (78.9%), followed by good reliability (21.1%). In the readability analysis, the median FRE was 32.20 (13.00–37.10). The education level required to read the answers was most often university level [9 (47.3%)]. ChatGPT-4 has significant potential and can be used as an auxiliary information source for counseling, creating a bridge between patients and clinicians on hypothyroidism in pregnancy. Efforts should be made to improve the reliability and readability of ChatGPT. |
first_indexed | 2024-03-08T16:20:48Z |
format | Article |
id | doaj.art-cf05fd3fad374273985631cfef83c1fb |
institution | Directory Open Access Journal |
issn | 2045-2322 |
language | English |
last_indexed | 2024-03-08T16:20:48Z |
publishDate | 2024-01-01 |
publisher | Nature Portfolio |
record_format | Article |
series | Scientific Reports |
spelling | doaj.art-cf05fd3fad374273985631cfef83c1fb 2024-01-07T12:24:51Z eng Nature Portfolio Scientific Reports 2045-2322 2024-01-01 14118 10.1038/s41598-023-50884-w Evaluation of the reliability and readability of ChatGPT-4 responses regarding hypothyroidism during pregnancy C. E. Onder, G. Koc, P. Gokbulut, I. Taskaldiran, S. M. Kuskonmaz (all: Department of Endocrinology and Metabolic Diseases, Ankara Training and Research Hospital) https://doi.org/10.1038/s41598-023-50884-w |
spellingShingle | C. E. Onder G. Koc P. Gokbulut I. Taskaldiran S. M. Kuskonmaz Evaluation of the reliability and readability of ChatGPT-4 responses regarding hypothyroidism during pregnancy Scientific Reports |
title | Evaluation of the reliability and readability of ChatGPT-4 responses regarding hypothyroidism during pregnancy |
title_full | Evaluation of the reliability and readability of ChatGPT-4 responses regarding hypothyroidism during pregnancy |
title_fullStr | Evaluation of the reliability and readability of ChatGPT-4 responses regarding hypothyroidism during pregnancy |
title_full_unstemmed | Evaluation of the reliability and readability of ChatGPT-4 responses regarding hypothyroidism during pregnancy |
title_short | Evaluation of the reliability and readability of ChatGPT-4 responses regarding hypothyroidism during pregnancy |
title_sort | evaluation of the reliability and readability of chatgpt 4 responses regarding hypothyroidism during pregnancy |
url | https://doi.org/10.1038/s41598-023-50884-w |