Types of Errors Hiding in Google Scholar Data

Bibliographic Details
Main Author: Romy Sauvayre
Format: Article
Language: English
Published: JMIR Publications, 2022-05-01
Series: Journal of Medical Internet Research
Online Access: https://www.jmir.org/2022/5/e28354
ISSN: 1438-8871
Citation: Journal of Medical Internet Research 2022;24(5):e28354
DOI: 10.2196/28354
Author ORCID: https://orcid.org/0000-0003-0806-6234
Collection: DOAJ (Directory of Open Access Journals)

Google Scholar (GS) is a free tool that researchers may use to analyze citations, find relevant literature, or evaluate the quality of an author or of a candidate for tenure, promotion, a faculty position, funding, or research grants. GS has become a major bibliographic and citation database. For assessing the literature, databases such as PubMed, PsycINFO, Scopus, and Web of Science can be used in place of GS because they are more reliable. The aim of this study was to examine the accuracy of citation data collected from GS and to provide a comprehensive description of the errors and miscounts identified. For this purpose, 281 documents that cited 2 specific works were retrieved via Publish or Perish software (PoP) and examined. The cited works concern the false-positive issue inherent in the analysis of neuroimaging data. The results revealed an unprecedented error rate, with 279 of 281 (99.3%) examined references containing at least one error. Nonacademic documents tended to contain more errors than academic publications (U=5117.0; P<.001). This viewpoint article, based on a case study examining GS data accuracy, shows that GS data are not only inaccurate but also potentially expose researchers who use these data without verifying them to substantial biases in their analyses and results. Further work is needed to assess the consequences of using GS data extracted by PoP.
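The abstract reports two quantitative results: the share of examined references with at least one error (279 of 281) and a U statistic comparing nonacademic and academic documents, presumably from a Mann-Whitney U test. The minimal Python sketch below reproduces the first figure and shows how a U statistic of this kind is computed from two samples. The function names `share_with_errors` and `mann_whitney_u` and the per-document error counts are hypothetical illustrations, not the study's code or data, and the sketch computes no p-value.

```python
from typing import Sequence


def share_with_errors(n_with_error: int, n_total: int) -> float:
    """Percentage of examined references containing at least one error."""
    return 100.0 * n_with_error / n_total


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> float:
    """Mann-Whitney U statistic for sample x versus sample y.

    Pools both samples, assigns average ranks to tied values, and returns
    U_x = R_x - n_x * (n_x + 1) / 2, where R_x is the rank sum of x.
    No p-value is computed here.
    """
    # Tag each value with its group (0 = x, 1 = y) and sort by value.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y], key=lambda t: t[0])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        # Extend j to the end of a run of tied values.
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    r_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    n_x = len(x)
    return r_x - n_x * (n_x + 1) / 2


# Reported counts from the abstract: 279 of 281 references had an error.
print(round(share_with_errors(279, 281), 1))  # 99.3

# Hypothetical per-document error counts, for illustration only.
nonacademic = [4, 5, 3, 6]
academic = [1, 2, 2, 3]
print(mann_whitney_u(nonacademic, academic))  # 15.5
```

Because U_x + U_y always equals n_x * n_y, the U for the other group here would be 16 - 15.5 = 0.5; a large U for the nonacademic group is what "more errors than academic publications" corresponds to.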