Assessing the usefulness of Google Books’ word frequencies for psycholinguistic research on word processing

In this Perspective Article we assess the usefulness of Google’s new word frequencies for word recognition research (lexical decision and word naming). We find that, despite the massive corpus on which the Google estimates are based (131 billion words from books published in the United States alone)...

Bibliographic Details
Main Authors:	Marc Brysbaert, Emmanuel Keuleers, Boris New
Format:	Article
Language:	English
Published:	Frontiers Media S.A. 2011-03-01
Series:	Frontiers in Psychology
Subjects:	word recognition; Google Books ngrams; lexical decision; subtitle word frequencies; subtlex; word frequency
Online Access:	http://journal.frontiersin.org/Journal/10.3389/fpsyg.2011.00027/full

author	Marc Brysbaert; Emmanuel Keuleers; Boris New
collection	DOAJ
description	In this Perspective Article we assess the usefulness of Google’s new word frequencies for word recognition research (lexical decision and word naming). We find that, despite the massive corpus on which the Google estimates are based (131 billion words from books published in the United States alone), the Google American English frequencies explain 11% less of the variance in the lexical decision times from the English Lexicon Project (Balota et al., 2007) than the SUBTLEX-US word frequencies, based on a corpus of 51 million words from film and television subtitles. Further analyses indicate that word frequencies derived from recent books (published after 2000) are better predictors of word processing times than frequencies based on the full corpus, and that word frequencies based on fiction books predict word processing times better than word frequencies based on the full corpus. The most predictive word frequencies from Google still do not explain more of the variance in word recognition times of undergraduate students and old adults than the subtitle-based word frequencies.
first_indexed	2024-12-22T16:11:59Z
format	Article
id	doaj.art-f3567c0e9e49480b88fa36a842af1e11
institution	Directory Open Access Journal
issn	1664-1078
language	English
last_indexed	2024-12-22T16:11:59Z
publishDate	2011-03-01
publisher	Frontiers Media S.A.
record_format	Article
series	Frontiers in Psychology
spelling	doaj.art-f3567c0e9e49480b88fa36a842af1e11 | 2022-12-21T18:20:29Z | eng | Frontiers Media S.A. | Frontiers in Psychology | 1664-1078 | 2011-03-01 | 2 | 10.3389/fpsyg.2011.00027 | 9569 | Assessing the usefulness of Google Books’ word frequencies for psycholinguistic research on word processing | Marc Brysbaert (Ghent University) | Emmanuel Keuleers (Ghent University) | Boris New (Université Paris Descartes, CNRS, France) | http://journal.frontiersin.org/Journal/10.3389/fpsyg.2011.00027/full | word recognition | Google Books ngrams | lexical decision | subtitle word frequencies | subtlex | word frequency
title	Assessing the usefulness of Google Books’ word frequencies for psycholinguistic research on word processing
topic	word recognition; Google Books ngrams; lexical decision; subtitle word frequencies; subtlex; word frequency
url	http://journal.frontiersin.org/Journal/10.3389/fpsyg.2011.00027/full

Similar Items