The Challenges of Large‐Scale, Web‐Based Language Datasets: Word Length and Predictability Revisited

Bibliographic Details
Main Authors:	Meylan, Stephan C., Griffiths, Thomas L.
Other Authors:	Massachusetts Institute of Technology. Department of Brain and Cognitive Sciences
Format:	Article
Language:	English
Published:	Wiley 2022
Online Access:	https://hdl.handle.net/1721.1/140541

Similar Items

Tone and word length across languages
by: Søren Wichmann
Published: (2023-06-01)

Dataset of Karakalpak language stop words
by: Khabibulla Madatov, et al.
Published: (2023-06-01)

Word-length algorithm for language identification of under-resourced languages
by: Selamat, A., et al.
Published: (2016)

Large-scale evidence of dependency length minimization in 37 languages
by: Futrell, Richard Landy Jones, et al.
Published: (2016)

MyWSL: Malaysian words sign language dataset
by: Rina Tasia Johari, et al.
Published: (2023-08-01)

Word Embedding Based on Large-Scale Web Corpora as a Powerful Lexicographic Tool
by: Radovan Garabík
Published: (2020-01-01)

Languages Support Efficient Communication about the Environment: Words for Snow Revisited.
by: Terry Regier, et al.
Published: (2016-01-01)

Challenges of Large-Scale Multi-Camera Datasets for Driver Monitoring Systems
by: Juan Diego Ortega, et al.
Published: (2022-03-01)

Peekbank: An open, large-scale repository for developmental eye-tracking data of children’s word recognition
by: Zettersten, Martin, et al.
Published: (2023)

Enhanced word length and model elimination algorithms for language identification /
by: Akosu, Nicholas Iornongu, 1962-, et al.
Published: (2014)

Enhanced word length and model elimination algorithms for language identification
by: Akosu, Nicholas Iornongu
Published: (2014)

Large vario-scale datasets
by: Radan Šuba
Published: (2018-12-01)

Is Natural Language a Perigraphic Process? The Theorem about Facts and Words Revisited
by: Łukasz Dębowski
Published: (2018-01-01)

Reproducibility dataset for a large experimental survey on word embeddings and ontology-based methods for word similarity
by: Juan J. Lastra-Díaz, et al.
Published: (2019-10-01)

Papyrus: a large-scale curated dataset aimed at bioactivity predictions
by: O. J. M. Béquignon, et al.
Published: (2023-01-01)

The Blackbird Dataset: A Large-Scale Dataset for UAV Perception in Aggressive Flight
by: Antonini, Amado, et al.
Published: (2022)

Visual analytics for large-scale datasets
by: Gani, Reinaldo
Published: (2018)

Word lengths are optimized for efficient communication
by: Piantadosi, Steven Thomas, et al.
Published: (2011)

Image Appeal Revisited: Analysis, New Dataset, and Prediction Models
by: Steve Goring, et al.
Published: (2023-01-01)

Word-Length Correlations and Memory in Large Texts: A Visibility Network Analysis
by: Lev Guzmán-Vargas, et al.
Published: (2015-11-01)

Enhanced word length and model elimination algorithms for language identification [electronic resource] /
by: Akosu, Nicholas Iornongu, 1962-
Published: (2014)

Large Language Model Routing with Benchmark Datasets
by: Ou, Anthony C.
Published: (2024)

Words on words : a language reader /
by: Finnie, W. Bruce, et al.
Published: (1971)

How Word/Non-Word Length Influence Reading Acquisition in a Transparent Language: Implications for Children’s Literacy and Development
by: Aparecido J. C. Soares, et al.
Published: (2022-12-01)

Towards algorithmic analytics for large-scale datasets
by: Bzdok, D, et al.
Published: (2019)

BdSLW-11: Dataset of Bangladeshi sign language words for recognizing 11 daily useful BdSL words
by: Md. Monirul Islam, et al.
Published: (2022-12-01)

Ostwald ripening: The screening length revisited
by: Niethammer, B, et al.
Published: (2001)

Words on words : quotations about language and languages /
by: Crystal, David, et al.
Published: (2000)

Sign Language Recognition Using Graph and General Deep Neural Network Based on Large Scale Dataset
by: Abu Saleh Musa Miah, et al.
Published: (2024-01-01)

Large-scale filamentary structures around the Virgo Cluster revisited
by: Bureau, M, et al.
Published: (2016)

Determinization and Minimization of Automata for Nested Words Revisited
by: Joachim Niehren, et al.
Published: (2021-02-01)

On the length of shortest 2-collapsing words
by: Alessandra Cherubini, et al.
Published: (2009-01-01)

Cross-linguistic conditions on word length.
by: Søren Wichmann, et al.
Published: (2023-01-01)

Revisiting gender-fair language and stereotypes – A comparison of word pairs, capital I forms and the asterisk
by: Schunack Silke, et al.
Published: (2022-11-01)

Vggsound: a large-scale audio-visual dataset
by: Chen, H, et al.
Published: (2020)

Comprehensive comparison of large-scale tissue expression datasets
by: Alberto Santos, et al.
Published: (2015-06-01)

A web of words
by: Blench, R, et al.
Published: (1989)

Prediction of Intensive Care Unit Length of Stay in the MIMIC-IV Dataset
by: Lars Hempel, et al.
Published: (2023-06-01)

Boundedness in languages of infinite words
by: Mikołaj Bojańczyk, et al.
Published: (2017-10-01)

Words on the Web / Web of Words. Possible educational uses of digitised historical dictionaries
by: Marco Biffi, et al.
Published: (2023-02-01)