Content Analysis of Textbooks via Natural Language Processing: Findings on Gender, Race, and Ethnicity in Texas U.S. History Textbooks

Bibliographic Details
Main Authors: Li Lucy, Dorottya Demszky, Patricia Bromley, Dan Jurafsky
Format: Article
Language: English
Published: SAGE Publishing, 2020-07-01
Series: AERA Open
Online Access: https://doi.org/10.1177/2332858420940312
ISSN: 2332-8584
Collection: DOAJ (Directory of Open Access Journals)

Abstract: Cutting-edge data science techniques can shed new light on fundamental questions in educational research. We apply techniques from natural language processing (lexicons, word embeddings, topic models) to 15 U.S. history textbooks widely used in Texas between 2015 and 2017, studying their depiction of historically marginalized groups. We find that Latinx people are rarely discussed, and the most common famous figures are nearly all White men. Lexicon-based approaches show that Black people are described as performing actions associated with low agency and power. Word embeddings reveal that women tend to be discussed in the contexts of work and the home. Topic modeling highlights the higher prominence of political topics compared with social ones. We also find that more conservative counties tend to purchase textbooks with less representation of women and Black people. Building on a rich tradition of textbook analysis, we release our computational toolkit to support new research directions.