Content Analysis of Textbooks via Natural Language Processing: Findings on Gender, Race, and Ethnicity in Texas U.S. History Textbooks
Cutting-edge data science techniques can shed new light on fundamental questions in educational research. We apply techniques from natural language processing (lexicons, word embeddings, topic models) to 15 U.S. history textbooks widely used in Texas between 2015 and 2017, studying their depiction o...
Main Authors: | Li Lucy, Dorottya Demszky, Patricia Bromley, Dan Jurafsky |
---|---|
Format: | Article |
Language: | English |
Published: | SAGE Publishing, 2020-07-01 |
Series: | AERA Open |
Online Access: | https://doi.org/10.1177/2332858420940312 |
_version_ | 1818415031344168960 |
---|---|
author | Li Lucy Dorottya Demszky Patricia Bromley Dan Jurafsky |
author_facet | Li Lucy Dorottya Demszky Patricia Bromley Dan Jurafsky |
author_sort | Li Lucy |
collection | DOAJ |
description | Cutting-edge data science techniques can shed new light on fundamental questions in educational research. We apply techniques from natural language processing (lexicons, word embeddings, topic models) to 15 U.S. history textbooks widely used in Texas between 2015 and 2017, studying their depiction of historically marginalized groups. We find that Latinx people are rarely discussed, and the most common famous figures are nearly all White men. Lexicon-based approaches show that Black people are described as performing actions associated with low agency and power. Word embeddings reveal that women tend to be discussed in the contexts of work and the home. Topic modeling highlights the higher prominence of political topics compared with social ones. We also find that more conservative counties tend to purchase textbooks with less representation of women and Black people. Building on a rich tradition of textbook analysis, we release our computational toolkit to support new research directions. |
first_indexed | 2024-12-14T11:28:31Z |
format | Article |
id | doaj.art-08d04c382b95435d908bfb94302a5b66 |
institution | Directory Open Access Journal |
issn | 2332-8584 |
language | English |
last_indexed | 2024-12-14T11:28:31Z |
publishDate | 2020-07-01 |
publisher | SAGE Publishing |
record_format | Article |
series | AERA Open |
spelling | doaj.art-08d04c382b95435d908bfb94302a5b66 2022-12-21T23:03:25Z eng SAGE Publishing AERA Open 2332-8584 2020-07-01 6 10.1177/2332858420940312 Content Analysis of Textbooks via Natural Language Processing: Findings on Gender, Race, and Ethnicity in Texas U.S. History Textbooks Li Lucy Dorottya Demszky Patricia Bromley Dan Jurafsky Cutting-edge data science techniques can shed new light on fundamental questions in educational research. We apply techniques from natural language processing (lexicons, word embeddings, topic models) to 15 U.S. history textbooks widely used in Texas between 2015 and 2017, studying their depiction of historically marginalized groups. We find that Latinx people are rarely discussed, and the most common famous figures are nearly all White men. Lexicon-based approaches show that Black people are described as performing actions associated with low agency and power. Word embeddings reveal that women tend to be discussed in the contexts of work and the home. Topic modeling highlights the higher prominence of political topics compared with social ones. We also find that more conservative counties tend to purchase textbooks with less representation of women and Black people. Building on a rich tradition of textbook analysis, we release our computational toolkit to support new research directions. https://doi.org/10.1177/2332858420940312 |
spellingShingle | Li Lucy Dorottya Demszky Patricia Bromley Dan Jurafsky Content Analysis of Textbooks via Natural Language Processing: Findings on Gender, Race, and Ethnicity in Texas U.S. History Textbooks AERA Open |
title | Content Analysis of Textbooks via Natural Language Processing: Findings on Gender, Race, and Ethnicity in Texas U.S. History Textbooks |
title_sort | content analysis of textbooks via natural language processing findings on gender race and ethnicity in texas u s history textbooks |
url | https://doi.org/10.1177/2332858420940312 |
work_keys_str_mv | AT lilucy contentanalysisoftextbooksvianaturallanguageprocessingfindingsongenderraceandethnicityintexasushistorytextbooks AT dorottyademszky contentanalysisoftextbooksvianaturallanguageprocessingfindingsongenderraceandethnicityintexasushistorytextbooks AT patriciabromley contentanalysisoftextbooksvianaturallanguageprocessingfindingsongenderraceandethnicityintexasushistorytextbooks AT danjurafsky contentanalysisoftextbooksvianaturallanguageprocessingfindingsongenderraceandethnicityintexasushistorytextbooks |