Profile Creation with Topic Modeling and Semantic Analysis from Conversations about COVID-19 among U.S. Older Adults



Bibliographic Details
Main Author: Le, Joie
Other Authors: D’Ambrosio, Lisa
Format: Thesis
Published: Massachusetts Institute of Technology 2023
Online Access: https://hdl.handle.net/1721.1/150291
collection MIT
description Coding qualitative data in social science research means categorizing individual units of data to facilitate analysis. Producing codes with high validity and inter-coder reliability requires a great deal of manual labor and time. In an ongoing study, MIT AgeLab researchers analyzed focus group and interview transcripts containing conversations about the impact of the COVID-19 pandemic on Black and white U.S. older adults’ preventive health behavior and healthcare use. To facilitate the qualitative coding process, we propose an approach for automated topic extraction with sentiment analysis, built on a natural language processing technique known as topic modeling. While automated methods for quantitative data are common, methods for qualitative data, especially focus group text, have not been rigorously explored. This thesis compares two topic modeling algorithms, Latent Dirichlet Allocation (LDA) and Gibbs Sampling Dirichlet Multinomial Mixture (GSDMM), and tests a variety of pseudo-document methods for dividing the transcripts into smaller documents. The transcripts are split along three axes (race, COVID-19 vaccination status, and relationship to a local community), and global topics and sentiment-based topics are then extracted from the text and labeled by human researchers. Direct comparisons between profiles within an axis uncover differences that warrant further analysis. The topic modeling results can be used to derive an initial codebook before manual coding begins, and they motivate further investigation of using topic modeling in tandem with human coding during qualitative text analysis.
first_indexed 2024-09-23T13:56:34Z
format Thesis
id mit-1721.1/150291
record_format dspace
spelling mit-1721.1/150291 2023-04-01T03:24:56Z Le, Joie; D’Ambrosio, Lisa; Coughlin, Joseph; Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science; M.Eng.; 2023-03-31T14:45:32Z; 2023-02; 2023-02-27T18:43:21.157Z; Thesis; In Copyright - Educational Use Permitted; Copyright MIT; http://rightsstatements.org/page/InC-EDU/1.0/; application/pdf