Audio Segmenting and Natural Language Processing in Oral History Archiving

Bibliographic Details
Main Author: Rieping, Holly Anne
Other Authors: Fendt, Kurt E.
Format: Thesis
Published: Massachusetts Institute of Technology 2022
Online Access: https://hdl.handle.net/1721.1/143185
collection MIT
description Traditional archives preserve physical historical records, documents, and artifacts, and tell a story of some historical significance. As the digital age progresses, digital archives have become more commonplace and have given the general public wider access to archival resources and knowledge. With wider access, historically marginalized groups now have the means to share stories that have typically been excluded from the dominant discourse. As a result, we are faced with both the challenge and the opportunity to tell and preserve stories from these groups and to foreground diverse voices in these digital archives. We also face the challenge of an abundance of materials, both digitized and born-digital, to use in an archive. Various computational methods can assist in curating a digital archive by organizing the materials or finding connections between them, work that would otherwise take an archivist hundreds of hours. Using materials from the MIT Black Oral History Project, this thesis first explores ways to process digitized audio interviews through audio segmentation, using techniques including silence detection and speaker diarization, with the goal of creating a more flexible way to explore interviews in a digital oral history archive. Second, this thesis uses named entity recognition to experiment with metadata extraction for an archive. Next, it explores ways to discover connections between segments of interviews, using topic modeling with latent Dirichlet allocation (LDA) and latent semantic indexing (LSI) and machine-learning topic classification to identify topics, similarities, and dissimilarities across interviews. Finally, this thesis discusses how these computational methods may enhance the telling of diverse stories in digital oral history archives.
id mit-1721.1/143185
institution Massachusetts Institute of Technology
spelling mit-1721.1/143185 2022-06-16T03:38:48Z Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science M.Eng. 2022-06-15T13:02:07Z 2022-06-15T13:02:07Z 2022-02 2022-02-22T18:31:57.384Z Thesis https://hdl.handle.net/1721.1/143185 In Copyright - Educational Use Permitted Copyright MIT http://rightsstatements.org/page/InC-EDU/1.0/ application/pdf