Audio Segmenting and Natural Language Processing in Oral History Archiving
Traditional archives preserve physical historical records, documents, artifacts, etc. and tell a story of some historical significance. As the digital age progresses, digital archives have become more commonplace and have given wider access to archival resources and knowledge to the general public....
Main Author: | Rieping, Holly Anne |
---|---|
Other Authors: | Fendt, Kurt E. |
Format: | Thesis |
Published: | Massachusetts Institute of Technology 2022 |
Online Access: | https://hdl.handle.net/1721.1/143185 |
_version_ | 1826197906969853952 |
---|---|
author | Rieping, Holly Anne |
author2 | Fendt, Kurt E. |
author_facet | Fendt, Kurt E. Rieping, Holly Anne |
author_sort | Rieping, Holly Anne |
collection | MIT |
description | Traditional archives preserve physical historical records, documents, artifacts, etc. and tell a story of some historical significance. As the digital age progresses, digital archives have become more commonplace and have given wider access to archival resources and knowledge to the general public. With wider access, historically marginalized groups now have the means to share stories that have typically been excluded from the dominant discourse. As a result, we are faced with both the challenge and the opportunity to tell and preserve stories from these groups and foreground diverse voices in these digital archives. Additionally, we are faced with the challenge of having an abundance of materials, both digitized and born digital, to use in an archive, and can utilize various computational methods to assist in the curatorial process of a digital archive by organizing the materials or finding connections between different materials that would otherwise take hundreds of hours for an archivist to do.
Using materials from the MIT Black Oral History Project, this thesis first explores ways to process digitized audio interviews through audio segmentation, using techniques including silence detection and speaker diarization, with the goal of creating a more flexible way to explore interviews in a digital oral history archive. Second, this thesis uses named entity recognition to experiment with metadata extraction for an archive. Next, this thesis explores ways to discover connections between segments of interviews by using topic modeling with LDA and LSI and topic classification using machine learning methods to identify topics, similarities, and dissimilarities across interviews. Finally, this thesis discusses how these computational methods may enhance the telling of diverse stories in digital oral history archives. |
first_indexed | 2024-09-23T10:55:58Z |
format | Thesis |
id | mit-1721.1/143185 |
institution | Massachusetts Institute of Technology |
last_indexed | 2024-09-23T10:55:58Z |
publishDate | 2022 |
publisher | Massachusetts Institute of Technology |
record_format | dspace |
spelling | mit-1721.1/143185 2022-06-16T03:38:48Z Audio Segmenting and Natural Language Processing in Oral History Archiving Rieping, Holly Anne Fendt, Kurt E. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science M.Eng. 2022-06-15T13:02:07Z 2022-06-15T13:02:07Z 2022-02 2022-02-22T18:31:57.384Z Thesis https://hdl.handle.net/1721.1/143185 In Copyright - Educational Use Permitted Copyright MIT http://rightsstatements.org/page/InC-EDU/1.0/ application/pdf Massachusetts Institute of Technology |
spellingShingle | Rieping, Holly Anne Audio Segmenting and Natural Language Processing in Oral History Archiving |
title | Audio Segmenting and Natural Language Processing in Oral History Archiving |
title_sort | audio segmenting and natural language processing in oral history archiving |
url | https://hdl.handle.net/1721.1/143185 |
work_keys_str_mv | AT riepinghollyanne audiosegmentingandnaturallanguageprocessinginoralhistoryarchiving |
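
The description above names silence detection as one of the audio segmentation techniques the thesis explores. The record does not say how the thesis implements it, so the following is only a minimal illustrative sketch of one common approach: flag frames whose root-mean-square (RMS) amplitude falls below a threshold, keep runs of quiet frames that last long enough, and cut the recording at those runs. The function names `detect_silences` and `split_on_silence`, the threshold, the frame length, and the minimum silence duration are all hypothetical choices made for this example, not values taken from the thesis.

```python
from typing import List, Tuple

Span = Tuple[float, float]  # (start_sec, end_sec)


def detect_silences(
    samples: List[float],
    sample_rate: int,
    frame_len: int,
    threshold: float = 0.02,       # assumed RMS level below which a frame is "quiet"
    min_silence_sec: float = 0.5,  # assumed shortest pause worth cutting on
) -> List[Span]:
    """Return time spans where consecutive frames stay below the RMS threshold."""
    silences: List[Span] = []
    run_start = None  # start time of the current quiet run, if any
    n_frames = (len(samples) + frame_len - 1) // frame_len
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        rms = (sum(x * x for x in frame) / len(frame)) ** 0.5
        t = i * frame_len / sample_rate  # start time of this frame
        if rms < threshold:
            if run_start is None:
                run_start = t
        else:
            # A loud frame ends the quiet run; keep it only if long enough.
            if run_start is not None and t - run_start >= min_silence_sec:
                silences.append((run_start, t))
            run_start = None
    # Handle a quiet run that extends to the end of the recording.
    end_t = len(samples) / sample_rate
    if run_start is not None and end_t - run_start >= min_silence_sec:
        silences.append((run_start, end_t))
    return silences


def split_on_silence(duration_sec: float, silences: List[Span]) -> List[Span]:
    """Return the non-silent segments that lie between the given silence spans."""
    segments: List[Span] = []
    cursor = 0.0
    for start, end in silences:
        if start > cursor:
            segments.append((cursor, start))
        cursor = end
    if cursor < duration_sec:
        segments.append((cursor, duration_sec))
    return segments
```

In practice the samples would come from a decoded audio file, and the threshold would need tuning per recording, since digitized interview tapes often carry background hiss that raises the noise floor.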