Human-machine collaboration for rapid speech transcription
Thesis (S.M.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2007.
Main Author: | Roy, Brandon C. (Brandon Cain) |
---|---|
Other Authors: | Deb Roy. |
Format: | Thesis |
Language: | eng |
Published: | Massachusetts Institute of Technology, 2008 |
Subjects: | Architecture. Program in Media Arts and Sciences. |
Online Access: | http://hdl.handle.net/1721.1/41751 |
author | Roy, Brandon C. (Brandon Cain) |
---|---|
author2 | Deb Roy. |
collection | MIT |
format | Thesis |
id | mit-1721.1/41751 |
institution | Massachusetts Institute of Technology |
language | eng |
publishDate | 2008 |
publisher | Massachusetts Institute of Technology |
record_format | dspace |
department | Massachusetts Institute of Technology. Dept. of Architecture. Program in Media Arts and Sciences. |
degree | S.M. |
note | Includes bibliographical references (p. 121-127). |
extent | 127 p. |
abstract | Inexpensive storage and sensor technologies are yielding a new generation of massive multimedia datasets. The exponential growth in storage and processing power makes it possible to collect more data than ever before, yet without appropriate content annotation for search and analysis such corpora are of little use. While advances in data mining and machine learning have helped to automate some types of analysis, the need for human annotation still exists and remains expensive. The Human Speechome Project is a heavily data-driven longitudinal study of language acquisition. More than 100,000 hours of audio and video recordings have been collected over a two year period to trace one child's language development at home. A critical first step in analyzing this corpus is to obtain high quality transcripts of all speech heard and produced by the child. Unfortunately, automatic speech transcription has proven to be inadequate for these recordings, and manual transcription with existing tools is extremely labor intensive and therefore expensive. A new human-machine collaborative system for rapid speech transcription has been developed which leverages both the quality of human transcription and the speed of automatic speech processing. Machine algorithms sift through the massive dataset to find and segment speech. The results of automatic analysis are handed off to humans for transcription using newly designed tools with an optimized user interface. The automatic algorithms are tuned to optimize human performance, and errors are corrected by the human and used to iteratively improve the machine performance. When compared with other popular transcription tools, the new system is three- to six-fold faster, while preserving transcription quality. When applied to the Speechome audio corpus, over 100 hours of multitrack audio can be transcribed in about 12 hours by a single human transcriber. |
rights | M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582 |
identifier | 225886023 |
file_format | application/pdf |
title | Human-machine collaboration for rapid speech transcription |
topic | Architecture. Program in Media Arts and Sciences. |
url | http://hdl.handle.net/1721.1/41751 |