Detecting gross alignment errors in the Spoken British National Corpus

Bibliographic Details
Main Authors: Baghai-Ravary, L; Grau Puerto, S; Kochanski, G
Format: Conference item
Language: English
Published: 2010
Subjects: Natural Language Processing; Linguistics; Phonetics
Collection: OXFORD
Description: The paper presents methods for evaluating the accuracy of alignments between transcriptions and audio recordings. The methods have been applied to the Spoken British National Corpus, which is an extensive and varied corpus of natural unscripted speech. Early results show good agreement with human ratings of alignment accuracy. The methods also provide an indication of the location of likely alignment problems; this should allow efficient manual examination of large corpora. Automatic checking of such alignments is crucial when analysing any very large corpus, since even the best current speech alignment systems will occasionally make serious errors. The methods described here use a hybrid approach based on statistics of the speech signal itself, statistics of the labels being evaluated, and statistics linking the two.
ID: oxford-uuid:b6438388-68bb-434e-9d73-7c2d32f04557
Institution: University of Oxford