Experiments on cross-language attribute detection and phone recognition with minimal target-specific training data
A state-of-the-art automatic speech recognition (ASR) system can often achieve high accuracy for most spoken languages of interest if a large amount of speech material can be collected and used to train a set of language-specific acoustic phone models. However, designing good ASR systems with little...
Main Authors: | Siniscalchi, Sabato Marco; Lyu, Dau-Cheng; Svendsen, Torbjørn; Lee, Chin-Hui |
---|---|
Other Authors: | School of Computer Engineering |
Format: | Journal Article |
Language: | English |
Published: | 2013 |
Subjects: | DRNTU::Engineering::Computer science and engineering |
Online Access: | https://hdl.handle.net/10356/102781 http://hdl.handle.net/10220/16448 |
_version_ | 1826109945157779456 |
---|---|
author | Siniscalchi, Sabato Marco. Lyu, Dau-Cheng. Svendsen, Torbjørn. Lee, Chin-Hui. |
author2 | School of Computer Engineering |
collection | NTU |
description | A state-of-the-art automatic speech recognition (ASR) system can often achieve high accuracy for most spoken languages of interest if a large amount of speech material can be collected and used to train a set of language-specific acoustic phone models. However, designing good ASR systems with little or no language-specific speech data for resource-limited languages is still a challenging research topic. As a consequence, there has been an increasing interest in exploring knowledge sharing among a large number of languages so that a universal set of acoustic phone units can be defined to work for multiple or even for all languages. This work aims at demonstrating that a recently proposed automatic speech attribute transcription framework can play a key role in designing language-universal acoustic models by sharing speech units among all target languages at the acoustic phonetic attribute level. The language-universal acoustic models are evaluated through phone recognition. It will be shown that good cross-language attribute detection and continuous phone recognition performance can be accomplished for “unseen” languages using minimal training data from the target languages to be recognized. Furthermore, a phone-based background model (PBM) approach will be presented to improve attribute detection accuracies. |
first_indexed | 2024-10-01T02:26:44Z |
format | Journal Article |
id | ntu-10356/102781 |
institution | Nanyang Technological University |
language | English |
last_indexed | 2024-10-01T02:26:44Z |
publishDate | 2013 |
record_format | dspace |
spelling | ntu-10356/102781 2020-05-28T07:17:40Z Experiments on cross-language attribute detection and phone recognition with minimal target-specific training data Siniscalchi, Sabato Marco. Lyu, Dau-Cheng. Svendsen, Torbjørn. Lee, Chin-Hui. School of Computer Engineering DRNTU::Engineering::Computer science and engineering 2013-10-10T09:15:47Z 2019-12-06T21:00:09Z 2013-10-10T09:15:47Z 2019-12-06T21:00:09Z 2011 2011 Journal Article Siniscalchi, S. M., Lyu, D. C., Svendsen, T., & Lee, C. H. (2011). Experiments on cross-language attribute detection and phone recognition with minimal target-specific training data. IEEE transactions on audio, speech, and language processing, 20(3), 875-887. https://hdl.handle.net/10356/102781 http://hdl.handle.net/10220/16448 10.1109/TASL.2011.2167610 en IEEE transactions on audio, speech, and language processing |
title | Experiments on cross-language attribute detection and phone recognition with minimal target-specific training data |
topic | DRNTU::Engineering::Computer science and engineering |
url | https://hdl.handle.net/10356/102781 http://hdl.handle.net/10220/16448 |