Application of Fischer semi discriminant analysis for speaker diarization in costa rican radio broadcasts

Automatic segmentation and classification of audio streams is a challenging problem with many applications, such as indexing multimedia digital libraries, information retrieval, and building speech corpora (spoken corpora) for particular languages and accents. A speech corpus is a database...

Bibliographic Details
Main Authors:	Roberto Sánchez Cárdenas, Marvin Coto-Jiménez
Format:	Article
Language:	Spanish
Published:	Instituto Tecnológico de Costa Rica 2022-11-01
Series:	Tecnología en Marcha
Subjects:	Broadcasting, clustering, speaker diarization, speech technologies
Online Access:	https://172.20.14.50/index.php/tec_marcha/article/view/6464

author	Roberto Sánchez Cárdenas, Marvin Coto-Jiménez
collection	DOAJ
description	Automatic segmentation and classification of audio streams is a challenging problem with many applications, such as indexing multimedia digital libraries, information retrieval, and building speech corpora (spoken corpora) for particular languages and accents. A speech corpus is a database of speech audio files and their corresponding text transcriptions. Among the steps and tasks required for any of these applications, speaker diarization is one of the most relevant, because it aims to find boundaries in the audio recordings according to who speaks in each fragment. Speaker diarization can be performed in a supervised or unsupervised way and is commonly applied to recordings consisting of pure speech. In this work, we present a first annotated dataset and analysis of speaker diarization for Costa Rican radio broadcasting, using two approaches: a classic one based on k-means clustering, and the more recent Fisher Semi Discriminant Analysis. We chose publicly available radio broadcasts and compared the applicability of both systems on the complete audio files, which also contain music segments and challenging acoustic conditions. The results, especially the average cluster purity, depend on the number of speakers in each broadcast. They also show the need for further exploration and for combining these approaches with other classification and segmentation algorithms, in order to better extract useful information from the dataset and support the further development of speech corpora.
first_indexed	2024-03-11T16:37:12Z
format	Article
id	doaj.art-89c9889597274b9cb720cbf2b89a4a4b
institution	Directory of Open Access Journals
issn	0379-3982, 2215-3241
language	Spanish
last_indexed	2024-03-11T16:37:12Z
publishDate	2022-11-01
publisher	Instituto Tecnológico de Costa Rica
record_format	Article
series	Tecnología en Marcha
spelling	doaj.art-89c9889597274b9cb720cbf2b89a4a4b; 2023-10-23T14:27:31Z; spa; Instituto Tecnológico de Costa Rica; Tecnología en Marcha; ISSN 0379-3982, 2215-3241; 2022-11-01; vol. 35, no. 8; DOI 10.18845/tm.v35i8.6464
title	Application of Fischer semi discriminant analysis for speaker diarization in costa rican radio broadcasts
topic	Broadcasting, clustering, speaker diarization, speech technologies
url	https://172.20.14.50/index.php/tec_marcha/article/view/6464
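
The abstract compares a classic k-means baseline with the Fisher semi-discriminant approach and reports average cluster purity. As a rough illustration, here is a minimal sketch of only the baseline's clustering step and of one common frame-weighted definition of average cluster purity. It assumes each audio segment has already been reduced to a fixed-length feature vector (for example, averaged MFCCs); the function names `kmeans` and `average_cluster_purity`, the farthest-first initialisation, and the purity formula are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter


def kmeans(vectors, k, iters=50):
    """Group per-segment feature vectors into k speaker clusters.

    Plain Lloyd iterations with a deterministic farthest-first start
    (a hypothetical setup, not necessarily the one used in the paper).
    """
    # Farthest-first start: the first vector, then repeatedly the vector
    # farthest from every centroid chosen so far.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(math.dist(v, c) for c in centroids))
        centroids.append(list(farthest))
    labels = None
    for _ in range(iters):
        # Assignment step: each segment joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(v, centroids[j]))
                      for v in vectors]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def average_cluster_purity(hyp, ref):
    """Frame-weighted average cluster purity (acp), one common definition.

    With n_ij frames of speaker j in cluster i, n_i frames in cluster i and
    N frames overall: p_i = sum_j n_ij**2 / n_i**2, acp = sum_i p_i * n_i / N.
    """
    if len(hyp) != len(ref) or not hyp:
        raise ValueError("hyp and ref must be non-empty and equally long")
    joint = Counter(zip(hyp, ref))  # n_ij
    sizes = Counter(hyp)            # n_i
    total = 0.0
    for cluster, n_i in sizes.items():
        squares = sum(n * n for (c, _), n in joint.items() if c == cluster)
        total += squares / n_i      # equals p_i * n_i
    return total / len(hyp)


# Toy usage: two well-separated "speakers", three segments each.
feats = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
ref = ["A", "A", "A", "B", "B", "B"]
hyp = kmeans(feats, 2)
print(hyp, average_cluster_purity(hyp, ref))  # [0, 0, 0, 1, 1, 1] 1.0
```

When a cluster mixes speakers, purity drops: `average_cluster_purity([0, 0, 0, 1, 1, 1], list("AABBBB"))` returns 7/9 (about 0.78), because the first cluster holds two speakers. Some toolkits instead report a max-based purity (the dominant speaker's share of each cluster); which variant the paper uses is not stated in this record.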

Similar Items