Automatic recognition and representation of text in the form of audio stream

The paper considers the problem of automatically generating speech from a text file. An analytical review of existing software for recognizing text and converting it to an audio stream is carried out, and the advantages and disadvantages of these products are assessed. Based on this, a conclusi...

Bibliographic Details
Main Authors:	L. V. Serebryanaya, I. E. Lasy
Format:	Article
Language:	Russian
Published:	Educational institution «Belarusian State University of Informatics and Radioelectronics» 2021-10-01
Series:	Doklady Belorusskogo gosudarstvennogo universiteta informatiki i radioèlektroniki
Subjects:	artificial neural network model, audio stream, encoder and decoder, speech generation, spectrogram
Online Access:	https://doklady.bsuir.by/jour/article/view/3158

_version_	1797880964017291264
author	L. V. Serebryanaya, I. E. Lasy
author_facet	L. V. Serebryanaya, I. E. Lasy
author_sort	L. V. Serebryanaya
collection	DOAJ
description	The paper considers the problem of automatically generating speech from a text file. An analytical review of existing software for recognizing text and converting it to an audio stream is carried out, and the advantages and disadvantages of these products are assessed. On this basis, the authors conclude that developing software for automatic generation of an audio stream from Russian text is relevant. Artificial neural network models used for speech synthesis are analyzed, and a mathematical model of the software is then built. It consists of three components: a convolutional encoder, a convolutional decoder, and a transformer. The software architecture is designed to include a graphical interface, an application server, and a speech synthesis system. Several algorithms have been developed: preprocessing text before it is loaded into the software, converting the audio files of the training sample and training the network, and generating speech from arbitrary text files. The resulting software is a single-page application with a web interface for user interaction. Its quality was assessed with a metric that represents the average score of different opinions. After aggregation, the metric reached a sufficiently high value, which suggests that all the stated tasks have been solved.
first_indexed	2024-04-10T03:11:32Z
format	Article
id	doaj.art-bb9ecad1eac84323b53326c0381e5feb
institution	Directory of Open Access Journals
issn	1729-7648
language	Russian
last_indexed	2024-04-10T03:11:32Z
publishDate	2021-10-01
publisher	Educational institution «Belarusian State University of Informatics and Radioelectronics»
record_format	Article
series	Doklady Belorusskogo gosudarstvennogo universiteta informatiki i radioèlektroniki
volume	19
issue	6
pages	51-58
doi	10.35596/1729-7648-2021-19-6-51-58
affiliation	Belarusian State University of Informatics and Radioelectronics
title	Automatic recognition and representation of text in the form of audio stream
topic	artificial neural network model, audio stream, encoder and decoder, speech generation, spectrogram
url	https://doklady.bsuir.by/jour/article/view/3158
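
The description says the software was evaluated with a metric that represents the average score of different opinions. The record does not give the rating scale, the number of raters, or the resulting value, so the following is only a minimal sketch of how such an average could be computed. The function name `mean_opinion_score`, the 1 to 5 scale, and the sample ratings are all assumptions made for illustration and are not taken from the article.

```python
from statistics import mean


def mean_opinion_score(ratings, low=1, high=5):
    """Return the arithmetic mean of individual ratings.

    The 1-5 scale is an assumption; the source record does not
    state which scale the authors used.
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    for r in ratings:
        # Reject ratings that fall outside the assumed scale.
        if not low <= r <= high:
            raise ValueError(f"rating {r} is outside the {low}-{high} scale")
    return mean(ratings)


# Hypothetical ratings from five raters for one synthesized utterance.
example_ratings = [4, 5, 4, 3, 4]
print(f"average score = {mean_opinion_score(example_ratings):.2f}")
```

Validating each rating before averaging keeps a single mistyped value from silently skewing the result, which matters when the number of raters is small.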
