Automatic recognition and representation of text in the form of audio stream

The problem of automatic speech generation from a text file is considered. An analytical review of existing software designed to recognize text and convert it to an audio stream has been carried out. The advantages and disadvantages of these software products are assessed. Based on this, a conclusi...

Bibliographic Details
Main Authors: L. V. Serebryanaya, I. E. Lasy
Format: Article
Language: Russian
Published: Educational institution «Belarusian State University of Informatics and Radioelectronics» 2021-10-01
Series: Doklady Belorusskogo gosudarstvennogo universiteta informatiki i radioèlektroniki
Subjects: artificial neural network model; audio stream; encoder and decoder; speech generation; spectrogram
Online Access: https://doklady.bsuir.by/jour/article/view/3158
author L. V. Serebryanaya
I. E. Lasy
collection DOAJ
description The problem of automatic speech generation from a text file is considered. An analytical review of existing software designed to recognize text and convert it to an audio stream has been carried out, and the advantages and disadvantages of these products are assessed. Based on this review, the development of software for automatically generating an audio stream from Russian text is concluded to be relevant. Models based on artificial neural networks that are used for speech synthesis are analyzed. A mathematical model of the software is then built; it consists of three components: a convolutional encoder, a convolutional decoder, and a transformer. The software architecture is designed; it includes a graphical interface, an application server, and a speech synthesis system. Several algorithms have been developed: preprocessing text before it is loaded into the software, converting the audio files of the training sample and training the network, and generating speech from arbitrary text files. The resulting software is a single-page application with a web interface for user interaction. To assess its quality, a metric representing the average score across different opinions was used. The aggregated score was sufficiently high, which suggests that all the stated tasks have been solved.
first_indexed 2024-04-10T03:11:32Z
format Article
id doaj.art-bb9ecad1eac84323b53326c0381e5feb
institution Directory Open Access Journal
issn 1729-7648
language Russian
last_indexed 2024-04-10T03:11:32Z
publishDate 2021-10-01
publisher Educational institution «Belarusian State University of Informatics and Radioelectronics»
record_format Article
series Doklady Belorusskogo gosudarstvennogo universiteta informatiki i radioèlektroniki
spelling doaj.art-bb9ecad1eac84323b53326c0381e5feb
2023-03-13T07:33:22Z
rus
Educational institution «Belarusian State University of Informatics and Radioelectronics»
Doklady Belorusskogo gosudarstvennogo universiteta informatiki i radioèlektroniki
1729-7648
2021-10-01
19
6
51
58
10.35596/1729-7648-2021-19-6-51-58
1732
Automatic recognition and representation of text in the form of audio stream
L. V. Serebryanaya (0)
I. E. Lasy (1)
(0) Belarusian State University of Informatics and Radioelectronics
(1) Belarusian State University of Informatics and Radioelectronics
https://doklady.bsuir.by/jour/article/view/3158
artificial neural network model
audio stream
encoder and decoder
speech generation
spectrogram
title Automatic recognition and representation of text in the form of audio stream
topic artificial neural network model
audio stream
encoder and decoder
speech generation
spectrogram
url https://doklady.bsuir.by/jour/article/view/3158
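
Note on the quality metric: the description says the software was assessed with a metric that averages different opinions. This reads like a mean opinion score (MOS), although the record does not name it, so that identification is an assumption. The sketch below shows how such an average could be computed, assuming a 1 to 5 rating scale and a normal-approximation 95% interval. The function name `mean_opinion_score` and the sample ratings are hypothetical and are not taken from the article.

```python
from math import sqrt
from statistics import mean, stdev


def mean_opinion_score(ratings):
    """Return (mos, half_width) for a list of opinion ratings.

    Assumes ratings lie on a 1..5 scale (an assumption; the record does
    not state the scale). half_width is a normal-approximation 95%
    confidence half-width, or 0.0 when only one rating is given.
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    for r in ratings:
        if not 1 <= r <= 5:
            raise ValueError(f"rating {r!r} is outside the assumed 1..5 scale")

    mos = mean(ratings)
    if len(ratings) < 2:
        return mos, 0.0

    # Sample standard deviation divided by sqrt(n) gives the standard error.
    half_width = 1.96 * stdev(ratings) / sqrt(len(ratings))
    return mos, half_width


# Hypothetical ratings, for illustration only (not data from the article).
ratings = [4, 5, 4, 3, 5, 4, 4, 5]
mos, half_width = mean_opinion_score(ratings)
print(f"MOS = {mos:.2f} ± {half_width:.2f}")
```

Reporting the interval alongside the average makes it easier to judge whether a "sufficiently high" score rests on enough opinions to be meaningful.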