Multi‐dimensional long short‐term memory networks for artificial Arabic text recognition in news video

Bibliographic Details
Main Authors: Oussama Zayene, Sameh Masmoudi Touj, Jean Hennebert, Rolf Ingold, Najoua Essoukri Ben Amara
Format: Article
Language: English
Published: Wiley 2018-08-01
Series: IET Computer Vision
Online Access: https://doi.org/10.1049/iet-cvi.2017.0468
Description: This study presents a novel approach to Arabic video text recognition based on recurrent neural networks. Text embedded in videos is a rich source of information for indexing and automatically annotating multimedia documents. However, video text recognition is a non-trivial task because of challenges such as the variability of text patterns and the complexity of backgrounds. In the case of Arabic, the presence of diacritic marks, the cursive nature of the script and the non-uniform intra- and inter-word distances introduce additional challenges. The proposed system is a segmentation-free method that relies on a multi-dimensional long short-term memory network coupled with a connectionist temporal classification layer. It is shown that an efficient pre-processing step and a compact representation of Arabic character models bring robust performance and yield a lower error rate than other recently published methods. The authors' system is trained and evaluated on the public AcTiV-R dataset under different evaluation protocols. It also outperforms current state-of-the-art approaches on the public ALIF dataset in terms of recognition rates at both the character and line levels.
ISSN: 1751-9632, 1751-9640
Source: Directory of Open Access Journals (DOAJ)
Record ID: doaj.art-9015363f8abf4b7f86c6536de31da16e
Citation: IET Computer Vision, vol. 12, no. 5, pp. 710-719, 2018-08-01, https://doi.org/10.1049/iet-cvi.2017.0468
Author Affiliations:
Oussama Zayene: LATIS Laboratory, National Engineering School of Sousse, University of Sousse, Tunisia
Sameh Masmoudi Touj: LATIS Laboratory, National Engineering School of Sousse, University of Sousse, Tunisia
Jean Hennebert: Institute of Complex Systems, HES-SO, University of Applied Science Western Switzerland, Switzerland
Rolf Ingold: DIVA Group, University of Fribourg, Switzerland
Najoua Essoukri Ben Amara: LATIS Laboratory, National Engineering School of Sousse, University of Sousse, Tunisia
Subjects: multidimensional long short-term memory networks; artificial Arabic video text recognition; news video; recurrent neural networks; embedded texts; multimedia document annotation