Multi‐dimensional long short‐term memory networks for artificial Arabic text recognition in news video


Bibliographic Details
Main Authors: Oussama Zayene, Sameh Masmoudi Touj, Jean Hennebert, Rolf Ingold, Najoua Essoukri Ben Amara
Format: Article
Language: English
Published: Wiley 2018-08-01
Series: IET Computer Vision
Online Access:https://doi.org/10.1049/iet-cvi.2017.0468
Description
Summary: This study presents a novel approach for Arabic video text recognition based on recurrent neural networks. Embedded texts in videos represent a rich source of information for indexing and automatically annotating multimedia documents. However, video text recognition is a non-trivial task due to many challenges, such as the variability of text patterns and the complexity of backgrounds. In the case of Arabic, the presence of diacritic marks, the cursive nature of the script and the non-uniform intra-/inter-word distances introduce additional challenges. The proposed system is a segmentation-free method that relies on a multi-dimensional long short-term memory (MDLSTM) network coupled with a connectionist temporal classification (CTC) layer. It is shown that using an efficient pre-processing step and a compact representation of Arabic character models brings robust performance and yields a lower error rate than other recently published methods. The authors' system is trained and evaluated on the public AcTiV-R dataset under different evaluation protocols. The obtained results are promising and outperform current state-of-the-art approaches on the public ALIF dataset in terms of recognition rates at both the character and line levels.
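The summary mentions a connectionist temporal classification (CTC) layer, which is what makes the method segmentation-free: the network emits a label distribution per input frame, and decoding collapses repeated labels and removes a special "blank" symbol, so no prior character segmentation is needed. The following is a minimal illustrative sketch of CTC best-path (greedy) decoding only; the label ids, the blank index and the surrounding network are hypothetical stand-ins, not the paper's actual Arabic character models or MDLSTM implementation.

```python
def ctc_greedy_decode(frame_label_ids, blank=0):
    """CTC best-path decoding: collapse consecutive repeats, then drop blanks.

    frame_label_ids: per-frame argmax label ids from the network output
    blank: id reserved for the CTC blank symbol (assumed 0 here)
    """
    decoded = []
    prev = None
    for label in frame_label_ids:
        # Emit a label only when it differs from the previous frame
        # and is not the blank symbol.
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded


# Example: 8 frames of per-frame predictions collapse to the 2-label sequence.
print(ctc_greedy_decode([0, 2, 2, 0, 3, 3, 3, 0]))  # → [2, 3]
```

A blank between two identical labels (e.g. `[1, 1, 0, 1]`) is what allows true doubled characters to survive the repeat-collapsing step, yielding `[1, 1]`.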
ISSN: 1751-9632, 1751-9640