ArzEn-MultiGenre: An aligned parallel dataset of Egyptian Arabic song lyrics, novels, and subtitles, with English translations
ArzEn-MultiGenre is a parallel dataset of Egyptian Arabic song lyrics, novels, and TV show subtitles that are manually translated and aligned with their English counterparts. The dataset contains 25,557 segment pairs that can be used to benchmark new machine translation models, fine-tune large langu...
Main Author: | Rania Al-Sabbagh |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier, 2024-06-01 |
Series: | Data in Brief |
Subjects: | |
Online Access: | http://www.sciencedirect.com/science/article/pii/S2352340924002403 |
Similar Items
- An Open-Source Library of Phasor Measurement Unit Data Capturing Real Bulk Power Systems Behavior
  by: Shuchismita Biswas, et al.
  Published: (2023-01-01)
- Arsitektur Sistem Percakapan Otomatis Berbahasa Indonesia dengan Normalisasi Bahasa Informal Menjadi Baku
  by: Muhammad Fathur Rahman Khairul, et al.
  Published: (2023-12-01)
- Slovak Dataset for Multilingual Question Answering
  by: Daniel Hladek, et al.
  Published: (2023-01-01)
- Synthetic Datasets Generator for Testing Information Visualization and Machine Learning Techniques and Tools
  by: Sandro De Paula Mendonca, et al.
  Published: (2020-01-01)
- AUDIOVISUAL AND SONG TRANSLATION OF INDONESIAN SUBTITLE IN SHELTER MUSIC VIDEO
  by: Mohamad Irham Poluwa, et al.
  Published: (2021-12-01)