ArzEn-MultiGenre: An aligned parallel dataset of Egyptian Arabic song lyrics, novels, and subtitles, with English translations
ArzEn-MultiGenre is a parallel dataset of Egyptian Arabic song lyrics, novels, and TV show subtitles that are manually translated and aligned with their English counterparts. The dataset contains 25,557 segment pairs that can be used to benchmark new machine translation models, fine-tune large langu...
Main Author: | |
---|---|
Format: | Article |
Language: | English |
Published: |
Elsevier
2024-06-01
|
Series: | Data in Brief |
Subjects: | |
Online Access: | http://www.sciencedirect.com/science/article/pii/S2352340924002403 |