ChatSubs: A dataset of dialogues in Spanish, Catalan, Basque and Galician extracted from movie subtitles for developing advanced conversational models

The ChatSubs dataset [5] contains dialogue data in Spanish and three of Spain's co-official languages (Catalan, Basque, and Galician). It has been obtained from OpenSubtitles, from which we have gathered the movie subtitles in our languages of interest and processed them to generate clearly seg...

Bibliographic Details
Main Authors: Ksenia Kharitonova, Zoraida Callejas, David Pérez-Fernández, Asier Gutiérrez-Fandiño, David Griol
Format: Article
Language: English
Published: Elsevier 2023-10-01
Series: Data in Brief
Subjects:
Online Access: http://www.sciencedirect.com/science/article/pii/S2352340923006650