Language model tokenizers introduce unfairness between languages

Recent language models have shown impressive multilingual performance, even when not explicitly trained for it. Despite this, there are concerns about the quality of their outputs across different languages. In this paper, we show how disparity in the treatment of different languages arises at the t...

Bibliographic details
Main Authors: Petrov, A, Malfa, EL, Torr, P, Bibi, A
Format: Conference item
Language: English
Published: Curran Associates 2024