Language model tokenizers introduce unfairness between languages
Recent language models have shown impressive multilingual performance, even when not explicitly trained for it. Despite this, there are concerns about the quality of their outputs across different languages. In this paper, we show how disparity in the treatment of different languages arises at the t...
Main Authors: | , , , |
---|---|
Formato: | Conference item |
Idioma: | English |
Publicado em: |
Curran Associates
2024
|