Are Girls Neko or Shōjo? Cross-Lingual Alignment of Non-Isomorphic Embeddings with Iterative Normalization

Bibliographic Details
Main Authors: Zhang, Mozhi; Xu, Keyulu; Kawarabayashi, Ken-ichi; Jegelka, Stefanie Sabrina; Boyd-Graber, Jordan
Other Authors: Massachusetts Institute of Technology. Department of Linguistics and Philosophy
Format: Article
Language: English
Published: Association for Computational Linguistics 2020
Online Access: https://hdl.handle.net/1721.1/128914
Description
Summary: Cross-lingual word embeddings (CLWE) underlie many multilingual natural language processing systems, often through orthogonal transformations of pre-trained monolingual embeddings. However, orthogonal mapping only works on language pairs whose embeddings are naturally isomorphic. For non-isomorphic pairs, our method (Iterative Normalization) transforms monolingual embeddings to make orthogonal alignment easier by simultaneously enforcing that (1) individual word vectors are unit length, and (2) each language's average vector is zero. Iterative Normalization consistently improves word translation accuracy of three CLWE methods, with the largest improvement observed on English-Japanese (from 2% to 44% test accuracy).
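
Illustrative sketch (not part of the original record): the summary names two conditions, unit-length word vectors and a zero average vector, that are enforced together. One plausible way to approximate this is to alternate the two steps until the average vector is close to zero. The function name `iterative_normalization`, the parameters `max_iters` and `tol`, the order of the steps, and the stopping rule are all assumptions made for illustration. The summary does not specify them, and this is not the authors' implementation.

```python
import math


def iterative_normalization(vectors, max_iters=100, tol=1e-9):
    """Alternate unit-length scaling and mean-centering (hypothetical sketch).

    `vectors` is a list of equal-length lists of floats, one per word.
    Returns a new list of rows; the input is left unmodified.
    """
    rows = [list(v) for v in vectors]
    dim = len(rows[0])
    for _ in range(max_iters):
        # Step 1: scale every word vector to unit Euclidean length.
        for row in rows:
            norm = math.sqrt(sum(x * x for x in row))
            if norm > 0:
                for j in range(dim):
                    row[j] /= norm
        # Step 2: compute the average vector of the (now unit-length) rows.
        mean = [sum(row[j] for row in rows) / len(rows) for j in range(dim)]
        # Assumed stopping rule: stop once the average is near zero, so the
        # returned rows are unit length and approximately zero-mean.
        if math.sqrt(sum(m * m for m in mean)) < tol:
            break
        # Otherwise subtract the average from every row and repeat.
        for row in rows:
            for j in range(dim):
                row[j] -= mean[j]
    return rows
```

In this sketch the function would be applied to each language's monolingual embeddings separately, before an orthogonal mapping is learned. If the loop reaches `max_iters` without meeting the tolerance, the returned rows are mean-centered but only approximately unit length.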