Are Girls Neko or Shōjo? Cross-Lingual Alignment of Non-Isomorphic Embeddings with Iterative Normalization

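Cross-lingual word embeddings (CLWE) underlie many multilingual natural language processing systems, often through orthogonal transformations of pre-trained monolingual embeddings. However, orthogonal mapping only works on language pairs whose embeddings are naturally isomorphic. For non-isomorphic pairs, our method (Iterative Normalization) transforms monolingual embeddings to make orthogonal alignment easier by simultaneously enforcing that (1) individual word vectors are unit length, and (2) each language's average vector is zero. Iterative Normalization consistently improves word translation accuracy of three CLWE methods, with the largest improvement observed on English-Japanese (from 2% to 44% test accuracy).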
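
The record's description states that Iterative Normalization enforces two conditions at once: every word vector has unit length, and each language's average vector is zero. The sketch below shows one way to reach both conditions, by alternating the two steps until the mean that gets subtracted is negligible. It is an illustration only, not the authors' implementation: the function name, the alternation scheme, the iteration cap `max_iters`, and the tolerance `tol` are all assumptions made for this example.

```python
import math


def iterative_normalization(vectors, max_iters=100, tol=1e-9):
    """Alternate unit-length scaling and mean-centering until both hold.

    `vectors` is a list of equal-length lists of floats, one per word.
    `max_iters` and `tol` are illustrative choices, not values from the paper.
    """
    vecs = [list(v) for v in vectors]  # copy so the input is not modified
    dim = len(vecs[0])
    for _ in range(max_iters):
        # Step 1: scale every word vector to unit Euclidean length.
        for v in vecs:
            norm = math.sqrt(sum(x * x for x in v))
            if norm > 0.0:
                for i in range(dim):
                    v[i] /= norm
        # Step 2: subtract the mean vector so the average becomes zero.
        mean = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
        for v in vecs:
            for i in range(dim):
                v[i] -= mean[i]
        # Centering can disturb unit length, so repeat until the mean that
        # was just removed is negligible.
        if math.sqrt(sum(m * m for m in mean)) < tol:
            break
    return vecs


# Toy two-dimensional "embeddings" (made-up numbers for illustration).
toy = [[2.0, 0.5], [-1.0, 3.0], [-3.0, -1.0], [0.5, -2.0]]
normalized = iterative_normalization(toy)
```

In practice the transformation would be applied separately to each language's monolingual embedding matrix before the orthogonal alignment step that the description mentions.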

Bibliographic Details
Main Authors: Zhang, Mozhi; Xu, Keyulu; Kawarabayashi, Ken-ichi; Jegelka, Stefanie Sabrina; Boyd-Graber, Jordan
Other Authors: Massachusetts Institute of Technology. Department of Linguistics and Philosophy
Format: Article
Language: English
Published: Association for Computational Linguistics 2020
Online Access: https://hdl.handle.net/1721.1/128914
author Zhang, Mozhi
Xu, Keyulu
Kawarabayashi, Ken-ichi
Jegelka, Stefanie Sabrina
Boyd-Graber, Jordan
collection MIT
description Cross-lingual word embeddings (CLWE) underlie many multilingual natural language processing systems, often through orthogonal transformations of pre-trained monolingual embeddings. However, orthogonal mapping only works on language pairs whose embeddings are naturally isomorphic. For non-isomorphic pairs, our method (Iterative Normalization) transforms monolingual embeddings to make orthogonal alignment easier by simultaneously enforcing that (1) individual word vectors are unit length, and (2) each language's average vector is zero. Iterative Normalization consistently improves word translation accuracy of three CLWE methods, with the largest improvement observed on English-Japanese (from 2% to 44% test accuracy).
first_indexed 2024-09-23T10:47:33Z
format Article
id mit-1721.1/128914
institution Massachusetts Institute of Technology
language English
last_indexed 2024-09-23T10:47:33Z
publishDate 2020
publisher Association for Computational Linguistics
record_format dspace
spelling mit-1721.1/128914 2022-09-30T23:04:20Z
Are Girls Neko or Shōjo? Cross-Lingual Alignment of Non-Isomorphic Embeddings with Iterative Normalization
Zhang, Mozhi; Xu, Keyulu; Kawarabayashi, Ken-ichi; Jegelka, Stefanie Sabrina; Boyd-Graber, Jordan
Massachusetts Institute of Technology. Department of Linguistics and Philosophy
2020-12-23T19:19:36Z 2020-12-23T19:19:36Z 2019-07 2020-12-21T19:35:19Z
Article
http://purl.org/eprint/type/ConferencePaper
https://hdl.handle.net/1721.1/128914
Zhang, Mozhi et al. "Are Girls Neko or Shōjo? Cross-Lingual Alignment of Non-Isomorphic Embeddings with Iterative Normalization." 57th Annual Meeting of the Association for Computational Linguistics, July 2019, Florence, Italy, Association for Computational Linguistics, July 2019. © 2019 Association for Computational Linguistics
en
http://dx.doi.org/10.18653/v1/p19-1307
57th Annual Meeting of the Association for Computational Linguistics
Creative Commons Attribution 4.0 International license
https://creativecommons.org/licenses/by/4.0/
application/pdf
Association for Computational Linguistics