Are Girls Neko or Shōjo? Cross-Lingual Alignment of Non-Isomorphic Embeddings with Iterative Normalization
Cross-lingual word embeddings (CLWE) underlie many multilingual natural language processing systems, often through orthogonal transformations of pre-trained monolingual embeddings. However, orthogonal mapping only works on language pairs whose embeddings are naturally isomorphic. For non-isomorphic...
Main Authors: | Zhang, Mozhi; Xu, Keyulu; Kawarabayashi, Ken-ichi; Jegelka, Stefanie Sabrina; Boyd-Graber, Jordan |
---|---|
Other Authors: | Massachusetts Institute of Technology. Department of Linguistics and Philosophy |
Format: | Article |
Language: | English |
Published: | Association for Computational Linguistics, 2020 |
Online Access: | https://hdl.handle.net/1721.1/128914 |
_version_ | 1826197431220436992 |
---|---|
author | Zhang, Mozhi; Xu, Keyulu; Kawarabayashi, Ken-ichi; Jegelka, Stefanie Sabrina; Boyd-Graber, Jordan |
author2 | Massachusetts Institute of Technology. Department of Linguistics and Philosophy |
author_sort | Zhang, Mozhi |
collection | MIT |
description | Cross-lingual word embeddings (CLWE) underlie many multilingual natural language processing systems, often through orthogonal transformations of pre-trained monolingual embeddings. However, orthogonal mapping only works on language pairs whose embeddings are naturally isomorphic. For non-isomorphic pairs, our method (Iterative Normalization) transforms monolingual embeddings to make orthogonal alignment easier by simultaneously enforcing that (1) individual word vectors are unit length, and (2) each language's average vector is zero. Iterative Normalization consistently improves word translation accuracy of three CLWE methods, with the largest improvement observed on English-Japanese (from 2% to 44% test accuracy). |
first_indexed | 2024-09-23T10:47:33Z |
format | Article |
id | mit-1721.1/128914 |
institution | Massachusetts Institute of Technology |
language | English |
last_indexed | 2024-09-23T10:47:33Z |
publishDate | 2020 |
publisher | Association for Computational Linguistics |
record_format | dspace |
spelling | mit-1721.1/128914; 2022-09-30T23:04:20Z; Are Girls Neko or Shōjo? Cross-Lingual Alignment of Non-Isomorphic Embeddings with Iterative Normalization; Zhang, Mozhi; Xu, Keyulu; Kawarabayashi, Ken-ichi; Jegelka, Stefanie Sabrina; Boyd-Graber, Jordan; Massachusetts Institute of Technology. Department of Linguistics and Philosophy; Cross-lingual word embeddings (CLWE) underlie many multilingual natural language processing systems, often through orthogonal transformations of pre-trained monolingual embeddings. However, orthogonal mapping only works on language pairs whose embeddings are naturally isomorphic. For non-isomorphic pairs, our method (Iterative Normalization) transforms monolingual embeddings to make orthogonal alignment easier by simultaneously enforcing that (1) individual word vectors are unit length, and (2) each language's average vector is zero. Iterative Normalization consistently improves word translation accuracy of three CLWE methods, with the largest improvement observed on English-Japanese (from 2% to 44% test accuracy).; 2020-12-23T19:19:36Z; 2020-12-23T19:19:36Z; 2019-07; 2020-12-21T19:35:19Z; Article; http://purl.org/eprint/type/ConferencePaper; https://hdl.handle.net/1721.1/128914; Zhang, Mozhi et al. "Are Girls Neko or Shōjo? Cross-Lingual Alignment of Non-Isomorphic Embeddings with Iterative Normalization." 57th Annual Meeting of the Association for Computational Linguistics, July 2019, Florence, Italy, Association for Computational Linguistics, July 2019. © 2019 Association for Computational Linguistics; en; http://dx.doi.org/10.18653/v1/p19-1307; 57th Annual Meeting of the Association for Computational Linguistics; Creative Commons Attribution 4.0 International license; https://creativecommons.org/licenses/by/4.0/; application/pdf; Association for Computational Linguistics; Association for Computational Linguistics |
title | Are Girls Neko or Shōjo? Cross-Lingual Alignment of Non-Isomorphic Embeddings with Iterative Normalization |
url | https://hdl.handle.net/1721.1/128914 |
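
The description field in this record says that Iterative Normalization enforces two conditions at once: every word vector has unit length, and each language's average vector is zero. A minimal NumPy sketch of one way to reach that state is given here, alternating the two steps for a fixed number of rounds. The function name `iterative_normalization`, the `n_iter` and `eps` parameters, and the fixed iteration count are assumptions made for illustration; the record does not specify the authors' stopping rule or implementation.

```python
import numpy as np


def iterative_normalization(emb, n_iter=5, eps=1e-12):
    """Alternate length normalization and mean centering (illustrative sketch).

    emb:    array of shape (n_words, dim), one row per word vector.
    n_iter: number of alternation rounds (hypothetical default, not from the record).
    eps:    floor on the norm to avoid dividing by zero (an added safeguard).
    """
    x = np.asarray(emb, dtype=np.float64).copy()
    for _ in range(n_iter):
        # (1) Rescale every word vector to unit Euclidean length.
        norms = np.linalg.norm(x, axis=1, keepdims=True)
        x /= np.maximum(norms, eps)
        # (2) Subtract the average vector so the embedding set has zero mean.
        x -= x.mean(axis=0, keepdims=True)
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy "monolingual embeddings" with a deliberately non-zero mean.
    toy = rng.normal(size=(200, 20)) + 3.0
    out = iterative_normalization(toy, n_iter=100)
    print("largest |mean| component:", np.abs(out.mean(axis=0)).max())
    print("largest deviation from unit length:",
          np.abs(np.linalg.norm(out, axis=1) - 1.0).max())
```

Because centering runs last in each round, the mean is zero up to floating-point error after any number of rounds, while the unit-length condition holds only approximately and tightens as the rounds repeat. Each language's embedding matrix would be processed separately before an orthogonal alignment method is applied.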