The Taxonomy of Writing Systems: How to Measure How Logographic a System Is

Bibliographic Details
Main Authors: Richard Sproat, Alexander Gutkin
Format: Article
Language: English
Published: The MIT Press 2021-11-01
Series: Computational Linguistics
Online Access: https://direct.mit.edu/coli/article/47/3/477/102776/The-Taxonomy-of-Writing-Systems-How-to-Measure-How
collection DOAJ
description Taxonomies of writing systems since Gelb (1952) have classified systems based on what the written symbols represent: if they represent words or morphemes, they are logographic; if syllables, syllabic; if segments, alphabetic; and so forth. Sproat (2000) and Rogers (2005) broke with tradition by splitting the logographic and phonographic aspects into two dimensions, with logography being graded rather than a categorical distinction. A system could be syllabic, and highly logographic; or alphabetic, and mostly non-logographic. This accords better with how writing systems actually work, but neither author proposed a method for measuring logography. In this article we propose a novel measure of the degree of logography that uses an attention-based sequence-to-sequence model trained to predict the spelling of a token from its pronunciation in context. In an ideal phonographic system, the model should need to attend to only the current token in order to compute how to spell it, and this would show in the attention matrix activations. In contrast, with a logographic system, where a given pronunciation might correspond to several different spellings, the model would need to attend to a broader context. The ratio of the activation outside the token and the total activation forms the basis of our measure. We compare this with a simple lexical measure, and an entropic measure, as well as several other neural models, and argue that on balance our attention-based measure accords best with intuition about how logographic various systems are. Our work provides the first quantifiable measure of the notion of logography that accords with linguistic intuition and, we argue, provides better insight into what this notion means.
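The ratio described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `outside_attention_ratio`, the half-open `token_span` convention, and the toy matrices are all assumptions made for illustration, and the article itself says only that this ratio "forms the basis" of the measure.

```python
from typing import List, Tuple


def outside_attention_ratio(
    attention: List[List[float]],
    token_span: Tuple[int, int],
) -> float:
    """Share of attention mass that falls outside the target token.

    attention[i][j] is the weight that output step i places on input
    position j. token_span = (start, end) is the half-open range of input
    positions belonging to the token being spelled (an assumed convention).
    """
    start, end = token_span
    total = sum(sum(row) for row in attention)
    if total == 0:
        raise ValueError("attention matrix has no mass")
    inside = sum(sum(row[start:end]) for row in attention)
    return (total - inside) / total


# Toy matrices with made-up numbers: 2 output steps over 4 input positions,
# with the target token occupying input positions 1 and 2.
phonographic_like = [
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
logographic_like = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
]

print(outside_attention_ratio(phonographic_like, (1, 3)))  # 0.0
print(outside_attention_ratio(logographic_like, (1, 3)))   # 0.5
```

In this toy setting a value near 0 corresponds to the ideal phonographic case, where all attention stays on the current token, and larger values correspond to a model that has to consult the surrounding context.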
first_indexed 2024-04-13T23:00:35Z
id doaj.art-3fb15beaa2b547cfa713c61573293dce
institution Directory Open Access Journal
issn 0891-2017
1530-9312
last_indexed 2024-04-13T23:00:35Z
spelling doaj.art-3fb15beaa2b547cfa713c61573293dce; 2022-12-22T02:25:52Z; eng; The MIT Press; Computational Linguistics; ISSN 0891-2017, 1530-9312; 2021-11-01; vol. 47, no. 3, pp. 477-528; DOI 10.1162/coli_a_00409; The Taxonomy of Writing Systems: How to Measure How Logographic a System Is; Richard Sproat (Search Google, Japan. rws@google.com); Alexander Gutkin (Research & Machine Intelligence, Google, UK. agutkin@google.com)