A Statistical Model for Lost Language Decipherment

URL to paper listed on conference site

Bibliographic Details
Main Authors: Snyder, Benjamin; Barzilay, Regina; Knight, Kevin
Other Authors: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Format: Article
Language: en_US
Published: Association for Computational Linguistics 2011
Online Access: http://hdl.handle.net/1721.1/62802
https://orcid.org/0000-0002-2921-8201
Record: mit-1721.1/62802, 2022-10-01T13:00:05Z
Department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Abstract: In this paper we propose a method for the automatic decipherment of lost languages. Given a non-parallel corpus in a known related language, our model produces both alphabetic mappings and translations of words into their corresponding cognates. We employ a non-parametric Bayesian framework to simultaneously capture both low-level character mappings and high-level morphemic correspondences. This formulation enables us to encode some of the linguistic intuitions that have guided human decipherers. When applied to the ancient Semitic language Ugaritic, the model correctly maps nearly all letters to their Hebrew counterparts, and deduces the correct Hebrew cognate for over half of the Ugaritic words which have cognates in Hebrew.
Sponsors: National Science Foundation (U.S.) (CAREER grant IIS-0448168); National Science Foundation (U.S.) (Career award IIS 0835445)
Dates: 2011-05-10T17:57:45Z; 2010-07
Type: Article; http://purl.org/eprint/type/ConferencePaper
Citation: Snyder, Benjamin, Regina Barzilay and Kevin Knight. "A Statistical Model for Lost Language Decipherment." In ACL 2010, 48th Annual Meeting of the Association for Computational Linguistics, Uppsala, Sweden, July 11–16, 2010.
Conference URL: http://acl2010.org/program_mainconf.html#s86
Published in: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, ACL 2010
License: Creative Commons Attribution-Noncommercial-Share Alike 3.0, http://creativecommons.org/licenses/by-nc-sa/3.0/
File format: application/pdf
Source: MIT web domain