Morphological Verb-Aware Tibetan Language Model

The Tibetan language model (TLM) is the key to Tibetan natural language processing. In this paper, we first observe that, different from widely used languages, Tibetan contains many morphological verbs that rarely appear in natural sentences but play a key role in accurate text prediction. This prop...


Bibliographic Details
Main Authors: Kuntharrgyal Khysru, Di Jin, Jianwu Dang
Format: Article
Language: English
Published: IEEE 2019-01-01
Series: IEEE Access
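Subjects: Tibetan language model; text prediction; automatic speech recognition; morphological verb-aware model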
Online Access: https://ieeexplore.ieee.org/document/8723332/
author Kuntharrgyal Khysru
Di Jin
Jianwu Dang
collection DOAJ
description The Tibetan language model (TLM) is the key to Tibetan natural language processing. In this paper, we first observe that, unlike widely used languages, Tibetan contains many morphological verbs that rarely appear in natural sentences but play a key role in accurate text prediction. This property is usually ignored by existing methods, which makes traditional training strategies less effective at constructing accurate and robust TLMs. Because morphological verbs influence the tense and semantics of sentences, a Tibetan language model needs to take them into account. Hence, we propose a morphological verb-aware TLM that combines offline learning via a character frequency reweighting strategy with online tuning of discriminative weights conditioned on morphological verbs. As a result, compared with state-of-the-art methods, our method not only reduces the perplexity but also lowers the character error rate on text prediction and automatic speech recognition (ASR) tasks.
first_indexed 2024-12-22T09:49:04Z
format Article
id doaj.art-beab411c72404763aa9c9d5ca3e3258c
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-12-22T09:49:04Z
publishDate 2019-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-beab411c72404763aa9c9d5ca3e3258c
2022-12-21T18:30:27Z
eng
IEEE
IEEE Access
2169-3536
2019-01-01
7
72896
72904
10.1109/ACCESS.2019.2919328
8723332
Morphological Verb-Aware Tibetan Language Model
Kuntharrgyal Khysru (https://orcid.org/0000-0002-6673-9583)
Di Jin
Jianwu Dang
Tianjin Key Laboratory of Cognitive Computing and Application, College of Intelligence and Computing, Tianjin University, Tianjin, China
title Morphological Verb-Aware Tibetan Language Model
topic Tibetan language model
text prediction
automatic speech recognition
morphological verb-aware model
url https://ieeexplore.ieee.org/document/8723332/