TRAINING TREE ADJOINING GRAMMARS WITH HUGE TEXT CORPUS USING SPARK MAP REDUCE
Main Authors: | Vijay Krishna Menon, S. Rajendran, K.P. Soman |
---|---|
Format: | Article |
Language: | English |
Published: | ICT Academy of Tamil Nadu, 2015-07-01 |
Series: | ICTACT Journal on Soft Computing |
Subjects: | |
Online Access: | http://ictactjournals.in/paper/IJSC_Paper_4_pp_1021_1026.pdf |
_version_ | 1828356395707138048 |
---|---|
author | Vijay Krishna Menon, S. Rajendran, K.P. Soman |
collection | DOAJ |
description | Tree adjoining grammars (TAGs) are mildly context-sensitive formalisms used mainly in modelling natural languages. Usage of and research on these psycholinguistic formalisms have been erratic over the past decade, owing to their demanding construction and the difficulty of parsing them. However, they represent a promising future for formalism-based NLP in multilingual scenarios. In this paper we demonstrate a basic synchronous tree adjoining grammar for the English-Tamil language pair that can be used readily for machine translation. We have also developed a multithreaded chart parser that produces ambiguous deep structures and a dependency-like structure known as the TAG derivation. Furthermore, we present a model for training this TAG for each language on a large text corpus, using a MapReduce frequency-count model in Spark, followed by estimation of various probabilistic parameters for the grammar trees; these parameters can be used to perform statistical parsing with the trained grammar. |
first_indexed | 2024-04-14T02:58:34Z |
format | Article |
id | doaj.art-dc2c28845f6042cfa4fbe20214f0f811 |
institution | Directory of Open Access Journals |
issn | 0976-6561, 2229-6956 |
language | English |
last_indexed | 2024-04-14T02:58:34Z |
publishDate | 2015-07-01 |
publisher | ICT Academy of Tamil Nadu |
record_format | Article |
series | ICTACT Journal on Soft Computing |
citation | Vol. 5, No. 4, pp. 1021-1026 |
affiliation | Amrita Vishwa Vidyapeetham, India (all three authors) |
title | TRAINING TREE ADJOINING GRAMMARS WITH HUGE TEXT CORPUS USING SPARK MAP REDUCE |
topic | TAGs; Spark; Probabilistic Grammar; RDDs; Parsing |
url | http://ictactjournals.in/paper/IJSC_Paper_4_pp_1021_1026.pdf |
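The record's description mentions a map reduce frequency count model in Spark followed by estimation of probabilistic parameters for the grammar trees, but it does not give the event format or the estimator. The following is therefore only a minimal sketch under stated assumptions, not the authors' implementation: it emulates the map and reduce phases in plain Python instead of using Spark, assumes a hypothetical derivation-event tuple `(operation, host_tree, gorn_address, attached_tree)` with made-up tree names, and uses simple relative-frequency estimation of P(attached tree | operation, host tree, node address).

```python
from collections import defaultdict
from itertools import chain

# Hypothetical event format (an assumption, not taken from the paper):
# (operation, host_tree, gorn_address, attached_tree), where operation is
# "subst" or "adjoin" and attached_tree "NONE" records that no adjunction occurred.
Event = tuple[str, str, str, str]


def count_events(derivations: list[list[Event]]) -> dict[Event, int]:
    """Map-reduce style frequency count over derivation events."""
    # Map phase: flatten every derivation and emit an (event, 1) pair per event.
    pairs = ((event, 1) for event in chain.from_iterable(derivations))
    # Reduce phase: sum the values per key, as a reduce-by-key step would.
    counts: dict[Event, int] = defaultdict(int)
    for event, one in pairs:
        counts[event] += one
    return dict(counts)


def estimate_probabilities(counts: dict[Event, int]) -> dict[Event, float]:
    """Relative-frequency estimate of P(attached_tree | operation, host_tree, address)."""
    # Total count per conditioning context (operation, host_tree, address).
    totals: dict[tuple[str, str, str], int] = defaultdict(int)
    for (operation, host, address, _attached), count in counts.items():
        totals[(operation, host, address)] += count
    # Normalize each event count by the total of its context.
    return {event: count / totals[event[:3]] for event, count in counts.items()}


# Two toy derivations with hypothetical tree names.
derivations = [
    [
        ("subst", "alpha_eats", "1", "alpha_NP_John"),
        ("subst", "alpha_eats", "2.2", "alpha_NP_apples"),
        ("adjoin", "alpha_eats", "2", "beta_often"),
    ],
    [
        ("subst", "alpha_eats", "1", "alpha_NP_Mary"),
        ("subst", "alpha_eats", "2.2", "alpha_NP_apples"),
        ("adjoin", "alpha_eats", "2", "NONE"),
    ],
]

probs = estimate_probabilities(count_events(derivations))
for event, p in sorted(probs.items()):
    print(event, p)
```

In PySpark, the counting step would roughly correspond to `sc.parallelize(derivations).flatMap(lambda d: d).map(lambda e: (e, 1)).reduceByKey(operator.add)`, with the per-context normalization done as a second aggregation keyed on the first three fields of each event. Keeping the two phases separate means the expensive count runs once over the corpus, while the much smaller table of counts is normalized afterwards.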