TRAINING TREE ADJOINING GRAMMARS WITH HUGE TEXT CORPUS USING SPARK MAP REDUCE

Tree adjoining grammars (TAGs) are mildly context-sensitive formalisms used mainly in modelling natural languages. Use of and research on these psycholinguistic formalisms have been erratic over the past decade, owing to their demanding construction and the difficulty of parsing them. However, they represent promisi...

Bibliographic Details
Main Authors: Vijay Krishna Menon, S. Rajendran, K.P. Soman
Format: Article
Language: English
Published: ICT Academy of Tamil Nadu 2015-07-01
Series: ICTACT Journal on Soft Computing
Subjects: TAGs, Spark, Probabilistic Grammar, RDDs, Parsing
Online Access: http://ictactjournals.in/paper/IJSC_Paper_4_pp_1021_1026.pdf
_version_ 1828356395707138048
author Vijay Krishna Menon
S. Rajendran
K.P. Soman
collection DOAJ
description Tree adjoining grammars (TAGs) are mildly context-sensitive formalisms used mainly in modelling natural languages. Use of and research on these psycholinguistic formalisms have been erratic over the past decade, owing to their demanding construction and the difficulty of parsing them. However, they represent a promising future for formalism-based NLP in multilingual scenarios. In this paper we demonstrate a basic synchronous tree adjoining grammar for the English-Tamil language pair that can be used readily for machine translation. We have also developed a multithreaded chart parser that gives ambiguous deep structures and a par dependency structure known as the TAG derivation. We then focus on a model for training this TAG for each language on a large text corpus, using a map-reduce frequency count model in Spark, followed by estimation of various probabilistic parameters for the grammar trees; these parameters can be used to perform statistical parsing with the trained grammar.
first_indexed 2024-04-14T02:58:34Z
format Article
id doaj.art-dc2c28845f6042cfa4fbe20214f0f811
institution Directory Open Access Journal
issn 0976-6561
2229-6956
language English
last_indexed 2024-04-14T02:58:34Z
publishDate 2015-07-01
publisher ICT Academy of Tamil Nadu
record_format Article
series ICTACT Journal on Soft Computing
spelling doaj.art-dc2c28845f6042cfa4fbe20214f0f811
2022-12-22T02:16:01Z
eng
ICT Academy of Tamil Nadu
ICTACT Journal on Soft Computing
0976-6561
2229-6956
2015-07-01
5
4
1021
1026
TRAINING TREE ADJOINING GRAMMARS WITH HUGE TEXT CORPUS USING SPARK MAP REDUCE
Vijay Krishna Menon [0]
S. Rajendran [1]
K.P. Soman [2]
[0] Amrita Vishwa Vidyapeetham, India
[1] Amrita Vishwa Vidyapeetham, India
[2] Amrita Vishwa Vidyapeetham, India
Tree adjoining grammars (TAGs) are mildly context-sensitive formalisms used mainly in modelling natural languages. Use of and research on these psycholinguistic formalisms have been erratic over the past decade, owing to their demanding construction and the difficulty of parsing them. However, they represent a promising future for formalism-based NLP in multilingual scenarios. In this paper we demonstrate a basic synchronous tree adjoining grammar for the English-Tamil language pair that can be used readily for machine translation. We have also developed a multithreaded chart parser that gives ambiguous deep structures and a par dependency structure known as the TAG derivation. We then focus on a model for training this TAG for each language on a large text corpus, using a map-reduce frequency count model in Spark, followed by estimation of various probabilistic parameters for the grammar trees; these parameters can be used to perform statistical parsing with the trained grammar.
http://ictactjournals.in/paper/IJSC_Paper_4_pp_1021_1026.pdf
TAGs
Spark
Probabilistic Grammar
RDDs
Parsing
title TRAINING TREE ADJOINING GRAMMARS WITH HUGE TEXT CORPUS USING SPARK MAP REDUCE
topic TAGs
Spark
Probabilistic Grammar
RDDs
Parsing
url http://ictactjournals.in/paper/IJSC_Paper_4_pp_1021_1026.pdf