PARALELISASI MAXIMUM ENTROPY PART OF SPEECH TAGGING UNTUK BAHASA INDONESIA DENGAN MAPREDUCE [Parallelizing Maximum Entropy Part-of-Speech Tagging for Indonesian with MapReduce]

Research in natural language processing indicates that more data leads to better accuracy. Processing data at this scale on a single machine has limitations that can be addressed by processing the data in parallel. This research applied MapReduce to part-of-speech (POS) tagging. MapReduce...

Bibliographic Details
Main Authors:	Nurwidyantoro, Arif, Winarko, Edi
Format:	Thesis
Published:	[Yogyakarta] : Universitas Gadjah Mada 2011
Subjects:	Electrical and Electronic Engineering

collection	UGM
description	Research in natural language processing indicates that more data leads to better accuracy. Processing data at this scale on a single machine has limitations that can be addressed by processing the data in parallel. This research applied MapReduce to part-of-speech (POS) tagging. MapReduce is a programming model developed for processing large data sets, while POS tagging is one of the earliest steps in natural language processing. The POS tagging approach used in this research is a Maximum Entropy model for Bahasa Indonesia. MapReduce was implemented in parts of the training and tagging processes: in dictionary, tagtoken, and feature creation, and in the calculation using improved iterative scaling (IIS). The IIS calculation was found not to be implementable with the MapReduce model, because its probability parameter updates are closely interdependent and therefore cannot be run in parallel. The training experiments used 100,000-word and 1,000,000-word training corpora from Pan Localization and the 12,000-word training corpus used in Wicaksono and Purwarianti's research. They showed that total training time with MapReduce is shorter than without it; however, the time spent reading MapReduce results during training slows down the total training time. Tagging experiments were conducted using different numbers of map and reduce processes on corpora of different sizes gathered from various news sites. They showed that the MapReduce implementation could speed up the tagging process. The fastest result was obtained when tagging the 1,000,000-word corpus with 30 map processes.
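The dictionary and tagtoken creation steps named in the description fit the MapReduce pattern because each corpus line can be processed independently. The following is a minimal sketch in plain Python (no Hadoop), not the thesis implementation: the `word/TAG` input format, the tag labels, and the function names `map_tagtoken`, `reduce_counts`, and `run_job` are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_tagtoken(line):
    """Map step: emit ((word, tag), 1) for each 'word/TAG' token in a line."""
    pairs = []
    for token in line.split():
        word, sep, tag = token.rpartition("/")
        if sep:  # skip tokens that carry no tag
            pairs.append(((word.lower(), tag), 1))
    return pairs


def reduce_counts(pairs):
    """Reduce step: sum the counts for each (word, tag) key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


def run_job(lines):
    # Each line is mapped independently, so this stage could be split
    # across workers; here it runs sequentially for simplicity.
    mapped = chain.from_iterable(map_tagtoken(line) for line in lines)
    return reduce_counts(mapped)


# Hypothetical two-line tagged corpus (tags are illustrative only).
corpus = ["saya/PRP makan/VB nasi/NN", "dia/PRP makan/VB"]
counts = run_job(corpus)
```

By contrast, an iterative parameter update such as IIS depends on the parameters from the previous iteration, which is consistent with the description's finding that this step did not parallelize in the same way.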
first_indexed	2024-03-13T22:11:49Z
format	Thesis
id	oai:generic.eprints.org:90876
institution	Universitas Gadjah Mada
last_indexed	2024-03-13T22:11:49Z
publishDate	2011
publisher	[Yogyakarta] : Universitas Gadjah Mada
record_format	dspace
spelling	oai:generic.eprints.org:90876 2020-02-20T08:56:23Z https://repository.ugm.ac.id/90876/ [Yogyakarta] : Universitas Gadjah Mada 2011 Thesis NonPeerReviewed Nurwidyantoro, Arif and Winarko, Edi (2011) PARALELISASI MAXIMUM ENTROPY PART OF SPEECH TAGGING UNTUK BAHASA INDONESIA DENGAN MAPREDUCE. Bachelor thesis, Universitas Gadjah Mada. http://etd.ugm.ac.id/index.php?mod=penelitian_detail&sub=PenelitianDetail&act=view&typ=html&buku_id=53205

Similar Items