An evolutionary-based term reduction approach to bilingual clustering of Malay-English corpora

Document clustering groups unstructured text documents into a predefined set of clusters in order to provide more information to users. Many studies have addressed the clustering of monolingual documents. With advances in current technologies, the study of bilingual cluste...

Bibliographic Details
Main Authors: Rayner Alfred, Leow, Ching Leong, Joe Henry Obit
Format: Conference or Workshop Item
Language: English
Published: Springer International Publishing 2017
Subjects: QA76.75-76.765 Computer software
Online Access: https://eprints.ums.edu.my/id/eprint/29090/1/An%20Evolutionary-Based%20Term%20Reduction%20Approach%20to%20Bilingual%20Clustering%20of%20Malay-English%20Corpora%20ABSTRACT.pdf
https://eprints.ums.edu.my/id/eprint/29090/2/An%20Evolutionary-Based%20Term%20Reduction%20Approach%20to%20Bilingual%20Clustering%20of%20Malay-English%20Corpora.pdf
_version_ 1825714221075136512
author Rayner Alfred
Leow, Ching Leong
Joe Henry Obit
author_sort Rayner Alfred
collection UMS
description Document clustering groups unstructured text documents into a predefined set of clusters in order to provide more information to users. Many studies have addressed the clustering of monolingual documents. With advances in current technologies, the study of bilingual clustering has become feasible. However, clustering bilingual documents still faces the same problem as monolingual document clustering, namely the “curse of dimensionality”. This motivates the study of term reduction techniques for clustering bilingual documents. The objective of this study is to examine the effects of reducing the terms considered when clustering a bilingual corpus in parallel for English and Malay documents. A genetic algorithm (GA) is used to reduce the number of features selected. A single-point crossover with a crossover rate of 0.8 is used. The study also assesses the effects of applying different mutation rates (e.g., 0.1 and 0.01) on the number of features selected for clustering bilingual documents. The results show that the GA improves the clustering mapping compared to the initial clustering mapping, and that a GA with a mutation rate of 0.01 produces better parallel clustering mapping results than a GA with a mutation rate of 0.1.
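The GA settings named in the description (single-point crossover at a rate of 0.8, mutation rates of 0.1 and 0.01) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The binary term mask encoding, binary tournament selection, elitism, population size, generation count, and the `toy_fitness` function are all assumptions made for illustration, since the record does not state how candidate term subsets were encoded or scored.

```python
import random

CROSSOVER_RATE = 0.8   # stated in the description
MUTATION_RATE = 0.01   # stated in the description (0.1 is the other setting studied)


def single_point_crossover(a, b, rng, rate=CROSSOVER_RATE):
    """With probability `rate`, swap the tails of two parents at one cut point."""
    if len(a) > 1 and rng.random() < rate:
        cut = rng.randrange(1, len(a))
        return a[:cut] + b[cut:], b[:cut] + a[cut:]
    return a[:], b[:]


def mutate(mask, rng, rate=MUTATION_RATE):
    """Flip each bit of the term mask independently with probability `rate`."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in mask]


def toy_fitness(mask, useful):
    """Hypothetical stand-in for a clustering quality score:
    +1 for each selected term in `useful`, -1 for any other selected term."""
    return sum(1 if i in useful else -1 for i, bit in enumerate(mask) if bit)


def select_terms(n_terms, fitness, pop_size=20, generations=50,
                 mutation_rate=MUTATION_RATE, seed=0):
    """Evolve a 0/1 mask over `n_terms` terms; a 1 means the term is kept."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_terms)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        def pick():
            # Binary tournament selection (an assumption, not from the record).
            x, y = rng.sample(population, 2)
            return x if fitness(x) >= fitness(y) else y

        children = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(children) < pop_size:
            c1, c2 = single_point_crossover(pick(), pick(), rng)
            children.append(mutate(c1, rng, mutation_rate))
            children.append(mutate(c2, rng, mutation_rate))
        population = children[:pop_size]
        best = max(population, key=fitness)
    return best


if __name__ == "__main__":
    useful_terms = {0, 2, 5, 7}  # made-up example data
    for rate in (0.1, 0.01):
        mask = select_terms(12, lambda m: toy_fitness(m, useful_terms),
                            mutation_rate=rate)
        print(rate, mask)
```

In a real setting the fitness function would score the clustering produced from the selected terms; the toy score here only makes the sketch runnable.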
first_indexed 2024-03-06T03:08:34Z
format Conference or Workshop Item
id ums.eprints-29090
institution Universiti Malaysia Sabah
language English
last_indexed 2024-03-06T03:08:34Z
publishDate 2017
publisher Springer International Publishing
record_format dspace
spelling ums.eprints-29090 2021-10-13T08:02:27Z https://eprints.ums.edu.my/id/eprint/29090/ An evolutionary-based term reduction approach to bilingual clustering of Malay-English corpora Rayner Alfred Leow, Ching Leong Joe Henry Obit QA76.75-76.765 Computer software Springer International Publishing 2017 Conference or Workshop Item PeerReviewed
Rayner Alfred and Leow, Ching Leong and Joe Henry Obit (2017) An evolutionary-based term reduction approach to bilingual clustering of Malay-English corpora. In: International Conference on Advances in Information and Communication Technology (ICTA 2016), 12–13 December 2016, Thai Nguyen City, Vietnam. https://link.springer.com/chapter/10.1007%2F978-3-319-49073-1_16 https://doi.org/10.1007/978-3-319-49073-1_16
title An evolutionary-based term reduction approach to bilingual clustering of Malay-English corpora
topic QA76.75-76.765 Computer software