An evolutionary-based term reduction approach to bilingual clustering of Malay-English corpora
The document clustering process groups the unstructured text documents into a predefined set of clusters in order to provide more information to the users. There are many studies conducted in clustering monolingual documents. With the enrichment of current technologies, the study of bilingual cluste...
Main Authors: | Rayner Alfred, Leow, Ching Leong, Joe Henry Obit |
---|---|
Format: | Conference or Workshop Item |
Language: | English |
Published: | Springer International Publishing, 2017 |
Subjects: | QA76.75-76.765 Computer software |
Online Access: | https://eprints.ums.edu.my/id/eprint/29090/1/An%20Evolutionary-Based%20Term%20Reduction%20Approach%20to%20Bilingual%20Clustering%20of%20Malay-English%20Corpora%20ABSTRACT.pdf https://eprints.ums.edu.my/id/eprint/29090/2/An%20Evolutionary-Based%20Term%20Reduction%20Approach%20to%20Bilingual%20Clustering%20of%20Malay-English%20Corpora.pdf |
_version_ | 1825714221075136512 |
---|---|
author | Rayner Alfred; Leow, Ching Leong; Joe Henry Obit |
author_facet | Rayner Alfred; Leow, Ching Leong; Joe Henry Obit |
author_sort | Rayner Alfred |
collection | UMS |
description | Document clustering groups unstructured text documents into a predefined set of clusters in order to provide more information to users. Many studies have addressed the clustering of monolingual documents, and with advances in current technologies, bilingual clustering has also become feasible. However, bilingual document clustering still faces the same problem as monolingual document clustering: the “curse of dimensionality”. This motivates the study of term reduction techniques for clustering bilingual documents. The objective of this study is to examine the effects of reducing the terms considered when clustering a bilingual corpus in parallel for English and Malay documents. A genetic algorithm (GA) is used to reduce the number of features selected, with single-point crossover at a crossover rate of 0.8. The study also assesses the effects of different mutation rates (e.g., 0.1 and 0.01) on the number of features selected for clustering bilingual documents. The results show that the GA improves the clustering mapping compared to the initial clustering mapping, and that a GA with a mutation rate of 0.01 produces better parallel clustering mapping results than a GA with a mutation rate of 0.1. |
first_indexed | 2024-03-06T03:08:34Z |
format | Conference or Workshop Item |
id | ums.eprints-29090 |
institution | Universiti Malaysia Sabah |
language | English |
last_indexed | 2024-03-06T03:08:34Z |
publishDate | 2017 |
publisher | Springer International Publishing |
record_format | dspace |
spelling | ums.eprints-29090 2021-10-13T08:02:27Z https://eprints.ums.edu.my/id/eprint/29090/ An evolutionary-based term reduction approach to bilingual clustering of Malay-English corpora Rayner Alfred; Leow, Ching Leong; Joe Henry Obit QA76.75-76.765 Computer software Springer International Publishing 2017 Conference or Workshop Item PeerReviewed text en https://eprints.ums.edu.my/id/eprint/29090/1/An%20Evolutionary-Based%20Term%20Reduction%20Approach%20to%20Bilingual%20Clustering%20of%20Malay-English%20Corpora%20ABSTRACT.pdf text en https://eprints.ums.edu.my/id/eprint/29090/2/An%20Evolutionary-Based%20Term%20Reduction%20Approach%20to%20Bilingual%20Clustering%20of%20Malay-English%20Corpora.pdf Rayner Alfred and Leow, Ching Leong and Joe Henry Obit (2017) An evolutionary-based term reduction approach to bilingual clustering of Malay-English corpora. In: International Conference on Advances in Information and Communication Technology (ICTA 2016), 12–13 December 2016, Thai Nguyen city, Vietnam. https://link.springer.com/chapter/10.1007%2F978-3-319-49073-1_16 https://doi.org/10.1007/978-3-319-49073-1_16 |
spellingShingle | QA76.75-76.765 Computer software Rayner Alfred Leow, Ching Leong Joe Henry Obit An evolutionary-based term reduction approach to bilingual clustering of Malay-English corpora |
title | An evolutionary-based term reduction approach to bilingual clustering of Malay-English corpora |
title_sort | evolutionary based term reduction approach to bilingual clustering of malay english corpora |
topic | QA76.75-76.765 Computer software |
url | https://eprints.ums.edu.my/id/eprint/29090/1/An%20Evolutionary-Based%20Term%20Reduction%20Approach%20to%20Bilingual%20Clustering%20of%20Malay-English%20Corpora%20ABSTRACT.pdf https://eprints.ums.edu.my/id/eprint/29090/2/An%20Evolutionary-Based%20Term%20Reduction%20Approach%20to%20Bilingual%20Clustering%20of%20Malay-English%20Corpora.pdf |
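
The description above names three GA settings: single-point crossover, a crossover rate of 0.8, and mutation rates of 0.1 and 0.01. The following is a minimal Python sketch of how those operators could act on a term-selection bit mask (1 = keep the term, 0 = drop it). It is not the authors' implementation: the record does not state the fitness function, selection scheme, population size, or number of generations, so `toy_fitness`, the binary tournament, the elitism step, and the default sizes are all assumptions made purely for illustration.

```python
import random

CROSSOVER_RATE = 0.8          # stated in the description
MUTATION_RATES = (0.1, 0.01)  # the two rates compared in the description


def single_point_crossover(a, b, rng, rate=CROSSOVER_RATE):
    """With probability `rate`, swap the tails of two bit lists at one cut point."""
    if len(a) > 1 and rng.random() < rate:
        cut = rng.randrange(1, len(a))
        return a[:cut] + b[cut:], b[:cut] + a[cut:]
    return a[:], b[:]


def mutate(bits, rng, rate):
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in bits]


def toy_fitness(bits, useful_terms):
    """Hypothetical stand-in for a clustering-quality score (an assumption).

    Rewards kept terms that are in `useful_terms` and penalizes other kept terms.
    """
    kept = {i for i, bit in enumerate(bits) if bit}
    return len(kept & useful_terms) - 0.5 * len(kept - useful_terms)


def run_ga(n_terms, useful_terms, mutation_rate,
           pop_size=30, generations=40, seed=0):
    """Return (best mask, best score per generation) for the toy problem."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_terms)]
                  for _ in range(pop_size)]

    def score(bits):
        return toy_fitness(bits, useful_terms)

    def pick():
        # Binary tournament selection (assumed; not stated in the record).
        x, y = rng.sample(population, 2)
        return x if score(x) >= score(y) else y

    best = max(population, key=score)
    history = [score(best)]
    for _ in range(generations):
        next_population = [best[:]]  # elitism (assumed): carry the best mask over
        while len(next_population) < pop_size:
            c1, c2 = single_point_crossover(pick(), pick(), rng)
            next_population.append(mutate(c1, rng, mutation_rate))
            next_population.append(mutate(c2, rng, mutation_rate))
        population = next_population[:pop_size]
        best = max(population, key=score)
        history.append(score(best))
    return best, history


if __name__ == "__main__":
    useful = set(range(0, 40, 4))  # hypothetical "informative" term indices
    for rate in MUTATION_RATES:
        mask, history = run_ga(40, useful, rate)
        print(f"mutation rate {rate}: kept {sum(mask)} of 40 terms, "
              f"toy fitness {history[-1]}")
```

Running the script prints the number of retained terms and the toy fitness for each mutation rate. Because the fitness here is a stand-in, the output says nothing about which rate works better on real Malay-English corpora; it only shows where the reported parameters enter the algorithm.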