Development of a genetic-based hierarchical agglomerative clustering technique for parallel clustering of bilingual corpora based on reduced terms
In this project, we report on applying Hierarchical Agglomerative Clustering (HAC) to a large corpus of documents, each of which appears in both Malay and English. We cluster the documents separately for each language and compare the results with respect to the content of the clusters produced. On t...
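The comparison described in this record (cluster each language separately, then check how closely the two clusterings agree under single, complete, and average link) can be illustrated with a minimal pure-Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the report: the toy term-count vectors `english` and `malay`, the use of cosine distance, and the Rand index as the agreement measure.

```python
import math
from itertools import combinations

def cosine_distance(u, v):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 1.0
    return 1.0 - dot / (nu * nv)

def hac(vectors, k, linkage="average"):
    """Naive agglomerative clustering down to k clusters; returns one label per document."""
    n = len(vectors)
    dist = {(i, j): cosine_distance(vectors[i], vectors[j])
            for i, j in combinations(range(n), 2)}

    def cluster_distance(a, b):
        ds = [dist[min(i, j), max(i, j)] for i in a for j in b]
        if linkage == "single":      # closest pair of members
            return min(ds)
        if linkage == "complete":    # farthest pair of members
            return max(ds)
        if linkage == "average":     # mean over all cross-cluster pairs
            return sum(ds) / len(ds)
        raise ValueError(f"unknown linkage: {linkage}")

    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        # Merge the two clusters with the smallest inter-cluster distance.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: cluster_distance(clusters[p[0]], clusters[p[1]]))
        merged = clusters.pop(b)     # b > a, so index a stays valid
        clusters[a].extend(merged)

    labels = [0] * n
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels

def rand_index(x, y):
    """Fraction of document pairs on which two clusterings agree (same/different cluster)."""
    pairs = list(combinations(range(len(x)), 2))
    agree = sum((x[i] == x[j]) == (y[i] == y[j]) for i, j in pairs)
    return agree / len(pairs)

# Hypothetical toy data: six parallel documents, one row per document.
# The two languages have different vocabularies, hence different vector lengths.
english = [[3, 1, 0, 0], [2, 2, 0, 0], [4, 0, 1, 0],
           [0, 0, 3, 2], [0, 1, 2, 3], [0, 0, 1, 4]]
malay = [[2, 1, 0, 0, 0], [1, 2, 0, 0, 0], [3, 1, 1, 0, 0],
         [0, 0, 2, 1, 1], [0, 0, 1, 2, 2], [0, 1, 1, 1, 3]]

for link in ("single", "complete", "average"):
    agreement = rand_index(hac(english, 2, link), hac(malay, 2, link))
    print(link, agreement)
```

Because documents are aligned by index across the two languages, the agreement score needs no translation step: it only asks whether each pair of documents is grouped together in both clusterings. The naive merge loop is cubic in the number of documents and is meant for reading, not for a large corpus.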
Main Authors: | Rayner Alfred; Jason Teo; Chung, Seng Kheau |
---|---|
Format: | Research Report |
Language: | English |
Published: | Universiti Malaysia Sabah, 2010 |
Subjects: | QA Mathematics |
Online Access: | https://eprints.ums.edu.my/id/eprint/24737/1/Development%20of%20genetic-based%20hierarchical.pdf |
_version_ | 1825713818928414720 |
---|---|
author | Rayner Alfred; Jason Teo; Chung, Seng Kheau |
author_sort | Rayner Alfred |
collection | UMS |
description | In this project, we report on applying Hierarchical Agglomerative Clustering (HAC) to a large corpus of documents, each of which appears in both Malay and English. We cluster the documents separately for each language and compare the results with respect to the content of the clusters produced. On the data available, the results of clustering in one language resemble those in the other, provided the number of clusters required is relatively small. Further, we study the effect of changing the method used to compute the inter-cluster distance: single link, complete link, or average link. Finally, we describe an experiment employing a genetic algorithm to fine-tune the individual term weights in order to reproduce a predefined set of clusters more closely. |
first_indexed | 2024-03-06T03:02:24Z |
format | Research Report |
id | ums.eprints-24737 |
institution | Universiti Malaysia Sabah |
language | English |
last_indexed | 2024-03-06T03:02:24Z |
publishDate | 2010 |
publisher | Universiti Malaysia Sabah |
record_format | dspace |
spelling | ums.eprints-247372020-01-29T02:48:12Z https://eprints.ums.edu.my/id/eprint/24737/ Development of a genetic-based hierarchical agglomerative clustering technique for parallel clustering of bilingual corpora based on reduced terms Rayner Alfred Jason Teo Chung, Seng Kheau QA Mathematics In this project, we report on applying Hierarchical Agglomerative Clustering (HAC) to a large corpus of documents, each of which appears in both Malay and English. We cluster the documents separately for each language and compare the results with respect to the content of the clusters produced. On the data available, the results of clustering in one language resemble those in the other, provided the number of clusters required is relatively small. Further, we study the effect of changing the method used to compute the inter-cluster distance: single link, complete link, or average link. Finally, we describe an experiment employing a genetic algorithm to fine-tune the individual term weights in order to reproduce a predefined set of clusters more closely. Universiti Malaysia Sabah 2010 Research Report NonPeerReviewed text en https://eprints.ums.edu.my/id/eprint/24737/1/Development%20of%20genetic-based%20hierarchical.pdf Rayner Alfred and Jason Teo and Chung, Seng Kheau (2010) Development of a genetic-based hierarchical agglomerative clustering technique for parallel clustering of bilingual corpora based on reduced terms. (Unpublished) |
title | Development of a genetic-based hierarchical agglomerative clustering technique for parallel clustering of bilingual corpora based on reduced terms |
topic | QA Mathematics |
url | https://eprints.ums.edu.my/id/eprint/24737/1/Development%20of%20genetic-based%20hierarchical.pdf |
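The description mentions a genetic algorithm that fine-tunes individual term weights so that clustering reproduces a predefined set of clusters more closely. This record gives no details of the encoding, operators, or fitness function, so the following is only a hedged sketch under stated assumptions: real-valued weights in [0, 1], a weighted Euclidean distance, average-link HAC, the Rand index against the target clusters as fitness, tournament selection, uniform crossover, Gaussian mutation, and two-member elitism. The toy data `docs` and `target` are hypothetical.

```python
import random
from itertools import combinations

def weighted_distance(u, v, w):
    """Euclidean distance with one non-negative weight per term (an assumed form)."""
    return sum(wk * (a - b) ** 2 for wk, a, b in zip(w, u, v)) ** 0.5

def hac_labels(vectors, weights, k):
    """Average-link agglomerative clustering down to k clusters; returns labels."""
    clusters = [[i] for i in range(len(vectors))]

    def avg_link(a, b):
        total = sum(weighted_distance(vectors[i], vectors[j], weights)
                    for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: avg_link(clusters[p[0]], clusters[p[1]]))
        merged = clusters.pop(b)   # b > a, so index a stays valid
        clusters[a].extend(merged)
    labels = [0] * len(vectors)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels

def rand_index(x, y):
    """Fraction of document pairs on which two clusterings agree."""
    pairs = list(combinations(range(len(x)), 2))
    return sum((x[i] == x[j]) == (y[i] == y[j]) for i, j in pairs) / len(pairs)

def evolve_weights(docs, target, k, pop_size=20, generations=30, seed=0):
    """Search for term weights whose clustering best matches `target`."""
    rng = random.Random(seed)
    n_terms = len(docs[0])

    def fitness(w):
        return rand_index(hac_labels(docs, w, k), target)

    # Seed the population with uniform weights so the result is never worse
    # than the unweighted baseline (elitism below preserves the best member).
    population = [[1.0] * n_terms]
    population += [[rng.random() for _ in range(n_terms)]
                   for _ in range(pop_size - 1)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        next_gen = [list(w) for w in ranked[:2]]           # elitism
        while len(next_gen) < pop_size:
            p1 = max(rng.sample(ranked, 3), key=fitness)   # tournament selection
            p2 = max(rng.sample(ranked, 3), key=fitness)
            child = [rng.choice(genes) for genes in zip(p1, p2)]  # uniform crossover
            child = [min(1.0, max(0.0, g + rng.gauss(0, 0.1)))    # clipped mutation
                     if rng.random() < 0.2 else g
                     for g in child]
            next_gen.append(child)
        population = next_gen
    best = max(population, key=fitness)
    return best, fitness(best)

# Hypothetical toy data: terms 0 and 1 carry the topic, term 2 is noise.
docs = [[1.0, 0.9, 2.0], [0.9, 1.0, 0.0], [1.1, 1.0, 2.1],
        [0.0, 0.1, 0.2], [0.1, 0.0, 2.0], [0.0, 0.2, 0.1]]
target = [0, 0, 0, 1, 1, 1]

baseline = rand_index(hac_labels(docs, [1.0, 1.0, 1.0], 2), target)
best_weights, best_fitness = evolve_weights(docs, target, k=2)
print("baseline:", baseline, "tuned:", best_fitness, "weights:", best_weights)
```

Seeding the population with uniform weights and carrying the top two individuals forward unchanged means the tuned fitness cannot fall below the unweighted baseline; whether it actually improves depends on the data and on the random seed.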