Pembobotan Kata berdasarkan Kluster untuk Peringkasan Otomatis Multi Dokumen

Multi-document summarization is a technique for obtaining information in the form of a few sentences that relevantly describe the contents of all the documents. Many algorithms with various criteria have been proposed. In general, these criteria are the preproce...


Bibliographic Details
Main Authors:	Lukman Hakim, Fadli Husein Wattiheluw, Agus Zainal Arifin, Aminul Wahib
Format:	Article
Language:	English
Published:	Indonesia Association of Computational Linguistics (INACL) 2018-09-01
Series:	Jurnal Linguistik Komputasional
Online Access:	http://inacl.id/journal/index.php/jlk/article/view/7

_version_	1819114549394014208
author	Lukman Hakim, Fadli Husein Wattiheluw, Agus Zainal Arifin, Aminul Wahib
author_facet	Lukman Hakim, Fadli Husein Wattiheluw, Agus Zainal Arifin, Aminul Wahib
author_sort	Lukman Hakim
collection	DOAJ
description	Multi-document summarization is a technique for obtaining information in the form of a few sentences that relevantly describe the contents of all the documents. Many algorithms with various criteria have been proposed. In general, these criteria are the preprocessing, clustering, and representative sentence selection stages, which aim to produce highly relevant summaries. The clustering stage is often one of the most important stages in producing a summary. Existing research cannot determine the number of clusters to be formed. Therefore, we propose a clustering technique based on a cluster hierarchy. This technique measures the similarity between sentences using cosine similarity, and the sentences are clustered based on their similarity values. The cluster with the highest similarity to another cluster is merged with it into one cluster, and this merging process continues until one cluster remains. Experiments on the 2004 Document Understanding Conference (DUC) dataset, using two scenarios with 132, 135, 137, and 140 clusters, produced fluctuating values. A smaller number of clusters does not guarantee a higher ROUGE-1 value. With the same number of clusters, the proposed method has a lower ROUGE-1 value than the previous method. This is because, with 140 clusters, the similarity values within each cluster decreased.
first_indexed	2024-12-22T04:47:04Z
format	Article
id	doaj.art-9f86ddde77d54e74b67598fcc7136c92
institution	Directory Open Access Journal
issn	2621-9336
language	English
last_indexed	2024-12-22T04:47:04Z
publishDate	2018-09-01
publisher	Indonesia Association of Computational Linguistics (INACL)
record_format	Article
series	Jurnal Linguistik Komputasional
spelling	doaj.art-9f86ddde77d54e74b67598fcc7136c92 | 2022-12-21T18:38:34Z | eng | Indonesia Association of Computational Linguistics (INACL) | Jurnal Linguistik Komputasional | 2621-9336 | 2018-09-01 | 123844 | 10.26418/jlk.v1i2.77 | Pembobotan Kata berdasarkan Kluster untuk Peringkasan Otomatis Multi Dokumen | Lukman Hakim | Fadli Husein Wattiheluw | Agus Zainal Arifin | Aminul Wahib | Multi-document summarization is a technique for obtaining information in the form of a few sentences that relevantly describe the contents of all the documents. Many algorithms with various criteria have been proposed. In general, these criteria are the preprocessing, clustering, and representative sentence selection stages, which aim to produce highly relevant summaries. The clustering stage is often one of the most important stages in producing a summary. Existing research cannot determine the number of clusters to be formed. Therefore, we propose a clustering technique based on a cluster hierarchy. This technique measures the similarity between sentences using cosine similarity, and the sentences are clustered based on their similarity values. The cluster with the highest similarity to another cluster is merged with it into one cluster, and this merging process continues until one cluster remains. Experiments on the 2004 Document Understanding Conference (DUC) dataset, using two scenarios with 132, 135, 137, and 140 clusters, produced fluctuating values. A smaller number of clusters does not guarantee a higher ROUGE-1 value. With the same number of clusters, the proposed method has a lower ROUGE-1 value than the previous method. This is because, with 140 clusters, the similarity values within each cluster decreased. | http://inacl.id/journal/index.php/jlk/article/view/7
spellingShingle	Lukman Hakim Fadli Husein Wattiheluw Agus Zainal Arifin Aminul Wahib Pembobotan Kata berdasarkan Kluster untuk Peringkasan Otomatis Multi Dokumen Jurnal Linguistik Komputasional
title	Pembobotan Kata berdasarkan Kluster untuk Peringkasan Otomatis Multi Dokumen
title_full	Pembobotan Kata berdasarkan Kluster untuk Peringkasan Otomatis Multi Dokumen
title_fullStr	Pembobotan Kata berdasarkan Kluster untuk Peringkasan Otomatis Multi Dokumen
title_full_unstemmed	Pembobotan Kata berdasarkan Kluster untuk Peringkasan Otomatis Multi Dokumen
title_short	Pembobotan Kata berdasarkan Kluster untuk Peringkasan Otomatis Multi Dokumen
title_sort	pembobotan kata berdasarkan kluster untuk peringkasan otomatis multi dokumen
url	http://inacl.id/journal/index.php/jlk/article/view/7
work_keys_str_mv	AT lukmanhakim pembobotankataberdasarkanklusteruntukperingkasanotomatismultidokumen AT fadlihuseinwattiheluw pembobotankataberdasarkanklusteruntukperingkasanotomatismultidokumen AT aguszainalarifin pembobotankataberdasarkanklusteruntukperingkasanotomatismultidokumen AT aminulwahib pembobotankataberdasarkanklusteruntukperingkasanotomatismultidokumen
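
The description field outlines a bottom-up clustering of sentences: measure cosine similarity, merge the most similar pair of clusters, and repeat until one cluster (or a chosen number) remains. The sketch below illustrates that general idea only and is not the authors' implementation. The function names `cosine` and `agglomerate`, the whitespace tokenisation, the raw term counts, and the centroid-style comparison of clusters are all assumptions made for the example; the record does not say which term weighting or linkage the article uses.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def agglomerate(sentences, target_clusters=1):
    """Merge the most similar pair of clusters until target_clusters remain.

    Returns a list of clusters, each a list of sentence indices.
    """
    # Assumption: whitespace tokenisation and raw term counts as features.
    vectors = [Counter(s.lower().split()) for s in sentences]
    clusters = [[i] for i in range(len(sentences))]  # one cluster per sentence
    # Summed term counts per cluster; cosine ignores scale, so comparing
    # sums is equivalent to comparing centroids (an assumed linkage).
    sums = [Counter(v) for v in vectors]

    while len(clusters) > max(1, target_clusters):
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = cosine(sums[i], sums[j])
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        _, i, j = best
        # Merge cluster j into cluster i, then drop j (j > i, so i is stable).
        clusters[i] = clusters[i] + clusters[j]
        sums[i] = sums[i] + sums[j]
        del clusters[j]
        del sums[j]
    return clusters
```

Calling `agglomerate(sentences, target_clusters=k)` stops the merging at `k` clusters, which corresponds to cutting the hierarchy at a chosen level; the default of 1 runs the merging to completion, as the description states.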


Similar Items