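IMPLEMENTASI ALGORITMA SUFFIX TREE CLUSTERING DAN NEAREST NEIGHBOR UNTUK MENGELOMPOKKAN BERITA PADA TIMELINE TWITTER [Implementation of the Suffix Tree Clustering and Nearest Neighbor Algorithms for Grouping News on the Twitter Timeline]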
Main Authors: | , |
---|---|
Format: | Thesis |
Published: | [Yogyakarta] : Universitas Gadjah Mada, 2013 |
Subjects: |
Summary: | News organizations publish a constantly changing stream of news tweets on Twitter, so a large number of tweets appear every day. This lengthens microblog web pages and leads to the classic problem of longer page scrolling while reading all of the existing news tweets. One way to reduce page length is to group the existing news text thematically. A grouping system suited to this problem is a clustering system. According to existing research, one good method for clustering text documents is Suffix Tree Clustering (STC). This method has a very high accuracy rate because clusters are created from phrases shared among the existing text documents.
However, one existing study found that clustering with the STC algorithm still places a large number of text documents in the Other Topics cluster, even though these documents are still relevant to members of the existing clusters. The documents in the Other Topics cluster therefore need to be compared with all text documents in the existing clusters to determine their level of similarity. A document in the Other Topics cluster can then be classified into one particular cluster using the cosine similarity function, computed with the Vector Space Model (VSM) from term frequency and document frequency. The Nearest Neighbor method uses the results of this calculation in the classification process to determine the destination cluster for each document in the Other Topics cluster. The main criterion for the destination cluster is that it is the cluster with the largest number of members that have the highest similarity. Moving documents out of the Other Topics cluster reduces its number of members. Finally, if the Other Topics cluster has no members left, it can be eliminated. |
---|
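
The reassignment step described in the summary can be illustrated with a minimal Python sketch. It is not the thesis implementation. The function names (`tfidf_vectors`, `cosine`, `reassign_other_topics`), the smoothed weighting `tf * (log(N / df) + 1)`, and the sample data are all assumptions made for illustration. The destination rule is one possible reading of the stated criterion: among the clustered documents that share the highest cosine similarity with an Other Topics document, the cluster contributing the most such documents is chosen. Documents with no similarity to any clustered document stay in Other Topics.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (term -> weight) for a list of token lists."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Assumed weighting: tf * (log(N / df) + 1), so no weight is zero.
        vectors.append({t: tf[t] * (math.log(n / df[t]) + 1.0) for t in tf})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def reassign_other_topics(clusters, other_topics):
    """Move Other Topics documents into the cluster of their nearest neighbors.

    clusters: dict mapping a cluster label to a list of token lists.
    other_topics: list of token lists.
    Returns (updated clusters, documents left in Other Topics).
    """
    labelled = [(label, doc) for label, docs in clusters.items() for doc in docs]
    all_docs = [doc for _, doc in labelled] + list(other_topics)
    vectors = tfidf_vectors(all_docs)
    labelled_vecs = vectors[:len(labelled)]
    other_vecs = vectors[len(labelled):]

    result = {label: list(docs) for label, docs in clusters.items()}
    remaining = []
    for doc, vec in zip(other_topics, other_vecs):
        sims = [cosine(vec, lv) for lv in labelled_vecs]
        best = max(sims, default=0.0)
        if best == 0.0:
            # No shared terms with any clustered document: keep it where it is.
            remaining.append(doc)
            continue
        # Count, per cluster, the members that reach the highest similarity.
        votes = Counter(
            label for (label, _), s in zip(labelled, sims) if math.isclose(s, best)
        )
        target = votes.most_common(1)[0][0]
        result[target].append(doc)
    return result, remaining


# Hypothetical, already tokenized news tweets.
clusters = {
    "election": [["election", "vote", "results"], ["vote", "count", "election"]],
    "football": [["football", "match", "goal"], ["goal", "score", "match"]],
}
other_topics = [["election", "results", "announced"], ["weather", "sunny"]]

new_clusters, remaining = reassign_other_topics(clusters, other_topics)
```

In this sample, the first Other Topics document shares terms only with the "election" cluster and is moved there, while the second shares no terms with any cluster and remains. The Other Topics cluster could be dropped only once `remaining` is empty.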