Similarity Identification Based on Word Trigrams Using Exact String Matching Algorithms

Several studies regarding excellent exact string matching algorithms can be used to identify similarity, including the Rabin-Karp, Winnowing, and Horspool Boyer-Moore algorithms. In determining similarities, the Rabin-Karp and Winnowing algorithms use fingerprints, while the Horspool Boyer-Moore alg...

Full description

Bibliographic Details
Main Authors: Abdul Fadlil, Sunardi Sunardi, Rezki Ramdhani
Format: Article
Language:English
Published: Universitas Nusantara PGRI Kediri 2022-08-01
Series:Intensif: Jurnal Ilmiah Penelitian Teknologi dan Penerapan Sistem Informasi
Subjects:
Online Access:https://ojs.unpkediri.ac.id/index.php/intensif/article/view/18141
_version_ 1811332359708999680
author Abdul Fadlil
Sunardi Sunardi
Rezki Ramdhani
author_facet Abdul Fadlil
Sunardi Sunardi
Rezki Ramdhani
author_sort Abdul Fadlil
collection DOAJ
description Several studies regarding excellent exact string matching algorithms can be used to identify similarity, including the Rabin-Karp, Winnowing, and Horspool Boyer-Moore algorithms. In determining similarities, the Rabin-Karp and Winnowing algorithms use fingerprints, while the Horspool Boyer-Moore algorithm uses a bad-character table. However, previous research focused on identifying similarities using these algorithms based on character n-gram. In contrast, identification based on the word n-gram to determine the similarity based on its linguistic meaning, especially for longer strings, had not been covered yet. Therefore, a word-level trigram was proposed to identify similarities based on the word trigrams using the three algorithms and compare each performance. Based on precision, recall, and running time comparison, the Rabin-Karp algorithm results were 100%, 100%, and 0.19 ms, respectively; the Winnowing algorithm results with the smallest window were 100%, 56%, and 0.18 ms, respectively; and the Horspool algorithm results were 100%, 100%, and 0.06 ms. From these results, it can be concluded that the performance of the Horspool Boyer-Moore algorithm is better in terms of precision, recall, and running time.
first_indexed 2024-04-13T16:35:37Z
format Article
id doaj.art-d918829705a9448eba34e60531eba24d
institution Directory Open Access Journal
issn 2580-409X
2549-6824
language English
last_indexed 2024-04-13T16:35:37Z
publishDate 2022-08-01
publisher Universitas Nusantara PGRI Kediri
record_format Article
series Intensif: Jurnal Ilmiah Penelitian Teknologi dan Penerapan Sistem Informasi
spelling doaj.art-d918829705a9448eba34e60531eba24d2022-12-22T02:39:27ZengUniversitas Nusantara PGRI KediriIntensif: Jurnal Ilmiah Penelitian Teknologi dan Penerapan Sistem Informasi2580-409X2549-68242022-08-016225327010.29407/intensif.v6i2.1814118141Similarity Identification Based on Word Trigrams Using Exact String Matching AlgorithmsAbdul Fadlil0Sunardi Sunardi1Rezki Ramdhani2Universitas Ahmad DahlanUniversitas Ahmad DahlanUniversitas Ahmad DahlanSeveral studies regarding excellent exact string matching algorithms can be used to identify similarity, including the Rabin-Karp, Winnowing, and Horspool Boyer-Moore algorithms. In determining similarities, the Rabin-Karp and Winnowing algorithms use fingerprints, while the Horspool Boyer-Moore algorithm uses a bad-character table. However, previous research focused on identifying similarities using these algorithms based on character n-gram. In contrast, identification based on the word n-gram to determine the similarity based on its linguistic meaning, especially for longer strings, had not been covered yet. Therefore, a word-level trigram was proposed to identify similarities based on the word trigrams using the three algorithms and compare each performance. Based on precision, recall, and running time comparison, the Rabin-Karp algorithm results were 100%, 100%, and 0.19 ms, respectively; the Winnowing algorithm results with the smallest window were 100%, 56%, and 0.18 ms, respectively; and the Horspool algorithm results were 100%, 100%, and 0.06 ms. From these results, it can be concluded that the performance of the Horspool Boyer-Moore algorithm is better in terms of precision, recall, and running time.https://ojs.unpkediri.ac.id/index.php/intensif/article/view/18141string-matchingalgorithmperformancen-gramsimilarity
spellingShingle Abdul Fadlil
Sunardi Sunardi
Rezki Ramdhani
Similarity Identification Based on Word Trigrams Using Exact String Matching Algorithms
Intensif: Jurnal Ilmiah Penelitian Teknologi dan Penerapan Sistem Informasi
string-matching
algorithm
performance
n-gram
similarity
title Similarity Identification Based on Word Trigrams Using Exact String Matching Algorithms
title_full Similarity Identification Based on Word Trigrams Using Exact String Matching Algorithms
title_fullStr Similarity Identification Based on Word Trigrams Using Exact String Matching Algorithms
title_full_unstemmed Similarity Identification Based on Word Trigrams Using Exact String Matching Algorithms
title_short Similarity Identification Based on Word Trigrams Using Exact String Matching Algorithms
title_sort similarity identification based on word trigrams using exact string matching algorithms
topic string-matching
algorithm
performance
n-gram
similarity
url https://ojs.unpkediri.ac.id/index.php/intensif/article/view/18141
work_keys_str_mv AT abdulfadlil similarityidentificationbasedonwordtrigramsusingexactstringmatchingalgorithms
AT sunardisunardi similarityidentificationbasedonwordtrigramsusingexactstringmatchingalgorithms
AT rezkiramdhani similarityidentificationbasedonwordtrigramsusingexactstringmatchingalgorithms