Similarity Identification Based on Word Trigrams Using Exact String Matching Algorithms
Several studies regarding excellent exact string matching algorithms can be used to identify similarity, including the Rabin-Karp, Winnowing, and Horspool Boyer-Moore algorithms. In determining similarities, the Rabin-Karp and Winnowing algorithms use fingerprints, while the Horspool Boyer-Moore alg...
Main Authors: | , , |
---|---|
Format: | Article |
Language: | English |
Published: |
Universitas Nusantara PGRI Kediri
2022-08-01
|
Series: | Intensif: Jurnal Ilmiah Penelitian Teknologi dan Penerapan Sistem Informasi |
Subjects: | |
Online Access: | https://ojs.unpkediri.ac.id/index.php/intensif/article/view/18141 |
_version_ | 1811332359708999680 |
---|---|
author | Abdul Fadlil Sunardi Sunardi Rezki Ramdhani |
author_facet | Abdul Fadlil Sunardi Sunardi Rezki Ramdhani |
author_sort | Abdul Fadlil |
collection | DOAJ |
description | Several studies regarding excellent exact string matching algorithms can be used to identify similarity, including the Rabin-Karp, Winnowing, and Horspool Boyer-Moore algorithms. In determining similarities, the Rabin-Karp and Winnowing algorithms use fingerprints, while the Horspool Boyer-Moore algorithm uses a bad-character table. However, previous research focused on identifying similarities using these algorithms based on character n-gram. In contrast, identification based on the word n-gram to determine the similarity based on its linguistic meaning, especially for longer strings, had not been covered yet. Therefore, a word-level trigram was proposed to identify similarities based on the word trigrams using the three algorithms and compare each performance. Based on precision, recall, and running time comparison, the Rabin-Karp algorithm results were 100%, 100%, and 0.19 ms, respectively; the Winnowing algorithm results with the smallest window were 100%, 56%, and 0.18 ms, respectively; and the Horspool algorithm results were 100%, 100%, and 0.06 ms. From these results, it can be concluded that the performance of the Horspool Boyer-Moore algorithm is better in terms of precision, recall, and running time. |
first_indexed | 2024-04-13T16:35:37Z |
format | Article |
id | doaj.art-d918829705a9448eba34e60531eba24d |
institution | Directory Open Access Journal |
issn | 2580-409X 2549-6824 |
language | English |
last_indexed | 2024-04-13T16:35:37Z |
publishDate | 2022-08-01 |
publisher | Universitas Nusantara PGRI Kediri |
record_format | Article |
series | Intensif: Jurnal Ilmiah Penelitian Teknologi dan Penerapan Sistem Informasi |
spelling | doaj.art-d918829705a9448eba34e60531eba24d2022-12-22T02:39:27ZengUniversitas Nusantara PGRI KediriIntensif: Jurnal Ilmiah Penelitian Teknologi dan Penerapan Sistem Informasi2580-409X2549-68242022-08-016225327010.29407/intensif.v6i2.1814118141Similarity Identification Based on Word Trigrams Using Exact String Matching AlgorithmsAbdul Fadlil0Sunardi Sunardi1Rezki Ramdhani2Universitas Ahmad DahlanUniversitas Ahmad DahlanUniversitas Ahmad DahlanSeveral studies regarding excellent exact string matching algorithms can be used to identify similarity, including the Rabin-Karp, Winnowing, and Horspool Boyer-Moore algorithms. In determining similarities, the Rabin-Karp and Winnowing algorithms use fingerprints, while the Horspool Boyer-Moore algorithm uses a bad-character table. However, previous research focused on identifying similarities using these algorithms based on character n-gram. In contrast, identification based on the word n-gram to determine the similarity based on its linguistic meaning, especially for longer strings, had not been covered yet. Therefore, a word-level trigram was proposed to identify similarities based on the word trigrams using the three algorithms and compare each performance. Based on precision, recall, and running time comparison, the Rabin-Karp algorithm results were 100%, 100%, and 0.19 ms, respectively; the Winnowing algorithm results with the smallest window were 100%, 56%, and 0.18 ms, respectively; and the Horspool algorithm results were 100%, 100%, and 0.06 ms. From these results, it can be concluded that the performance of the Horspool Boyer-Moore algorithm is better in terms of precision, recall, and running time.https://ojs.unpkediri.ac.id/index.php/intensif/article/view/18141string-matchingalgorithmperformancen-gramsimilarity |
spellingShingle | Abdul Fadlil Sunardi Sunardi Rezki Ramdhani Similarity Identification Based on Word Trigrams Using Exact String Matching Algorithms Intensif: Jurnal Ilmiah Penelitian Teknologi dan Penerapan Sistem Informasi string-matching algorithm performance n-gram similarity |
title | Similarity Identification Based on Word Trigrams Using Exact String Matching Algorithms |
title_full | Similarity Identification Based on Word Trigrams Using Exact String Matching Algorithms |
title_fullStr | Similarity Identification Based on Word Trigrams Using Exact String Matching Algorithms |
title_full_unstemmed | Similarity Identification Based on Word Trigrams Using Exact String Matching Algorithms |
title_short | Similarity Identification Based on Word Trigrams Using Exact String Matching Algorithms |
title_sort | similarity identification based on word trigrams using exact string matching algorithms |
topic | string-matching algorithm performance n-gram similarity |
url | https://ojs.unpkediri.ac.id/index.php/intensif/article/view/18141 |
work_keys_str_mv | AT abdulfadlil similarityidentificationbasedonwordtrigramsusingexactstringmatchingalgorithms AT sunardisunardi similarityidentificationbasedonwordtrigramsusingexactstringmatchingalgorithms AT rezkiramdhani similarityidentificationbasedonwordtrigramsusingexactstringmatchingalgorithms |