Plagiarism checker for Persian (PCP) texts using hash-based tree representative fingerprinting

With due respect to authors’ rights, plagiarism detection is one of the critical problems in text mining and interests many researchers. The issue is considered a serious one in higher academic institutions. There exist language-free tools which do not yield any reliable...

Bibliographic Details
Main Authors: Sh. Rafieian, A. Baraani dastjerdi
Format: Article
Language: English
Published: Shahrood University of Technology 2016-07-01
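Series: Journal of Artificial Intelligence and Data Mining
Subjects: Text-Mining; Natural Language Processing; Plagiarism detection; External plagiarism detection; Persian Language
Online Access: http://jad.shahroodut.ac.ir/article_580_5ae5d7980323bacb7a8c36dec40456f9.pdf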
_version_ 1819020352284524544
author Sh. Rafieian
A. Baraani dastjerdi
author_facet Sh. Rafieian
A. Baraani dastjerdi
author_sort Sh. Rafieian
collection DOAJ
description With due respect to authors’ rights, plagiarism detection is one of the critical problems in text mining and interests many researchers. The issue is considered a serious one in higher academic institutions. Existing language-free tools do not yield reliable results because they ignore the special features of each language. Given the paucity of work on the Persian language and the lack of reliable Persian plagiarism checkers, a method is needed to improve the accuracy of detecting plagiarized Persian phrases. This article presents the PCP solution, a combined method that, in addition to the meaning and stem of words, handles synonyms and pluralization by applying a document tree representation based on fingerprinting the text in word 3-grams. The obtained grams are extracted from the text, hashed with the BKDR hash function, and stored as the document's fingerprint in a repository of reference-document fingerprints, for checking suspicious documents. The proposed PCP method is evaluated in eight experiments on seven different sets, which include suspicious documents and reference documents from the Hamshahri newspaper website. The results indicate that, compared with the localized "Winnowing" method, the proposed method improves accuracy in detecting similar texts by 21.15 percent on average. Compared with the language-free tool, the PCP method improves accuracy in detecting similarity by 31.65 percent on average.
first_indexed 2024-12-21T03:49:50Z
format Article
id doaj.art-f44db85355da4360877531f01eb8baaf
institution Directory Open Access Journal
issn 2322-5211
2322-4444
language English
last_indexed 2024-12-21T03:49:50Z
publishDate 2016-07-01
publisher Shahrood University of Technology
record_format Article
series Journal of Artificial Intelligence and Data Mining
spelling doaj.art-f44db85355da4360877531f01eb8baaf
2022-12-21T19:17:00Z
eng
Shahrood University of Technology
Journal of Artificial Intelligence and Data Mining
2322-5211
2322-4444
2016-07-01
4
2
125
133
doi: 10.5829/idosi.JAIDM.2016.04.02.01
580
Plagiarism checker for Persian (PCP) texts using hash-based tree representative fingerprinting
Sh. Rafieian (Computer Engineering Department, Sheikh Bahaii University, Isfahan, Iran)
A. Baraani dastjerdi (Computer Engineering Department, University of Isfahan, Isfahan, Iran)
http://jad.shahroodut.ac.ir/article_580_5ae5d7980323bacb7a8c36dec40456f9.pdf
Text-Mining
Natural Language Processing
Plagiarism detection
External plagiarism detection
Persian Language
spellingShingle Sh. Rafieian
A. Baraani dastjerdi
Plagiarism checker for Persian (PCP) texts using hash-based tree representative fingerprinting
Journal of Artificial Intelligence and Data Mining
Text-Mining
Natural Language Processing
Plagiarism detection
External plagiarism detection
Persian Language
title Plagiarism checker for Persian (PCP) texts using hash-based tree representative fingerprinting
title_sort plagiarism checker for persian pcp texts using hash based tree representative fingerprinting
topic Text-Mining
Natural Language Processing
Plagiarism detection
External plagiarism detection
Persian Language
url http://jad.shahroodut.ac.ir/article_580_5ae5d7980323bacb7a8c36dec40456f9.pdf
work_keys_str_mv AT shrafieian plagiarismcheckerforpersianpcptextsusinghashbasedtreerepresentativefingerprinting
AT abaraanidastjerdi plagiarismcheckerforpersianpcptextsusinghashbasedtreerepresentativefingerprinting
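
The description above mentions fingerprinting a text as word 3-grams hashed with the BKDR hash function and comparing against stored reference fingerprints. The following is a minimal Python sketch of that general idea only, not the authors' PCP implementation: it omits the stemming, synonym handling, pluralization handling, and tree representation the record describes. The function names, the seed value 131, the 32-bit mask, whitespace tokenization, and the overlap score are all assumptions made for illustration.

```python
def bkdr_hash(text: str, seed: int = 131) -> int:
    """BKDR-style multiplicative string hash over UTF-8 bytes.

    The seed (131) and the 32-bit mask are common choices, assumed here;
    the record does not state which parameters the authors used.
    """
    h = 0
    for byte in text.encode("utf-8"):
        h = (h * seed + byte) & 0xFFFFFFFF
    return h


def word_ngrams(text: str, n: int = 3) -> list[str]:
    """Return overlapping word n-grams, splitting on whitespace only."""
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def fingerprint(text: str, n: int = 3) -> set[int]:
    """Hash every word n-gram; the set of hashes is the fingerprint."""
    return {bkdr_hash(gram) for gram in word_ngrams(text, n)}


def overlap(suspicious: str, reference: str, n: int = 3) -> float:
    """Fraction of the suspicious text's n-gram hashes found in the reference.

    This score is a hypothetical similarity measure for the sketch,
    not the accuracy measure reported in the article.
    """
    suspicious_fp = fingerprint(suspicious, n)
    if not suspicious_fp:
        return 0.0
    return len(suspicious_fp & fingerprint(reference, n)) / len(suspicious_fp)


if __name__ == "__main__":
    reference_text = "the quick brown fox jumps over the lazy dog"
    suspicious_text = "a quick brown fox jumps over a sleeping cat"
    print(overlap(suspicious_text, reference_text))
```

In a fuller system the reference fingerprints would be computed once and kept in a repository, so that each suspicious document is hashed and looked up rather than compared text against text.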