Retrieval of source documents in a text reuse system

The architecture of a text-reuse detection system consists of three main modules: source retrieval, text analysis, and knowledge-based postprocessing. Each module affects the accuracy of the detection output. This research therefore focuses on developing the source...


Bibliographic Details
Main Authors:	Nathaniel Clarence Haryanto, Lucia Dwi Krisnawati, Antonius Rachmat Chrismanto
Format:	Article
Language:	English
Published:	Diponegoro University 2020-04-01
Series:	Jurnal Teknologi dan Sistem Komputer
Subjects:	text reuse detection; source document retrieval; important words; local weighting
Online Access:	https://jtsiskom.undip.ac.id/index.php/jtsiskom/article/view/13523

_version_	1797285211001585664
author	Nathaniel Clarence Haryanto; Lucia Dwi Krisnawati; Antonius Rachmat Chrismanto
author_facet	Nathaniel Clarence Haryanto; Lucia Dwi Krisnawati; Antonius Rachmat Chrismanto
author_sort	Nathaniel Clarence Haryanto
collection	DOAJ
description	The architecture of a text-reuse detection system consists of three main modules: source retrieval, text analysis, and knowledge-based postprocessing. Each module affects the accuracy of the detection output. This research therefore focuses on developing the source retrieval system for cases where the source documents have been obfuscated to different degrees. Two steps of term weighting were applied to retrieve such documents. The first was local word weighting, applied to the test (reused) documents to select a query for each text segment. The second was tf-idf term weighting, applied to index all documents in the corpus and used as the basis for computing the cosine similarity between each segment's query and the documents in the corpus. A two-step filtering technique was then applied to obtain the source document candidates. On artificial text-reuse test cases, the system achieves precision and recall of 0.967 each, while recall on simulated text-reuse cases is 0.66.
first_indexed	2024-03-07T18:00:00Z
format	Article
id	doaj.art-bbae2711ace54d6fac05528582692a9f
institution	Directory Open Access Journal
issn	2338-0403
language	English
last_indexed	2024-03-07T18:00:00Z
publishDate	2020-04-01
publisher	Diponegoro University
record_format	Article
series	Jurnal Teknologi dan Sistem Komputer
spelling	doaj.art-bbae2711ace54d6fac05528582692a9f | 2024-03-02T11:05:15Z | eng | Diponegoro University | Jurnal Teknologi dan Sistem Komputer | 2338-0403 | 2020-04-01 | 8 | 2 | 140 | 149 | 10.14710/jtsiskom.8.2.2020.140-149 | 12817 | Retrieval of source documents in a text reuse system | Nathaniel Clarence Haryanto 0 | Lucia Dwi Krisnawati 1 | Antonius Rachmat Chrismanto 2 | https://orcid.org/0000-0003-3247-2419 | Program Studi Informatika, Universitas Kristen Duta Wacana, Indonesia | Program Studi Informatika, Universitas Kristen Duta Wacana, Indonesia | Program Studi Informatika, Universitas Kristen Duta Wacana, Indonesia | The architecture of a text-reuse detection system consists of three main modules: source retrieval, text analysis, and knowledge-based postprocessing. Each module affects the accuracy of the detection output. This research therefore focuses on developing the source retrieval system for cases where the source documents have been obfuscated to different degrees. Two steps of term weighting were applied to retrieve such documents. The first was local word weighting, applied to the test (reused) documents to select a query for each text segment. The second was tf-idf term weighting, applied to index all documents in the corpus and used as the basis for computing the cosine similarity between each segment's query and the documents in the corpus. A two-step filtering technique was then applied to obtain the source document candidates. On artificial text-reuse test cases, the system achieves precision and recall of 0.967 each, while recall on simulated text-reuse cases is 0.66. | https://jtsiskom.undip.ac.id/index.php/jtsiskom/article/view/13523 | text reuse detection | source document retrieval | important words | local weighting
title	Retrieval of source documents in a text reuse system
topic	text reuse detection; source document retrieval; important words; local weighting
url	https://jtsiskom.undip.ac.id/index.php/jtsiskom/article/view/13523


Similar Items