Editing Compression Dictionaries toward Refined Compression-Based Feature-Space


Bibliographic Details
Main Authors: Hisashi Koga, Shota Ouchi, Yuji Nakajima
Format: Article
Language: English
Published: MDPI AG 2022-06-01
Series: Information
Subjects: compression-based pattern recognition; data compression; feature space; compression dictionary
Online Access: https://www.mdpi.com/2078-2489/13/6/301
_version_ 1797486239616598016
author Hisashi Koga
Shota Ouchi
Yuji Nakajima
collection DOAJ
description This paper investigates how to construct a feature space for compression-based pattern recognition, which judges the similarity between two objects x and y by the compression ratio achieved when x is compressed with y's dictionary. Specifically, we focus on the known framework called PRDC, which represents an object x as a compression-ratio vector (CV) that lines up the compression ratios obtained when x is compressed with multiple different dictionaries. By representing an object x as a CV, PRDC makes it possible to apply vector-based pattern recognition techniques to compression-based pattern recognition. For PRDC, the dimensions, i.e., the dictionaries, determine the quality of the CV space. This paper presents a practical technique to modify the chosen dictionaries in order to improve the performance of pattern recognition substantially. First, in order to make the dictionaries independent of each other, our method leaves any word shared by multiple dictionaries in only one dictionary and ensures that no pair of dictionaries has common words. Next, we transfer words among the dictionaries so that all the dictionaries keep roughly the same number of words and acquire descriptive power evenly. An application to real image classification shows that our method increases classification accuracy by up to 8% compared with the case without our method, which demonstrates that our approach of keeping the dictionaries independent is effective.
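The abstract above describes a compression-ratio vector and a step that leaves each shared word in only one dictionary. A minimal Python sketch of both ideas follows. It is not the authors' implementation. It assumes an LZW-style phrase dictionary, a greedy longest-match parse, and a ratio defined as emitted codes divided by input length. All function names are hypothetical, and the rule for choosing which dictionary keeps a shared word (the currently smallest holder) is an assumption made for illustration. The word-transfer step from the abstract is not attempted here.

```python
def build_lzw_dictionary(text):
    """Collect the multi-character phrases an LZW-style scan learns from text."""
    words = set()
    current = ""
    for ch in text:
        candidate = current + ch
        # Single characters are treated as always known (implicit alphabet).
        if len(candidate) == 1 or candidate in words:
            current = candidate
        else:
            words.add(candidate)
            current = ch
    return words


def compression_ratio(x, words):
    """Greedy longest-match parse of x; ratio = emitted codes / len(x)."""
    if not x:
        raise ValueError("x must be non-empty")
    max_len = max((len(w) for w in words), default=1)
    codes = 0
    i = 0
    while i < len(x):
        step = 1  # fall back to a single character
        for length in range(min(max_len, len(x) - i), 1, -1):
            if x[i:i + length] in words:
                step = length
                break
        i += step
        codes += 1
    return codes / len(x)


def compression_vector(x, dictionaries):
    """One compression ratio per dictionary, i.e. one CV dimension each."""
    return [compression_ratio(x, d) for d in dictionaries]


def make_disjoint(dictionaries):
    """Keep every shared word in exactly one dictionary.

    Assumed rule: a shared word goes to the holder that is currently smallest.
    """
    result = [set() for _ in dictionaries]
    holders = {}
    for idx, d in enumerate(dictionaries):
        for w in d:
            holders.setdefault(w, []).append(idx)
    for w in sorted(holders):  # sorted for deterministic output
        keep = min(holders[w], key=lambda i: (len(result[i]), i))
        result[keep].add(w)
    return result
```

For instance, calling `make_disjoint` on two dictionaries that both contain the word "ab" returns two sets with no common word, and `compression_vector` can then be evaluated against the edited sets.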
first_indexed 2024-03-09T23:30:26Z
format Article
id doaj.art-a267bf29b93641b7afc5415e2f0aca28
institution Directory Open Access Journal
issn 2078-2489
language English
last_indexed 2024-03-09T23:30:26Z
publishDate 2022-06-01
publisher MDPI AG
record_format Article
series Information
spelling doaj.art-a267bf29b93641b7afc5415e2f0aca28
2023-11-23T17:10:04Z
eng
MDPI AG
Information
2078-2489
2022-06-01
Volume 13, Issue 6, Article 301
10.3390/info13060301
Hisashi Koga, Shota Ouchi, Yuji Nakajima: Department of Computer and Network Engineering, University of Electro-Communications, Tokyo 182-8585, Japan
title Editing Compression Dictionaries toward Refined Compression-Based Feature-Space
topic compression-based pattern recognition
data compression
feature space
compression dictionary
url https://www.mdpi.com/2078-2489/13/6/301