Feature Map Compression for Video Coding for Machines Based on Receptive Block Based Principal Component Analysis

Bibliographic Details
Main Authors:	Minhun Lee, Hansol Choi, Jihoon Kim, Jihoon Do, Hyoungjin Kwon, Se Yoon Jeong, Donggyu Sim, Seoung-Jun Oh
Format:	Article
Language:	English
Published:	IEEE 2023-01-01
Series:	IEEE Access
Subjects:	Moving Picture Experts Group; video coding for machines; convolutional neural network; principal component analysis; feature map compression
Online Access:	https://ieeexplore.ieee.org/document/10064277/

Description
Summary:	This paper presents a method to effectively compress the intermediate-layer feature maps of a convolutional neural network for potential Video Coding for Machines (VCM) architectures. VCM is an emerging technology for future applications in which video is consumed by machines. Most existing studies compress a single feature map and therefore cannot fully account for both the global and the local information within it. This limits how well performance is maintained in machine-consumption tasks that analyze objects of various sizes in images and videos. To address this problem, a multiscale feature map compression method is proposed that consists of two major processes: receptive block based principal component analysis (RPCA) and uniform integer quantization. RPCA derives the complete basis kernels of each feature map and selects a set of major basis kernels that represents a sufficient percentage of the global or local information, according to the variable-size receptive blocks of that feature map. After each feature map is transformed using its set of major basis kernels, a uniform integer quantizer converts the 32-bit floating-point values of the major basis kernels, the corresponding RPCA coefficients, and a mean vector into five-bit integer representations. Experimental results show that the proposed method reduces the amount of feature map data by 99.30%, with a loss of 8.30% in average precision (AP) on the OpenImageV6 dataset and losses of 0.77% in $AP_{M}$ and 0.47% in $AP_{L}$ (AP for medium and large objects) on the MS COCO 2017 validation set, while outperforming previous PCA-based feature map compression methods even at higher compression rates.
ISSN:	2169-3536
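The summary describes the two stages only at a high level. The Python/NumPy sketch below illustrates them under assumptions that are not taken from the paper: each channel of a (C, H, W) feature map is cut into non-overlapping b×b blocks that serve as PCA samples, the number of major basis kernels is chosen by a cumulative-variance threshold (`energy`), and the quantizer is a per-tensor min/max uniform quantizer. The helper names (`to_blocks`, `from_blocks`, `block_pca`, `quantize`, `dequantize`) and the values of `b` and `energy` are hypothetical. This is not the authors' RPCA implementation, and their variable, per-feature-map block size is reduced here to one fixed `b`.

```python
import numpy as np


def to_blocks(fmap: np.ndarray, b: int) -> np.ndarray:
    """Split a (C, H, W) feature map into non-overlapping b x b blocks.

    Returns an (N, b*b) matrix with one flattened block per row.
    Assumes H and W are divisible by b (a simplification for this sketch).
    """
    c, h, w = fmap.shape
    blocks = fmap.reshape(c, h // b, b, w // b, b).transpose(0, 1, 3, 2, 4)
    return blocks.reshape(-1, b * b)


def from_blocks(rows: np.ndarray, shape: tuple, b: int) -> np.ndarray:
    """Inverse of to_blocks: reassemble (N, b*b) rows into a (C, H, W) map."""
    c, h, w = shape
    blocks = rows.reshape(c, h // b, w // b, b, b).transpose(0, 1, 3, 2, 4)
    return blocks.reshape(c, h, w)


def block_pca(x: np.ndarray, energy: float = 0.95):
    """PCA over block vectors; keeps the fewest kernels whose eigenvalues
    cover at least `energy` of the total variance (hypothetical criterion)."""
    x = x.astype(np.float64)
    mean = x.mean(axis=0)                      # mean vector, length b*b
    centered = x - mean
    cov = centered.T @ centered / x.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns ascending order
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
    ratio = np.cumsum(eigvals) / eigvals.sum()
    k = min(int(np.searchsorted(ratio, energy)) + 1, len(eigvals))
    basis = eigvecs[:, :k]                     # (b*b, k) major basis kernels
    coeffs = centered @ basis                  # (N, k) PCA coefficients
    return basis, coeffs, mean


def quantize(a: np.ndarray, bits: int = 5):
    """Uniform integer quantization of float values to `bits`-bit codes."""
    lo, hi = float(a.min()), float(a.max())
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = np.round((a - lo) / scale).astype(np.uint8)
    return codes, lo, scale


def dequantize(codes: np.ndarray, lo: float, scale: float) -> np.ndarray:
    """Map integer codes back to approximate floating-point values."""
    return codes.astype(np.float64) * scale + lo


# Usage on a synthetic feature map (random data, not a real CNN activation).
rng = np.random.default_rng(0)
fmap = rng.standard_normal((64, 32, 32)).astype(np.float32)
b = 8                                          # hypothetical block size
x = to_blocks(fmap, b)
basis, coeffs, mean = block_pca(x, energy=0.95)
parts = [quantize(p, bits=5) for p in (basis, coeffs, mean)]
basis_hat, coeffs_hat, mean_hat = (dequantize(*p) for p in parts)
recon = from_blocks(coeffs_hat @ basis_hat.T + mean_hat, fmap.shape, b)

original_bits = fmap.size * 32                 # 32-bit floats
# 5 bits per stored value; the per-tensor lo/scale side information is ignored.
compressed_bits = 5 * (basis.size + coeffs.size + mean.size)
print(f"kept {basis.shape[1]} of {b * b} kernels, "
      f"size reduction {1 - compressed_bits / original_bits:.2%}")
```

Random Gaussian data has little spatial redundancy, so in this toy run most of the reduction comes from the 32-bit to 5-bit conversion rather than from discarding kernels. The 99.30% figure in the summary comes from the paper's experiments, not from this sketch.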
