Beyond learned metadata-based raw image reconstruction
While raw images possess distinct advantages over sRGB images, e.g., linearity and fine-grained quantization levels, they are not widely adopted by general users due to their substantial storage requirements. Very recent studies propose to compress raw images by designing sampling masks within the pixel space of the raw image.
Main Authors: | Wang, Yufei; Yu, Yi; Yang, Wenhan; Guo, Lanqing; Chau, Lap-Pui; Kot, Alex Chichung; Wen, Bihan |
---|---|
Format: | Journal Article |
Language: | English |
Published: | 2024 |
Subjects: | Engineering; Context Model; Image compression |
Online Access: | https://hdl.handle.net/10356/179439 |
_version_ | 1826112358544572416 |
---|---|
author | Wang, Yufei Yu, Yi Yang, Wenhan Guo, Lanqing Chau, Lap-Pui Kot, Alex Chichung Wen, Bihan |
author2 | School of Electrical and Electronic Engineering |
author_facet | School of Electrical and Electronic Engineering Wang, Yufei Yu, Yi Yang, Wenhan Guo, Lanqing Chau, Lap-Pui Kot, Alex Chichung Wen, Bihan |
author_sort | Wang, Yufei |
collection | NTU |
description | While raw images possess distinct advantages over sRGB images, e.g., linearity and fine-grained quantization levels, they are not widely adopted by general users due to their substantial storage requirements. Very recent studies propose to compress raw images by designing sampling masks within the pixel space of the raw image. However, these approaches often leave room for more effective image representations and more compact metadata. In this work, we propose a novel framework that learns a compact representation in the latent space, serving as metadata, in an end-to-end manner. Compared with lossy image compression, we analyze the intrinsic difference of the raw image reconstruction task caused by the rich information available from the sRGB image. Based on this analysis, we propose a novel backbone design with asymmetric and hybrid spatial feature resolutions, which significantly improves the rate-distortion performance. In addition, we propose a novel sRGB-guided context model, which better predicts the order masks of encoding/decoding based on both the sRGB image and the masks of already processed features. Benefiting from better modeling of the correlation between order masks, the already processed information can be utilized more effectively. Moreover, a novel sRGB-guided adaptive quantization precision strategy, which dynamically assigns varying levels of quantization precision to different regions, further enhances the representation ability of the model. Finally, based on the iterative properties of the proposed context model, we propose a novel strategy to achieve variable bit rates using a single model. This strategy allows for continuous coverage of a wide range of bit rates. We demonstrate how our raw image compression scheme effectively allocates more bits to image regions that hold greater global importance. Extensive experimental results validate the superior performance of the proposed method, achieving high-quality raw image reconstruction with a smaller metadata size compared with existing SOTA methods. |
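The sRGB-guided adaptive quantization precision idea described in the abstract can be sketched in a few lines. The snippet below is a minimal illustrative sketch, not the paper's actual method: the function name, the block-wise partitioning, the variance threshold, and the bit budgets are all assumptions used to show how a guide image can steer per-region quantization precision.

```python
import numpy as np

def adaptive_quantize(latent, srgb, base_bits=4, extra_bits=2, block=8):
    """Quantize a latent map with region-adaptive precision (illustrative sketch).

    Blocks of the sRGB guide with high local variance (detail-rich regions)
    receive extra quantization bits; flat regions keep the base precision.
    All parameters here are hypothetical, not taken from the paper.
    """
    h, w = latent.shape
    # Collapse the sRGB guide to a single luminance-like channel.
    gray = srgb.mean(axis=-1) if srgb.ndim == 3 else srgb
    quantized = np.empty_like(latent)
    bits_map = np.empty((h // block, w // block), dtype=int)
    for i in range(0, h, block):
        for j in range(0, w, block):
            patch = gray[i:i + block, j:j + block]
            # Textured regions (high variance) get more precision.
            bits = base_bits + (extra_bits if patch.var() > 0.01 else 0)
            bits_map[i // block, j // block] = bits
            levels = 2 ** bits
            # Uniform quantization of values assumed to lie in [0, 1].
            q = np.round(latent[i:i + block, j:j + block] * (levels - 1))
            quantized[i:i + block, j:j + block] = q / (levels - 1)
    return quantized, bits_map
```

In this toy version the bit allocation is a hard two-level choice; the paper's strategy is learned and sRGB-guided end-to-end, but the rate-distortion intuition is the same: spend the bit budget where the guide indicates detail.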
first_indexed | 2024-10-01T03:05:42Z |
format | Journal Article |
id | ntu-10356/179439 |
institution | Nanyang Technological University |
language | English |
last_indexed | 2024-10-01T03:05:42Z |
publishDate | 2024 |
record_format | dspace |
spelling | ntu-10356/1794392024-07-31T03:07:02Z Beyond learned metadata-based raw image reconstruction Wang, Yufei Yu, Yi Yang, Wenhan Guo, Lanqing Chau, Lap-Pui Kot, Alex Chichung Wen, Bihan School of Electrical and Electronic Engineering Rapid-Rich Object Search (ROSE) Lab Engineering Context Model Image compression Ministry of Education (MOE) This research is supported in part by the NTU-PKU Joint Research Institute (a collaboration between the Nanyang Technological University and Peking University that is sponsored by a donation from the Ng Teng Fong Charitable Foundation), the Basic and Frontier Research Project of PCL, the Major Key Project of PCL, and the MOE AcRF Tier 1 (RG61/22) and Start-Up Grant. 2024-07-31T03:07:02Z 2024-07-31T03:07:02Z 2024 Journal Article Wang, Y., Yu, Y., Yang, W., Guo, L., Chau, L., Kot, A. C. & Wen, B. (2024). Beyond learned metadata-based raw image reconstruction. International Journal of Computer Vision. https://dx.doi.org/10.1007/s11263-024-02143-2 0920-5691 https://hdl.handle.net/10356/179439 10.1007/s11263-024-02143-2 2-s2.0-85196175054 en RG61/22 International Journal of Computer Vision © 2024 The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature. All rights reserved. |
spellingShingle | Engineering Context Model Image compression Wang, Yufei Yu, Yi Yang, Wenhan Guo, Lanqing Chau, Lap-Pui Kot, Alex Chichung Wen, Bihan Beyond learned metadata-based raw image reconstruction |
title | Beyond learned metadata-based raw image reconstruction |
title_full | Beyond learned metadata-based raw image reconstruction |
title_fullStr | Beyond learned metadata-based raw image reconstruction |
title_full_unstemmed | Beyond learned metadata-based raw image reconstruction |
title_short | Beyond learned metadata-based raw image reconstruction |
title_sort | beyond learned metadata based raw image reconstruction |
topic | Engineering Context Model Image compression |
url | https://hdl.handle.net/10356/179439 |
work_keys_str_mv | AT wangyufei beyondlearnedmetadatabasedrawimagereconstruction AT yuyi beyondlearnedmetadatabasedrawimagereconstruction AT yangwenhan beyondlearnedmetadatabasedrawimagereconstruction AT guolanqing beyondlearnedmetadatabasedrawimagereconstruction AT chaulappui beyondlearnedmetadatabasedrawimagereconstruction AT kotalexchichung beyondlearnedmetadatabasedrawimagereconstruction AT wenbihan beyondlearnedmetadatabasedrawimagereconstruction |