Deep Learning and Bidirectional Optical Flow Based Viewport Predictions for 360° Video Coding


Bibliographic Details
Main Authors: Jayasingam Adhuran, Gosala Kulupana, Anil Fernando
Format: Article
Language: English
Published: IEEE 2022-01-01
Series: IEEE Access
Subjects: 360° video, perceptual coding, Regions of Interest, viewport prediction, Versatile Video Coding
Online Access: https://ieeexplore.ieee.org/document/9940241/
_version_ 1811313769224077312
author Jayasingam Adhuran
Gosala Kulupana
Anil Fernando
author_facet Jayasingam Adhuran
Gosala Kulupana
Anil Fernando
author_sort Jayasingam Adhuran
collection DOAJ
description The rapid development of virtual reality applications continues to urge better compression of 360° videos owing to the large volume of content. These videos are typically converted to 2-D formats using various projection techniques in order to benefit from ad hoc coding tools designed to support conventional 2-D video compression. Although the recently emerged video coding standard, Versatile Video Coding (VVC), introduces 360° video-specific coding tools, it fails to prioritize the user-observed regions of 360° videos, which are represented by rectilinear images called viewports. This leads to the encoding of redundant regions in the video frames, escalating the bitrate cost of the videos. In response to this issue, this paper proposes a novel 360° video coding framework for VVC that exploits user-observed viewport information to alleviate pixel redundancy in 360° videos. To this end, bidirectional optical flow, a Gaussian filter and Spherical Convolutional Neural Networks (Spherical CNN) are deployed to extract perceptual features and predict user-observed viewports. By appropriately fusing the predicted viewports onto the 2-D projected 360° video frames, a novel Regions of Interest (ROI) aware weightmap is developed, which can be used to mask the source video and to introduce adaptive changes to the Lagrange and quantization parameters in VVC. Comprehensive experiments conducted in the context of VVC Test Model (VTM) 7.0 show that the proposed framework reduces bitrate, achieving an average bitrate saving of 5.85% and savings of up to 17.15% at the same perceptual quality, as measured using the Viewport Peak Signal-to-Noise Ratio (VPSNR).
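
The core mechanism described in the abstract, an ROI-aware weightmap derived from a predicted viewport that modulates quantization in the encoder, can be illustrated with a minimal sketch. The Python snippet below is not the authors' implementation: it assumes the viewport prediction stage (bidirectional optical flow, Gaussian filtering and Spherical CNN) has already produced a viewport centre as (yaw, pitch) angles, builds a Gaussian weightmap over an equirectangular frame from the great-circle distance to that centre, and maps the weights to hypothetical per-CTU QP offsets. All function names, the 128-pixel CTU size and the offset range are illustrative assumptions.

"""Illustrative sketch (not the paper's code): Gaussian ROI weightmap on an
equirectangular (ERP) frame from a predicted viewport centre, mapped to
per-CTU QP offsets. Parameter values are hypothetical."""
import numpy as np


def roi_weightmap(height, width, yaw_deg, pitch_deg, sigma_deg=30.0):
    """Gaussian weightmap over an ERP frame, peaked at the predicted viewport centre.

    yaw_deg in [-180, 180), pitch_deg in [-90, 90]; sigma_deg controls the ROI spread.
    """
    # Longitude/latitude of every ERP pixel centre.
    lon = (np.arange(width) + 0.5) / width * 360.0 - 180.0   # [-180, 180)
    lat = 90.0 - (np.arange(height) + 0.5) / height * 180.0  # [90, -90]
    lon_grid, lat_grid = np.meshgrid(lon, lat)

    # Great-circle angular distance from each pixel to the viewport centre (degrees).
    lam1, phi1 = np.radians(lon_grid), np.radians(lat_grid)
    lam0, phi0 = np.radians(yaw_deg), np.radians(pitch_deg)
    cos_d = (np.sin(phi0) * np.sin(phi1)
             + np.cos(phi0) * np.cos(phi1) * np.cos(lam1 - lam0))
    dist_deg = np.degrees(np.arccos(np.clip(cos_d, -1.0, 1.0)))

    # Gaussian falloff with angular distance: 1 at the centre, approaching 0 far away.
    return np.exp(-0.5 * (dist_deg / sigma_deg) ** 2)


def qp_offsets(weightmap, base_qp=32, max_offset=6, ctu=128):
    """Per-CTU QP: low weight (outside the ROI) gets coarser quantization."""
    h, w = weightmap.shape
    rows, cols = (h + ctu - 1) // ctu, (w + ctu - 1) // ctu
    qp = np.empty((rows, cols), dtype=int)
    for r in range(rows):
        for c in range(cols):
            block = weightmap[r * ctu:(r + 1) * ctu, c * ctu:(c + 1) * ctu]
            # Higher mean weight means a smaller offset, i.e. better quality inside the ROI.
            qp[r, c] = base_qp + round(max_offset * (1.0 - block.mean()))
    return qp


if __name__ == "__main__":
    w = roi_weightmap(960, 1920, yaw_deg=15.0, pitch_deg=-10.0)
    q = qp_offsets(w)
    print(q.min(), q.max())  # QP near 32 inside the ROI, up to 38 outside

In the paper the weightmap additionally masks the source video and adapts the Lagrange parameter; this sketch covers only the quantization side so that the ROI-to-QP mapping is easy to follow.
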
first_indexed 2024-04-13T11:00:37Z
format Article
id doaj.art-020fbf15d345464e96aca5c6266a6ac4
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-04-13T11:00:37Z
publishDate 2022-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-020fbf15d345464e96aca5c6266a6ac4 2022-12-22T02:49:25Z
English | IEEE | IEEE Access | ISSN 2169-3536 | 2022-01-01 | vol. 10, pp. 118380-118396
DOI: 10.1109/ACCESS.2022.3219861 | IEEE document 9940241
Jayasingam Adhuran (https://orcid.org/0000-0002-1477-4395), Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford, U.K.
Gosala Kulupana, Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford, U.K.
Anil Fernando, Department of Computer and Information Science, University of Strathclyde, Glasgow, U.K.
spellingShingle Jayasingam Adhuran
Gosala Kulupana
Anil Fernando
Deep Learning and Bidirectional Optical Flow Based Viewport Predictions for 360° Video Coding
IEEE Access
360° video
perceptual coding
Regions of Interest
viewport prediction
Versatile Video Coding
title Deep Learning and Bidirectional Optical Flow Based Viewport Predictions for 360° Video Coding
title_full Deep Learning and Bidirectional Optical Flow Based Viewport Predictions for 360° Video Coding
title_fullStr Deep Learning and Bidirectional Optical Flow Based Viewport Predictions for 360° Video Coding
title_full_unstemmed Deep Learning and Bidirectional Optical Flow Based Viewport Predictions for 360° Video Coding
title_short Deep Learning and Bidirectional Optical Flow Based Viewport Predictions for 360° Video Coding
title_sort deep learning and bidirectional optical flow based viewport predictions for 360° video coding
topic 360° video
perceptual coding
Regions of Interest
viewport prediction
Versatile Video Coding
url https://ieeexplore.ieee.org/document/9940241/
work_keys_str_mv AT jayasingamadhuran deeplearningandbidirectionalopticalflowbasedviewportpredictionsfor360°videocoding
AT gosalakulupana deeplearningandbidirectionalopticalflowbasedviewportpredictionsfor360°videocoding
AT anilfernando deeplearningandbidirectionalopticalflowbasedviewportpredictionsfor360°videocoding