Attention-Based Bi-Prediction Network for Versatile Video Coding (VVC) over 5G Network

As demand grows for network-dependent services such as Internet of things (IoT) applications, autonomous driving, and augmented and virtual reality (AR/VR), the fifth-generation (5G) network is expected to become a key communication technology. The latest video coding standard, versati...

Bibliographic Details
Main Authors: Young-Ju Choi, Young-Woon Lee, Jongho Kim, Se Yoon Jeong, Jin Soo Choi, Byung-Gyu Kim
Format: Article
Language: English
Published: MDPI AG 2023-02-01
Series: Sensors
Subjects: 5G, versatile video coding, attention mechanism, bi-prediction, convolutional neural network
Online Access: https://www.mdpi.com/1424-8220/23/5/2631
author Young-Ju Choi
Young-Woon Lee
Jongho Kim
Se Yoon Jeong
Jin Soo Choi
Byung-Gyu Kim
author_sort Young-Ju Choi
collection DOAJ
description As demand grows for network-dependent services such as Internet of things (IoT) applications, autonomous driving, and augmented and virtual reality (AR/VR), the fifth-generation (5G) network is expected to become a key communication technology. The latest video coding standard, versatile video coding (VVC), can contribute to providing high-quality services by achieving superior compression performance. In video coding, inter bi-prediction significantly improves coding efficiency by producing a precise fused prediction block. Although block-wise methods, such as bi-prediction with CU-level weight (BCW), are applied in VVC, it is still difficult for a linear fusion-based strategy to represent diverse pixel variations inside a block. In addition, a pixel-wise method called bi-directional optical flow (BDOF) has been proposed to refine the bi-prediction block. However, the non-linear optical flow equation in BDOF mode is applied under certain assumptions, so this method is still unable to accurately compensate various kinds of bi-prediction blocks. In this paper, we propose an attention-based bi-prediction network (ABPN) to replace all existing bi-prediction methods. The proposed ABPN is designed to learn efficient representations of the fused features by utilizing an attention mechanism. Furthermore, a knowledge distillation (KD)-based approach is employed to compress the size of the proposed network while keeping its output comparable to that of the large model. The proposed ABPN is integrated into the VTM-11.0 NNVC-1.0 standard reference software. Compared with the VTM anchor, the lightweight ABPN achieves BD-rate reductions of up to 5.89% and 4.91% on the Y component under random access (RA) and low delay B (LDB), respectively.
first_indexed 2024-03-11T07:10:34Z
format Article
id doaj.art-8b331e063e494e9883eefa52f3efab7f
institution Directory Open Access Journal
issn 1424-8220
language English
last_indexed 2024-03-11T07:10:34Z
publishDate 2023-02-01
publisher MDPI AG
record_format Article
series Sensors
spelling doaj.art-8b331e063e494e9883eefa52f3efab7f
2023-11-17T08:37:31Z
eng
MDPI AG
Sensors
1424-8220
2023-02-01
Volume 23, Issue 5, Article 2631
10.3390/s23052631
Attention-Based Bi-Prediction Network for Versatile Video Coding (VVC) over 5G Network
Young-Ju Choi (Department of IT Engineering, Sookmyung Women’s University, Seoul 04310, Republic of Korea)
Young-Woon Lee (Department of Computer Engineering, Sunmoon University, Asan 31460, Republic of Korea)
Jongho Kim (Media Coding Research Section, Electronics and Telecommunications Research Institute, Daejeon 34129, Republic of Korea)
Se Yoon Jeong (Media Coding Research Section, Electronics and Telecommunications Research Institute, Daejeon 34129, Republic of Korea)
Jin Soo Choi (Media Coding Research Section, Electronics and Telecommunications Research Institute, Daejeon 34129, Republic of Korea)
Byung-Gyu Kim (Department of IT Engineering, Sookmyung Women’s University, Seoul 04310, Republic of Korea)
https://www.mdpi.com/1424-8220/23/5/2631
5G
versatile video coding
attention mechanism
bi-prediction
convolutional neural network
title Attention-Based Bi-Prediction Network for Versatile Video Coding (VVC) over 5G Network
title_sort attention based bi prediction network for versatile video coding vvc over 5g network
topic 5G
versatile video coding
attention mechanism
bi-prediction
convolutional neural network
url https://www.mdpi.com/1424-8220/23/5/2631