Attention-Based Bi-Prediction Network for Versatile Video Coding (VVC) over 5G Network
As the demands of various network-dependent services such as Internet of things (IoT) applications, autonomous driving, and augmented and virtual reality (AR/VR) increase, the fifth-generation (5G) network is expected to become a key communication technology. The latest video coding standard, versati...
Main Authors: | Young-Ju Choi, Young-Woon Lee, Jongho Kim, Se Yoon Jeong, Jin Soo Choi, Byung-Gyu Kim |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2023-02-01 |
Series: | Sensors |
Subjects: | 5G; versatile video coding; attention mechanism; bi-prediction; convolutional neural network |
Online Access: | https://www.mdpi.com/1424-8220/23/5/2631 |
_version_ | 1797614309971329024 |
---|---|
author | Young-Ju Choi; Young-Woon Lee; Jongho Kim; Se Yoon Jeong; Jin Soo Choi; Byung-Gyu Kim |
author_facet | Young-Ju Choi; Young-Woon Lee; Jongho Kim; Se Yoon Jeong; Jin Soo Choi; Byung-Gyu Kim |
author_sort | Young-Ju Choi |
collection | DOAJ |
description | As demand grows for network-dependent services such as Internet of Things (IoT) applications, autonomous driving, and augmented and virtual reality (AR/VR), the fifth-generation (5G) network is expected to become a key communication technology. The latest video coding standard, versatile video coding (VVC), can contribute to providing high-quality services by achieving superior compression performance. In video coding, inter bi-prediction significantly improves coding efficiency by producing a precise fused prediction block. Although block-wise methods, such as bi-prediction with CU-level weight (BCW), are applied in VVC, it is still difficult for a linear fusion-based strategy to represent diverse pixel variations inside a block. In addition, a pixel-wise method called bi-directional optical flow (BDOF) has been proposed to refine the bi-prediction block. However, the non-linear optical flow equation in BDOF mode is applied under assumptions, so this method still cannot accurately compensate for various kinds of bi-prediction blocks. In this paper, we propose an attention-based bi-prediction network (ABPN) to replace all of the existing bi-prediction methods. The proposed ABPN is designed to learn efficient representations of the fused features by utilizing an attention mechanism. Furthermore, a knowledge distillation (KD)-based approach is employed to compress the proposed network while keeping its output comparable to that of the large model. The proposed ABPN is integrated into the VTM-11.0 NNVC-1.0 standard reference software. Compared with the VTM anchor, the lightweight ABPN achieves BD-rate reductions of up to 5.89% and 4.91% on the Y component under random access (RA) and low delay B (LDB), respectively. |
first_indexed | 2024-03-11T07:10:34Z |
format | Article |
id | doaj.art-8b331e063e494e9883eefa52f3efab7f |
institution | Directory Open Access Journal |
issn | 1424-8220 |
language | English |
last_indexed | 2024-03-11T07:10:34Z |
publishDate | 2023-02-01 |
publisher | MDPI AG |
record_format | Article |
series | Sensors |
spelling | doaj.art-8b331e063e494e9883eefa52f3efab7f; 2023-11-17T08:37:31Z; eng; MDPI AG; Sensors; 1424-8220; 2023-02-01; 23; 5; 2631; 10.3390/s23052631; Attention-Based Bi-Prediction Network for Versatile Video Coding (VVC) over 5G Network; Young-Ju Choi (0); Young-Woon Lee (1); Jongho Kim (2); Se Yoon Jeong (3); Jin Soo Choi (4); Byung-Gyu Kim (5); (0) Department of IT Engineering, Sookmyung Women’s University, Seoul 04310, Republic of Korea; (1) Department of Computer Engineering, Sunmoon University, Asan 31460, Republic of Korea; (2) Media Coding Research Section, Electronics and Telecommunications Research Institute, Daejeon 34129, Republic of Korea; (3) Media Coding Research Section, Electronics and Telecommunications Research Institute, Daejeon 34129, Republic of Korea; (4) Media Coding Research Section, Electronics and Telecommunications Research Institute, Daejeon 34129, Republic of Korea; (5) Department of IT Engineering, Sookmyung Women’s University, Seoul 04310, Republic of Korea; As demand grows for network-dependent services such as Internet of Things (IoT) applications, autonomous driving, and augmented and virtual reality (AR/VR), the fifth-generation (5G) network is expected to become a key communication technology. The latest video coding standard, versatile video coding (VVC), can contribute to providing high-quality services by achieving superior compression performance. In video coding, inter bi-prediction significantly improves coding efficiency by producing a precise fused prediction block. Although block-wise methods, such as bi-prediction with CU-level weight (BCW), are applied in VVC, it is still difficult for a linear fusion-based strategy to represent diverse pixel variations inside a block. In addition, a pixel-wise method called bi-directional optical flow (BDOF) has been proposed to refine the bi-prediction block. However, the non-linear optical flow equation in BDOF mode is applied under assumptions, so this method still cannot accurately compensate for various kinds of bi-prediction blocks. In this paper, we propose an attention-based bi-prediction network (ABPN) to replace all of the existing bi-prediction methods. The proposed ABPN is designed to learn efficient representations of the fused features by utilizing an attention mechanism. Furthermore, a knowledge distillation (KD)-based approach is employed to compress the proposed network while keeping its output comparable to that of the large model. The proposed ABPN is integrated into the VTM-11.0 NNVC-1.0 standard reference software. Compared with the VTM anchor, the lightweight ABPN achieves BD-rate reductions of up to 5.89% and 4.91% on the Y component under random access (RA) and low delay B (LDB), respectively.; https://www.mdpi.com/1424-8220/23/5/2631; 5G; versatile video coding; attention mechanism; bi-prediction; convolutional neural network |
spellingShingle | Young-Ju Choi Young-Woon Lee Jongho Kim Se Yoon Jeong Jin Soo Choi Byung-Gyu Kim Attention-Based Bi-Prediction Network for Versatile Video Coding (VVC) over 5G Network Sensors 5G versatile video coding attention mechanism bi-prediction convolutional neural network |
title | Attention-Based Bi-Prediction Network for Versatile Video Coding (VVC) over 5G Network |
title_full | Attention-Based Bi-Prediction Network for Versatile Video Coding (VVC) over 5G Network |
title_fullStr | Attention-Based Bi-Prediction Network for Versatile Video Coding (VVC) over 5G Network |
title_full_unstemmed | Attention-Based Bi-Prediction Network for Versatile Video Coding (VVC) over 5G Network |
title_short | Attention-Based Bi-Prediction Network for Versatile Video Coding (VVC) over 5G Network |
title_sort | attention based bi prediction network for versatile video coding vvc over 5g network |
topic | 5G; versatile video coding; attention mechanism; bi-prediction; convolutional neural network |
url | https://www.mdpi.com/1424-8220/23/5/2631 |
work_keys_str_mv | AT youngjuchoi attentionbasedbipredictionnetworkforversatilevideocodingvvcover5gnetwork AT youngwoonlee attentionbasedbipredictionnetworkforversatilevideocodingvvcover5gnetwork AT jonghokim attentionbasedbipredictionnetworkforversatilevideocodingvvcover5gnetwork AT seyoonjeong attentionbasedbipredictionnetworkforversatilevideocodingvvcover5gnetwork AT jinsoochoi attentionbasedbipredictionnetworkforversatilevideocodingvvcover5gnetwork AT byunggyukim attentionbasedbipredictionnetworkforversatilevideocodingvvcover5gnetwork |
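
The description in this record contrasts block-wise linear fusion (BCW) with the learned, pixel-adaptive fusion of ABPN. As a rough illustration of the linear baseline only, the sketch below blends two prediction blocks with a single CU-level weight, following the commonly cited BCW form `((8 - w) * P0 + w * P1 + 4) >> 3`. The function name `bcw_blend`, the flat-list block layout, and the default 10-bit depth are assumptions made for this example; the sketch omits the intermediate sample precision used by a real codec and is not the ABPN method or the VTM implementation.

```python
# Candidate CU-level weights for the second prediction (weights sum to 8).
BCW_WEIGHTS = (-2, 3, 4, 5, 10)


def bcw_blend(p0, p1, w=4, bit_depth=10):
    """Blend two prediction blocks with one block-level weight (simplified BCW).

    p0, p1    -- flat lists of integer samples from the two reference blocks
    w         -- weight applied to p1; (8 - w) is applied to p0
    bit_depth -- sample bit depth used for clipping (hypothetical default)
    """
    if w not in BCW_WEIGHTS:
        raise ValueError(f"w must be one of {BCW_WEIGHTS}")
    if len(p0) != len(p1):
        raise ValueError("prediction blocks must have the same size")

    max_val = (1 << bit_depth) - 1
    fused = []
    for a, b in zip(p0, p1):
        # Weighted sum with rounding offset 4, then divide by 8 via shift.
        value = ((8 - w) * a + w * b + 4) >> 3
        # Clip to the valid sample range.
        fused.append(min(max(value, 0), max_val))
    return fused


if __name__ == "__main__":
    # Equal weights (w = 4) reduce to a plain average of the two blocks.
    print(bcw_blend([100, 200], [300, 400], w=4))
```

Because the same `w` is applied to every sample in the block, this kind of fusion cannot adapt to pixel-level variation inside the block, which is the limitation the description attributes to linear fusion-based strategies.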