Multi-Frame Pyramid Refinement Network for Video Frame Interpolation


Bibliographic Details
Main Authors: Haoxian Zhang, Ronggang Wang, Yang Zhao
Format: Article
Language: English
Published: IEEE, 2019-01-01
Series: IEEE Access
ISSN: 2169-3536
Collection: DOAJ (Directory of Open Access Journals)
Subjects: Video frame interpolation; multiple frames; spatio-temporal information; optical flow; coarse-to-fine framework; deep learning
Online Access: https://ieeexplore.ieee.org/document/8832250/
Description: Video frame interpolation aims at synthesizing new video frames in-between existing frames to generate higher-frame-rate video. Current methods usually use two adjacent frames to generate intermediate frames, but they sometimes fail to handle challenges such as large motion, occlusion, and motion blur. This paper proposes a multi-frame pyramid refinement network that effectively uses the spatio-temporal information contained in multiple frames (more than two). The proposed network makes three technical contributions. First, a coarse-to-fine framework refines optical flows between multiple frames with residual flows at each pyramid level, so large motion and occlusion can be estimated effectively. Second, a 3D U-net feature extractor excavates spatio-temporal context and restores texture, which tends to disappear at coarse pyramid levels. Third, a multi-step perceptual loss preserves more detail in the intermediate frame. Notably, the approach can be easily extended to multi-frame interpolation. The network is trained end-to-end on more than 80K collected frame groups (25 frames per group). Experimental results on several independent datasets show that the approach handles challenging cases effectively and performs consistently better than other state-of-the-art methods.
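The coarse-to-fine refinement described above can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: it assumes `(H, W, 2)` flow fields whose resolution doubles at each pyramid level, nearest-neighbor upsampling, and per-level residual corrections supplied by some learned predictor (abstracted away here).

```python
import numpy as np

def upsample_flow(flow):
    # Nearest-neighbor upsample by 2x in both spatial axes, then double
    # the flow values: pixel displacements scale with resolution.
    up = flow.repeat(2, axis=0).repeat(2, axis=1)
    return up * 2.0

def coarse_to_fine(residuals):
    # residuals: list of (H, W, 2) residual flow fields, coarsest first,
    # with spatial resolution doubling at every level. At the coarsest
    # level the residual is the full estimate; at each finer level the
    # upsampled coarse flow is corrected by that level's residual.
    flow = residuals[0]
    for res in residuals[1:]:
        flow = upsample_flow(flow) + res
    return flow
```

For example, a constant unit flow at a 2x2 coarse level combined with a constant 0.5 residual at the 4x4 level yields a constant flow of 2.5 at the fine level (2x1.0 from upsampling, plus 0.5).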
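The multi-step perceptual loss can likewise be sketched in a few lines. In the paper a pretrained network supplies the feature space; in this hedged sketch the feature extractor `extract` and the per-step weights are hypothetical stand-ins, and the loss is simply a weighted sum of feature-space mean squared errors over the intermediate predictions from each refinement step.

```python
import numpy as np

def perceptual_loss(pred, target, extract):
    # Mean squared error in a feature space; `extract` stands in for a
    # pretrained feature network (e.g. VGG features in practice).
    return float(np.mean((extract(pred) - extract(target)) ** 2))

def multi_step_perceptual_loss(preds, target, extract, weights):
    # Weighted sum of perceptual losses over the predictions produced
    # at each refinement step (weighting scheme is an assumption).
    return sum(w * perceptual_loss(p, target, extract)
               for w, p in zip(weights, preds))
```

With an identity feature extractor, a step whose prediction already matches the target contributes zero, so only the mismatched steps drive the gradient.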
Affiliations:
Haoxian Zhang (ORCID: https://orcid.org/0000-0001-7078-868X): School of Electronic and Computer Engineering, Shenzhen Graduate School, Peking University, Shenzhen, China
Ronggang Wang: School of Electronic and Computer Engineering, Shenzhen Graduate School, Peking University, Shenzhen, China
Yang Zhao: Peng Cheng Laboratory, Shenzhen, China

Citation: IEEE Access, vol. 7, pp. 130610-130621, 2019. DOI: 10.1109/ACCESS.2019.2940510 (IEEE Xplore document 8832250).