Multi-Frame Pyramid Refinement Network for Video Frame Interpolation
Video frame interpolation aims at synthesizing new video frames in-between existing frames to generate higher frame rate video. Current methods usually use two adjacent frames to generate intermediate frames, but sometimes fail to handle challenges such as large motion, occlusion, and motion blur.
Main Authors: | Haoxian Zhang, Ronggang Wang, Yang Zhao
---|---
Format: | Article
Language: | English
Published: | IEEE, 2019-01-01
Series: | IEEE Access
Subjects: | Video frame interpolation; multiple frames; spatio-temporal information; optical flow; coarse-to-fine framework; deep learning
Online Access: | https://ieeexplore.ieee.org/document/8832250/
author | Haoxian Zhang, Ronggang Wang, Yang Zhao
collection | DOAJ |
description | Video frame interpolation aims at synthesizing new video frames in-between existing frames to generate higher frame rate video. Current methods usually use two adjacent frames to generate intermediate frames, but sometimes fail to handle challenges such as large motion, occlusion, and motion blur. This paper proposes a multi-frame pyramid refinement network to effectively use the spatio-temporal information contained in multiple frames (more than two). The proposed network makes three technical contributions. First, a coarse-to-fine framework refines the optical flows between multiple frames with residual flows at each pyramid level, so that large motion and occlusion can be estimated effectively. Second, a 3D U-Net feature extractor mines spatio-temporal context and restores textures that tend to disappear at coarse pyramid levels. Third, a multi-step perceptual loss preserves more detail in the intermediate frame. It is worth mentioning that our approach can easily be extended to multi-frame interpolation. Our network is trained end-to-end on more than 80K collected frame groups (25 frames per group). Experimental results on several independent datasets show that our approach handles challenging cases effectively and performs consistently better than other state-of-the-art methods. (An illustrative code sketch of the coarse-to-fine refinement idea described here appears after the record fields below.) |
format | Article |
id | doaj.art-3434ccce93f946a886123dc93b49c522 |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
publishDate | 2019-01-01 |
publisher | IEEE |
series | IEEE Access |
citation | Haoxian Zhang, Ronggang Wang, and Yang Zhao, "Multi-Frame Pyramid Refinement Network for Video Frame Interpolation," IEEE Access, vol. 7, pp. 130610-130621, 2019, doi: 10.1109/ACCESS.2019.2940510 (IEEE article no. 8832250)
orcid | Haoxian Zhang: https://orcid.org/0000-0001-7078-868X
affiliation | Haoxian Zhang, Ronggang Wang: School of Electronic and Computer Engineering, Shenzhen Graduate School, Peking University, Shenzhen, China; Yang Zhao: Peng Cheng Laboratory, Shenzhen, China
title | Multi-Frame Pyramid Refinement Network for Video Frame Interpolation |
topic | Video frame interpolation; multiple frames; spatio-temporal information; optical flow; coarse-to-fine framework; deep learning
url | https://ieeexplore.ieee.org/document/8832250/ |
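The abstract above describes the method only at a high level, and the authors' code is not part of this record. As a rough, hedged illustration of the coarse-to-fine idea it names (refining optical flow with residual flows at each pyramid level), here is a minimal PyTorch sketch. Every name, layer size, and the tiny convolutional predictor below are assumptions made for illustration; the actual network additionally uses a 3D U-Net feature extractor and estimates flows between multiple frames toward an intermediate time, which this sketch simplifies to a single flow field.

```python
# Hedged sketch of coarse-to-fine residual flow refinement, as described in
# the abstract. Layer sizes, module names, and the bilinear warp are
# illustrative assumptions, not the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

def backward_warp(img, flow):
    """Warp `img` (B, C, H, W) with per-pixel displacement `flow` (B, 2, H, W)."""
    b, _, h, w = img.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, device=img.device, dtype=img.dtype),
        torch.arange(w, device=img.device, dtype=img.dtype),
        indexing="ij",
    )
    grid_x = xs.unsqueeze(0) + flow[:, 0]  # sample positions in pixels
    grid_y = ys.unsqueeze(0) + flow[:, 1]
    # Normalize to [-1, 1] as grid_sample expects.
    grid = torch.stack(
        (2.0 * grid_x / (w - 1) - 1.0, 2.0 * grid_y / (h - 1) - 1.0), dim=-1
    )
    return F.grid_sample(img, grid, align_corners=True)

class ResidualFlowLevel(nn.Module):
    """One pyramid level: predict a residual correction to the upsampled flow."""
    def __init__(self, in_frames=4):
        super().__init__()
        # Input: warped input frames concatenated with the current flow estimate.
        self.net = nn.Sequential(
            nn.Conv2d(3 * in_frames + 2, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 2, 3, padding=1),  # residual flow
        )

    def forward(self, frames, flow_up):
        warped = [backward_warp(f, flow_up) for f in frames]
        x = torch.cat(warped + [flow_up], dim=1)
        return flow_up + self.net(x)  # refined flow at this level

class PyramidRefiner(nn.Module):
    """Refine flow from the coarsest to the finest level with residual updates."""
    def __init__(self, levels=3, in_frames=4):
        super().__init__()
        self.levels = nn.ModuleList(ResidualFlowLevel(in_frames) for _ in range(levels))

    def forward(self, frames):
        # Build an image pyramid (finest first) for each input frame.
        pyramids = [[f] for f in frames]
        for _ in range(len(self.levels) - 1):
            for p in pyramids:
                p.append(F.avg_pool2d(p[-1], 2))
        b, _, h, w = pyramids[0][-1].shape
        flow = torch.zeros(b, 2, h, w, device=frames[0].device)
        for lvl, refine in zip(reversed(range(len(self.levels))), self.levels):
            flow = refine([p[lvl] for p in pyramids], flow)
            if lvl > 0:  # upsample for the finer level; displacements scale with size
                flow = 2.0 * F.interpolate(flow, scale_factor=2, mode="bilinear",
                                           align_corners=False)
        return flow

# Hypothetical usage: four 3-channel frames at 64x64 resolution.
frames = [torch.randn(1, 3, 64, 64) for _ in range(4)]
flow = PyramidRefiner(levels=3, in_frames=4)(frames)
print(flow.shape)  # torch.Size([1, 2, 64, 64])
```

Under the same caveat, the multi-step perceptual loss mentioned in the abstract could plausibly be realized by comparing deep features (for example, from a pretrained VGG) of the warped prediction against the ground-truth intermediate frame at each refinement step and summing over steps; the record does not specify the authors' exact formulation.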