Bridging global context interactions for high-fidelity image completion

Bridging global context interactions correctly is important for high-fidelity image completion with large masks. Previous methods attempting this via deep or large receptive field (RF) convolutions cannot escape from the dominance of nearby interactions, which may be inferior. In this paper, we prop...


Bibliographic Details
Main Authors: Zheng, Chuanxia, Cham, Tat-Jen, Cai, Jianfei, Phung, Dinh
Other Authors: School of Computer Science and Engineering
Format: Conference Paper
Language: English
Published: 2023
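Subjects: Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision; Radio Frequency; Convolutional Codes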
Online Access: https://hdl.handle.net/10356/172659
author Zheng, Chuanxia
Cham, Tat-Jen
Cai, Jianfei
Phung, Dinh
author2 School of Computer Science and Engineering
collection NTU
description Bridging global context interactions correctly is important for high-fidelity image completion with large masks. Previous methods attempting this via deep or large receptive field (RF) convolutions cannot escape from the dominance of nearby interactions, which may be inferior. In this paper, we propose to treat image completion as a directionless sequence-to-sequence prediction task, and deploy a transformer to directly capture long-range dependence. Crucially, we employ a restrictive CNN with small and non-overlapping RF for weighted token representation, which allows the transformer to explicitly model the long-range visible context relations with equal importance in all layers, without implicitly confounding neighboring tokens when larger RFs are used. To improve appearance consistency between visible and generated regions, a novel attention-aware layer (AAL) is introduced to better exploit distantly related high-frequency features. Overall, extensive experiments demonstrate superior performance compared to state-of-the-art methods on several datasets. Code is available at https://github.com/lyndonzheng/TFill.
first_indexed 2024-10-01T07:41:26Z
format Conference Paper
id ntu-10356/172659
institution Nanyang Technological University
language English
last_indexed 2024-10-01T07:41:26Z
publishDate 2023
record_format dspace
spelling ntu-10356/172659
2023-12-19T05:00:56Z
Bridging global context interactions for high-fidelity image completion
Zheng, Chuanxia
Cham, Tat-Jen
Cai, Jianfei
Phung, Dinh
School of Computer Science and Engineering
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision
Radio Frequency
Convolutional Codes
Bridging global context interactions correctly is important for high-fidelity image completion with large masks. Previous methods attempting this via deep or large receptive field (RF) convolutions cannot escape from the dominance of nearby interactions, which may be inferior. In this paper, we propose to treat image completion as a directionless sequence-to-sequence prediction task, and deploy a transformer to directly capture long-range dependence. Crucially, we employ a restrictive CNN with small and non-overlapping RF for weighted token representation, which allows the transformer to explicitly model the long-range visible context relations with equal importance in all layers, without implicitly confounding neighboring tokens when larger RFs are used. To improve appearance consistency between visible and generated regions, a novel attention-aware layer (AAL) is introduced to better exploit distantly related high-frequency features. Overall, extensive experiments demonstrate superior performance compared to state-of-the-art methods on several datasets. Code is available at https://github.com/lyndonzheng/TFill.
This research was supported by Monash FIT Grant. This study was also supported under the RIE2020 Industry Alignment Fund – Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contribution from Singapore Telecommunications Limited (Singtel), through Singtel Cognitive and Artificial Intelligence Lab for Enterprises (SCALE@NTU).
2023-12-19T05:00:56Z
2023-12-19T05:00:56Z
2022
Conference Paper
Zheng, C., Cham, T., Cai, J. & Phung, D. (2022). Bridging global context interactions for high-fidelity image completion. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 11502-11512. https://dx.doi.org/10.1109/CVPR52688.2022.01122
9781665469463
https://hdl.handle.net/10356/172659
10.1109/CVPR52688.2022.01122
2-s2.0-85136091993
11502
11512
en
IAF-ICP
© 2022 IEEE. All rights reserved.
title Bridging global context interactions for high-fidelity image completion
topic Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision
Radio Frequency
Convolutional Codes
url https://hdl.handle.net/10356/172659