Bridging global context interactions for high-fidelity image completion

Bridging global context interactions correctly is important for high-fidelity image completion with large masks. Previous methods attempting this via deep or large receptive field (RF) convolutions cannot escape from the dominance of nearby interactions, which may be inferior. In this paper, we prop...


Bibliographic Details
Main Authors:	Zheng, Chuanxia; Cham, Tat-Jen; Cai, Jianfei; Phung, Dinh
Other Authors:	School of Computer Science and Engineering
Format:	Conference Paper
Language:	English
Published:	2023
Subjects:	Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision; Radio Frequency; Convolutional Codes
Online Access:	https://hdl.handle.net/10356/172659

_version_	1826129488376758272
author	Zheng, Chuanxia Cham, Tat-Jen Cai, Jianfei Phung, Dinh
author2	School of Computer Science and Engineering
author_facet	School of Computer Science and Engineering Zheng, Chuanxia Cham, Tat-Jen Cai, Jianfei Phung, Dinh
author_sort	Zheng, Chuanxia
collection	NTU
description	Bridging global context interactions correctly is important for high-fidelity image completion with large masks. Previous methods attempting this via deep or large receptive field (RF) convolutions cannot escape from the dominance of nearby interactions, which may be inferior. In this paper, we propose to treat image completion as a directionless sequence-to-sequence prediction task, and deploy a transformer to directly capture long-range dependence. Crucially, we employ a restrictive CNN with small and non-overlapping RF for weighted token representation, which allows the transformer to explicitly model the long-range visible context relations with equal importance in all layers, without implicitly confounding neighboring tokens when larger RFs are used. To improve appearance consistency between visible and generated regions, a novel attention-aware layer (AAL) is introduced to better exploit distantly related high-frequency features. Overall, extensive experiments demonstrate superior performance compared to state-of-the-art methods on several datasets. Code is available at https://github.com/lyndonzheng/TFill.
first_indexed	2024-10-01T07:41:26Z
format	Conference Paper
id	ntu-10356/172659
institution	Nanyang Technological University
language	English
last_indexed	2024-10-01T07:41:26Z
publishDate	2023
record_format	dspace
spelling	ntu-10356/1726592023-12-19T05:00:56Z Bridging global context interactions for high-fidelity image completion Zheng, Chuanxia Cham, Tat-Jen Cai, Jianfei Phung, Dinh School of Computer Science and Engineering 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision Radio Frequency Convolutional Codes Bridging global context interactions correctly is important for high-fidelity image completion with large masks. Previous methods attempting this via deep or large receptive field (RF) convolutions cannot escape from the dominance of nearby interactions, which may be inferior. In this paper, we propose to treat image completion as a directionless sequence-to-sequence prediction task, and deploy a transformer to directly capture long-range dependence. Crucially, we employ a restrictive CNN with small and non-overlapping RF for weighted token representation, which allows the transformer to explicitly model the long-range visible context relations with equal importance in all layers, without implicitly confounding neighboring tokens when larger RFs are used. To improve appearance consistency between visible and generated regions, a novel attention-aware layer (AAL) is introduced to better exploit distantly related high-frequency features. Overall, extensive experiments demonstrate superior performance compared to state-of-the-art methods on several datasets. Code is available at https://github.com/lyndonzheng/TFill. This research was supported by Monash FIT Grant. This study was also supported under the RIE2020 Industry Alignment Fund – Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contribution from Singapore Telecommunications Limited (Singtel), through Singtel Cognitive and Artificial Intelligence Lab for Enterprises (SCALE@NTU).
2023-12-19T05:00:56Z 2023-12-19T05:00:56Z 2022 Conference Paper Zheng, C., Cham, T., Cai, J. & Phung, D. (2022). Bridging global context interactions for high-fidelity image completion. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 11502-11512. https://dx.doi.org/10.1109/CVPR52688.2022.01122 9781665469463 https://hdl.handle.net/10356/172659 10.1109/CVPR52688.2022.01122 2-s2.0-85136091993 11502 11512 en IAF-ICP © 2022 IEEE. All rights reserved.
spellingShingle	Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision Radio Frequency Convolutional Codes Zheng, Chuanxia Cham, Tat-Jen Cai, Jianfei Phung, Dinh Bridging global context interactions for high-fidelity image completion
title	Bridging global context interactions for high-fidelity image completion
title_full	Bridging global context interactions for high-fidelity image completion
title_fullStr	Bridging global context interactions for high-fidelity image completion
title_full_unstemmed	Bridging global context interactions for high-fidelity image completion
title_short	Bridging global context interactions for high-fidelity image completion
title_sort	bridging global context interactions for high fidelity image completion
topic	Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision Radio Frequency Convolutional Codes
url	https://hdl.handle.net/10356/172659
work_keys_str_mv	AT zhengchuanxia bridgingglobalcontextinteractionsforhighfidelityimagecompletion AT chamtatjen bridgingglobalcontextinteractionsforhighfidelityimagecompletion AT caijianfei bridgingglobalcontextinteractionsforhighfidelityimagecompletion AT phungdinh bridgingglobalcontextinteractionsforhighfidelityimagecompletion

Similar Items