Summary: | Abstract Perfect image compositing can harmonize the appearance between the foreground and background effectively so that the composite result looks seamless and natural. However, the traditional convolutional neural network (CNN)-based methods often fail to yield highly realistic composite results due to overdependence on scene parsing while ignoring the coherence of semantic and structural between foreground and background. In this paper, we propose a framework to solve this problem by training a stacked generative adversarial network with attention guidance, which can efficiently create a high-resolution, realistic-looking composite. To this end, we develop a diverse adversarial loss in addition to perceptual and guidance loss to train the proposed generative network. Moreover, we construct a multi-scenario dataset for high-resolution image compositing, which contains high-quality images with different styles and object masks. Experiments on the synthesized and real images demonstrate the efficiency and effectiveness of our network in producing seamless, natural, and realistic results. Ablation studies show that our proposed network can improve the visual performance of composite results compared with the application of existing methods.
|