Summary: | The Transformer architecture achieves outstanding performance in computer vision tasks thanks to its ability to capture long-range dependencies. However, its complexity grows quadratically with spatial resolution, which makes it impractical for image restoration tasks. In this paper, we propose Decomformer, which efficiently captures global relationships by decomposing self-attention into a linear combination of vectors and coefficients, reducing the heavy computational cost. This approximation not only reduces the complexity to linear, but also properly preserves the global receptive field of vanilla self-attention. Moreover, we apply a simple linear gate to directly represent the complex self-attention mechanism in the proposed decomposed form. To show the effectiveness of our approach, we apply it to image restoration tasks including denoising, deblurring, and deraining. The proposed decomposition scheme for self-attention achieves results better than or comparable to the state of the art, while being far more efficient than most previous approaches.
|
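The abstract does not spell out Decomformer's exact formulation, but the following is a minimal sketch of the general idea it alludes to: decomposing attention so that a shared key-value summary is combined with per-query coefficients, instead of materializing the full N×N attention map. The feature map, normalization, and function names below are illustrative assumptions, not the authors' design.

```python
import torch

def quadratic_attention(q, k, v):
    # Vanilla self-attention: builds an N x N attention map, O(N^2 * d).
    attn = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
    return attn @ v

def linear_decomposed_attention(q, k, v, eps=1e-6):
    # Generic linear-attention sketch (an assumption, not Decomformer itself):
    # pass queries/keys through a positive feature map, then reorder the matmuls
    # so a shared (d x d) key-value summary is built once and reused for every
    # query, giving O(N * d^2) instead of O(N^2 * d).
    q_f = torch.relu(q) + eps          # hypothetical feature map
    k_f = torch.relu(k) + eps
    kv = k_f.transpose(-2, -1) @ v     # shared "basis": d x d summary of keys/values
    norm = q_f @ k_f.sum(dim=-2, keepdim=True).transpose(-2, -1)  # per-token coefficient
    return (q_f @ kv) / norm

# Usage: N = H * W flattened spatial tokens, d = channel dim per head.
q = torch.randn(1, 4096, 64)  # e.g. a 64x64 feature map
k = torch.randn(1, 4096, 64)
v = torch.randn(1, 4096, 64)
out = linear_decomposed_attention(q, k, v)  # never forms the 4096 x 4096 map
print(out.shape)  # torch.Size([1, 4096, 64])
```

The key point illustrated is the reordering of the computation: once attention is expressed as a combination of a compact shared summary and per-token coefficients, the cost scales linearly with the number of spatial tokens, which is what makes such schemes practical for high-resolution image restoration.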