Local Multi-Head Channel Self-Attention for Facial Expression Recognition


Bibliographic Details
Main Authors: Roberto Pecoraro, Valerio Basile, Viviana Bono
Format: Article
Language: English
Published: MDPI AG 2022-09-01
Series: Information
Online Access: https://www.mdpi.com/2078-2489/13/9/419
Description
Summary: Since the Transformer architecture was introduced in 2017, there have been many attempts to bring the self-attention paradigm into the field of computer vision. In this paper, we propose LHC: Local multi-Head Channel self-attention, a novel self-attention module that can be easily integrated into virtually every convolutional neural network and that is designed specifically for computer vision, with a particular focus on facial expression recognition. LHC is based on two main ideas: first, we argue that in computer vision the best way to leverage the self-attention paradigm is channel-wise application rather than the better-explored spatial attention; second, a local approach has the potential to better overcome the limitations of convolution than global attention, at least in scenarios where images share a constant general structure, as in facial expression recognition. LHC-Net achieves a new state of the art on the FER2013 dataset, with significantly lower complexity and impact on the "host" architecture in terms of computational cost compared with the previous state of the art.
ISSN:2078-2489
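
The summary describes channel-wise self-attention, in which channels rather than spatial positions act as attention tokens, applied inside a convolutional network. The sketch below is a minimal, hypothetical PyTorch illustration of that general idea under simplifying assumptions (a single global channel-attention module; it is not the authors' LHC implementation, and the name ChannelSelfAttention and all parameters are illustrative only):

```python
# Minimal sketch of channel-wise multi-head self-attention (illustrative only;
# not the LHC module from the paper, which additionally uses a local approach).
import torch
import torch.nn as nn

class ChannelSelfAttention(nn.Module):
    """Attention over channels: each channel's flattened spatial map is a token,
    so the attention matrix per head has shape (C x C)."""
    def __init__(self, spatial_dim: int, heads: int = 4):
        super().__init__()
        assert spatial_dim % heads == 0, "spatial_dim must be divisible by heads"
        self.heads = heads
        self.head_dim = spatial_dim // heads
        self.q = nn.Linear(spatial_dim, spatial_dim)
        self.k = nn.Linear(spatial_dim, spatial_dim)
        self.v = nn.Linear(spatial_dim, spatial_dim)
        self.out = nn.Linear(spatial_dim, spatial_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        tokens = x.flatten(2)  # (B, C, H*W): channels become the token axis
        q = self.q(tokens).view(b, c, self.heads, self.head_dim).transpose(1, 2)
        k = self.k(tokens).view(b, c, self.heads, self.head_dim).transpose(1, 2)
        v = self.v(tokens).view(b, c, self.heads, self.head_dim).transpose(1, 2)
        # (B, heads, C, C) attention among channels
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.head_dim ** 0.5, dim=-1)
        mixed = (attn @ v).transpose(1, 2).reshape(b, c, h * w)
        out = self.out(mixed).view(b, c, h, w)
        return x + out  # residual, so the module can wrap a stage of a host CNN

# Example: plug the module after a convolutional stage of a host network.
feats = torch.randn(2, 64, 14, 14)
attn_block = ChannelSelfAttention(spatial_dim=14 * 14, heads=4)
print(attn_block(feats).shape)  # torch.Size([2, 64, 14, 14])
```

The residual connection reflects the abstract's claim that the module can be dropped into an existing "host" architecture without replacing its convolutional backbone.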