Metaformer: A Transformer That Tends to Mine Metaphorical-Level Information

Since its introduction, the Transformer model has dramatically influenced various fields of machine learning. Time series prediction has also been significantly affected: Transformer-family models have flourished there, and many variants have emerged. These Transformer models...


Bibliographic Details
Main Authors:	Bo Peng, Yuanming Ding, Wei Kang
Format:	Article
Language:	English
Published:	MDPI AG 2023-05-01
Series:	Sensors
Subjects:	hierarchical attention; multi-head attention; graph neural networks; feature diversity; information interaction
Online Access:	https://www.mdpi.com/1424-8220/23/11/5093

collection	DOAJ
description	Since its introduction, the Transformer model has dramatically influenced various fields of machine learning. Time series prediction has also been significantly affected: Transformer-family models have flourished there, and many variants have emerged. These models mainly use attention mechanisms for feature extraction and multi-head attention to strengthen it. However, multi-head attention is essentially a simple superposition of the same attention operation, so it does not guarantee that the model captures different features; on the contrary, it may lead to considerable information redundancy and wasted computation. To ensure that the Transformer captures information from multiple perspectives and to increase the diversity of the features it captures, this paper proposes a hierarchical attention mechanism, for the first time, to address two shortcomings of traditional multi-head attention: insufficient diversity in the captured information and a lack of information interaction among heads. Additionally, global feature aggregation using graph networks is used to mitigate inductive bias. Finally, we conducted experiments on four benchmark datasets, and the results show that the proposed model can outperform the baseline model on several metrics.
first_indexed	2024-03-11T02:57:35Z
id	doaj.art-4f46d0dbdd2c4b80b3e4ee7c51f617dc
institution	Directory Open Access Journal
issn	1424-8220
last_indexed	2024-03-11T02:57:35Z
spelling	doaj.art-4f46d0dbdd2c4b80b3e4ee7c51f617dc; 2023-11-18T08:32:26Z; eng; MDPI AG; Sensors; ISSN 1424-8220; 2023-05-01; vol. 23, no. 11, art. 5093; DOI 10.3390/s23115093; Metaformer: A Transformer That Tends to Mine Metaphorical-Level Information; Bo Peng, Yuanming Ding, Wei Kang (Communication and Network Laboratory, Dalian University, Dalian 116622, China)

Similar Items