TT-MLP: Tensor Train Decomposition on Deep MLPs
Deep multilayer perceptrons (MLPs) have achieved promising performance on computer vision tasks. Deep MLPs consist solely of fully-connected layers, as conventional MLPs do, but adopt more sophisticated architectures built from mixer layers composed of token-mixing and channel-mixing components. These architectures give deep MLPs global receptive fields, but the accompanying increase in parameters is a heavy burden in practical applications. To tackle this problem, we focus on compressing deep MLPs with tensor-train decomposition (TTD). First, this paper analyzes deep MLPs under conventional TTD methods, especially across various designs of the macro framework and micro blocks: the former determines how mixer layers are concatenated, the latter how a single mixer layer is designed. Based on this analysis, we propose a novel TTD method named Train-TTD-Train. The proposed method leverages the learning capability of channel-mixing components and improves the trade-off between accuracy and model size. In the evaluation, the proposed method showed a better trade-off than conventional TTD methods on ImageNet-1K and achieved 0.56% higher inference accuracy with a 15.44% memory reduction on CIFAR-10.
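For context, TTD factorizes a weight matrix, reshaped into a higher-order tensor, into a chain of small three-way cores, which is what makes it attractive for shrinking fully-connected mixer layers; the method name Train-TTD-Train suggests training the dense network, decomposing it, then retraining the cores. The sketch below shows the standard TT-SVD construction in NumPy only as a minimal illustration of the underlying decomposition; the function name, the (16, 16, 16, 16) folding, and the uniform `max_rank` cap are assumptions for this sketch, not the authors' implementation.

```python
import numpy as np

def tt_svd(tensor, max_rank):
    """Sequential truncated SVDs turn an n-way tensor into a tensor train.

    Returns 3-way cores G_k of shape (r_{k-1}, dims[k], r_k), with
    r_0 = r_n = 1 and every intermediate rank capped at max_rank.
    """
    dims = tensor.shape
    n = len(dims)
    cores = []
    r = 1
    mat = tensor.reshape(dims[0], -1)          # unfold along the first mode
    for k in range(n - 1):
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        r_new = min(max_rank, s.size)          # truncate to the rank budget
        cores.append(u[:, :r_new].reshape(r, dims[k], r_new))
        mat = s[:r_new, None] * vt[:r_new]     # carry the remainder forward
        if k < n - 2:
            mat = mat.reshape(r_new * dims[k + 1], -1)
        r = r_new
    cores.append(mat.reshape(r, dims[-1], 1))  # last core absorbs the tail
    return cores

# Example: a 256x256 dense weight folded into a 4-way tensor before TTD.
# The folding and max_rank=8 are illustrative choices, not from the paper.
W = np.random.randn(256, 256)
cores = tt_svd(W.reshape(16, 16, 16, 16), max_rank=8)
print(sum(c.size for c in cores), "TT parameters vs.", W.size, "dense")
```

With these illustrative settings, the four cores hold 2,304 parameters against 65,536 in the dense matrix; how aggressively the ranks can be truncated without hurting accuracy is exactly the trade-off the paper studies.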
Main Authors: | Jiale Yan, Kota Ando, Jaehoon Yu, Masato Motomura |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2023-01-01 |
Series: | IEEE Access |
ISSN: | 2169-3536 |
DOI: | 10.1109/ACCESS.2023.3240784 |
Citation: | IEEE Access, vol. 11, pp. 10398-10411, 2023 |
Subjects: | Tensor-train decomposition; low-rank approximation; deep neural networks; deep multilayer perceptron; network parameter compression |
Online Access: | https://ieeexplore.ieee.org/document/10032168/ |
Author Details:
- Jiale Yan, Tokyo Institute of Technology, Yokohama, Japan (ORCID: 0000-0003-4972-6315)
- Kota Ando, Faculty of Information Science and Technology, Hokkaido University, Sapporo, Japan (ORCID: 0000-0001-8648-3768)
- Jaehoon Yu, Tokyo Institute of Technology, Yokohama, Japan (ORCID: 0000-0001-6639-7694)
- Masato Motomura, Tokyo Institute of Technology, Yokohama, Japan (ORCID: 0000-0003-1543-1252)