A Mixed Visual Encoding Model Based on the Larger-Scale Receptive Field for Human Brain Activity
Visual encoding models of functional magnetic resonance imaging (fMRI) activity built on deep neural networks, especially convolutional neural networks (CNNs) such as VGG16, have developed rapidly. However, the CNNs used for feature extraction in these models typically rely on small convolution kernels (e.g., 3 × 3). Although a CNN's receptive field can be enlarged by increasing network depth or by subsampling, it remains constrained by the small kernel size and is therefore often insufficient. In biological research, the neuronal population receptive field of high-level visual regions is usually three to four times larger than that of low-level visual regions, so CNNs with larger receptive fields align better with these findings. The RepLKNet model directly enlarges the convolution kernel to obtain a larger-scale receptive field. This paper therefore proposes a mixed model, combining RepLKNet and VGG, to replace a single CNN for feature extraction in visual encoding models; the mixture provides receptive fields of different sizes and extracts richer feature information from the image. Experimental results indicate that the mixed model achieves better encoding performance in multiple regions of the visual cortex than a traditional convolutional model, and they suggest that a larger-scale receptive field should be considered when building visual encoding models so that convolutional networks can play a greater role in modeling visual representations.
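As a rough illustration of the mixed feature-extraction idea described above, the following minimal PyTorch sketch concatenates globally pooled features from a small-kernel VGG16 branch and a large-kernel convolutional branch and maps them linearly to voxel responses. The large-kernel branch, the layer sizes, and the `MixedEncoder`/`n_voxels` names are illustrative assumptions, not the authors' RepLKNet-based implementation.

```python
# Minimal sketch of a "mixed" feature extractor for fMRI encoding:
# a small-kernel VGG16 branch and a large-kernel branch are pooled,
# concatenated, and mapped linearly to voxel responses. The large-kernel
# branch is a simplified stand-in for RepLKNet; all sizes are illustrative.
import torch
import torch.nn as nn
from torchvision import models


class MixedEncoder(nn.Module):
    def __init__(self, n_voxels: int):
        super().__init__()
        # Small-kernel branch: standard VGG16 convolutional stack (3x3 kernels).
        # Randomly initialised here; in practice pretrained weights would be loaded.
        self.vgg_branch = models.vgg16().features

        # Large-kernel branch: a 4x4 patch stem followed by a 31x31 depthwise
        # convolution, loosely imitating RepLKNet's very large receptive field.
        self.large_kernel_branch = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=4, stride=4),                  # 224 -> 56
            nn.Conv2d(64, 64, kernel_size=31, padding=15, groups=64),   # depthwise, large RF
            nn.Conv2d(64, 128, kernel_size=1),                          # pointwise mixing
            nn.ReLU(inplace=True),
        )

        self.pool = nn.AdaptiveAvgPool2d(1)
        # Linear readout from the concatenated features to per-voxel responses.
        self.readout = nn.Linear(512 + 128, n_voxels)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        small = self.pool(self.vgg_branch(images)).flatten(1)            # (B, 512)
        large = self.pool(self.large_kernel_branch(images)).flatten(1)   # (B, 128)
        return self.readout(torch.cat([small, large], dim=1))            # (B, n_voxels)


if __name__ == "__main__":
    model = MixedEncoder(n_voxels=100)
    fake_stimuli = torch.randn(2, 3, 224, 224)   # two RGB stimulus images
    print(model(fake_stimuli).shape)             # torch.Size([2, 100])
```

In the paper's setting, the branch features would instead come from the RepLKNet and VGG backbones named in the abstract, with a voxel-wise readout fitted to measured fMRI responses.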
Main Authors: | Shuxiao Ma; Linyuan Wang; Panpan Chen; Ruoxi Qin; Libin Hou; Bin Yan |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2022-11-01 |
Series: | Brain Sciences |
Subjects: | visual encoding models; deep neural networks; receptive field; a large convolution kernel; RepLKNet; fMRI |
Online Access: | https://www.mdpi.com/2076-3425/12/12/1633 |
Field | Value |
---|---|
_version_ | 1797461184600866816 |
author | Shuxiao Ma; Linyuan Wang; Panpan Chen; Ruoxi Qin; Libin Hou; Bin Yan |
author_sort | Shuxiao Ma |
collection | DOAJ |
description | Visual encoding models of functional magnetic resonance imaging (fMRI) activity built on deep neural networks, especially convolutional neural networks (CNNs) such as VGG16, have developed rapidly. However, the CNNs used for feature extraction in these models typically rely on small convolution kernels (e.g., 3 × 3). Although a CNN's receptive field can be enlarged by increasing network depth or by subsampling, it remains constrained by the small kernel size and is therefore often insufficient. In biological research, the neuronal population receptive field of high-level visual regions is usually three to four times larger than that of low-level visual regions, so CNNs with larger receptive fields align better with these findings. The RepLKNet model directly enlarges the convolution kernel to obtain a larger-scale receptive field. This paper therefore proposes a mixed model, combining RepLKNet and VGG, to replace a single CNN for feature extraction in visual encoding models; the mixture provides receptive fields of different sizes and extracts richer feature information from the image. Experimental results indicate that the mixed model achieves better encoding performance in multiple regions of the visual cortex than a traditional convolutional model, and they suggest that a larger-scale receptive field should be considered when building visual encoding models so that convolutional networks can play a greater role in modeling visual representations. |
first_indexed | 2024-03-09T17:16:46Z |
format | Article |
id | doaj.art-f1a202aab5f74d15b38636c8116b516b |
institution | Directory Open Access Journal |
issn | 2076-3425 |
language | English |
last_indexed | 2024-03-09T17:16:46Z |
publishDate | 2022-11-01 |
publisher | MDPI AG |
record_format | Article |
series | Brain Sciences |
spelling | Brain Sciences, vol. 12, no. 12, art. 1633 (2022-11-01); doi:10.3390/brainsci12121633. All authors: Henan Key Laboratory of Imaging and Intelligent Processing, PLA Strategic Support Force Information Engineering University, Zhengzhou 450001, China. |
title | A Mixed Visual Encoding Model Based on the Larger-Scale Receptive Field for Human Brain Activity |
topic | visual encoding models; deep neural networks; receptive field; a large convolution kernel; RepLKNet; fMRI |
url | https://www.mdpi.com/2076-3425/12/12/1633 |