“Reading Pictures Instead of Looking”: RGB-D Image-Based Action Recognition via Capsule Network and Kalman Filter

This paper proposes an action recognition algorithm based on the capsule network and Kalman filter, called “Reading Pictures Instead of Looking” (RPIL). The method resolves the convolutional neural network’s oversensitivity to rotation and scaling and increases the interpretability of the model as...


Bibliographic Details
Main Authors: Botong Zhao, Yanjie Wang, Keke Su, Hong Ren, Haichao Sun
Format: Article
Language: English
Published: MDPI AG 2021-03-01
Series: Sensors
Subjects: human posture estimation; capsule network; 6D object pose estimation; Kalman filter
Online Access: https://www.mdpi.com/1424-8220/21/6/2217
collection DOAJ
description This paper proposes an action recognition algorithm based on the capsule network and Kalman filter, called “Reading Pictures Instead of Looking” (RPIL). The method resolves the convolutional neural network’s oversensitivity to rotation and scaling and increases the model’s interpretability via the spatial coordinates of graphical elements. The capsule network is first used to extract the components of the target human body. The detected parts and their attribute parameters (e.g., spatial coordinates, color) are then analyzed by BERT. A Kalman filter evaluates the predicted capsules and filters out misinformation so that incorrectly predicted capsules do not affect the action recognition results. The parameters between neuron layers are evaluated, and the structure is pruned into a dendritic network to improve the algorithm’s computational efficiency. This minimizes deep learning’s dependence on the random features extracted by the CNN without sacrificing the model’s accuracy, and the associations between the network’s hidden layers are made explainable. At a 90% observation rate, test precision is 83.3% on the OAD dataset, 72.2% on the ChaLearn Gesture dataset, and 86.5% on the G3D dataset. RPILNet also satisfies real-time operation requirements (>30 fps).
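The Kalman-filtering step described in the abstract (rejecting incorrectly predicted capsules so they cannot corrupt the recognition result) can be sketched as a one-dimensional Kalman filter with innovation gating. This is a minimal illustration, not the paper's implementation: the random-walk state model, the noise parameters `q` and `r`, the gate threshold, and the function name `kalman_gate` are all assumptions introduced here.

```python
def kalman_gate(observations, q=1e-3, r=0.25, gate=3.0):
    """Smooth a stream of noisy 1D capsule-coordinate predictions and
    flag outliers whose normalized innovation exceeds a gate threshold.
    q: process-noise variance, r: measurement-noise variance (both
    hypothetical settings, not taken from the paper)."""
    x, p = observations[0], 1.0          # state estimate and its variance
    smoothed, accepted = [x], [True]
    for z in observations[1:]:
        # Predict: random-walk model, variance grows by process noise q.
        p_pred = p + q
        # Innovation (measurement residual) and its variance.
        innov = z - x
        s = p_pred + r
        if innov ** 2 / s > gate ** 2:
            # Gated out: treat the measurement as misinformation,
            # keep the prior prediction unchanged.
            smoothed.append(x)
            accepted.append(False)
            p = p_pred
            continue
        # Update: blend prediction and measurement via the Kalman gain.
        k = p_pred / s
        x = x + k * innov
        p = (1 - k) * p_pred
        smoothed.append(x)
        accepted.append(True)
    return smoothed, accepted
```

In this sketch an implausible jump in a predicted capsule coordinate is rejected, while the surrounding consistent predictions are smoothed; the paper's actual filter operates on the full capsule state rather than a single scalar.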
issn 1424-8220
DOI: 10.3390/s21062217 (Sensors, vol. 21, no. 6, article 2217)
Author affiliations: all five authors are with the Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China.
topic human posture estimation
capsule network
6D object pose estimation
Kalman filter
url https://www.mdpi.com/1424-8220/21/6/2217