Three‐stream network with context convolution module for human–object interaction detection

Human–object interaction (HOI) detection is a popular computer vision task that detects interactions between humans and objects. This task can be useful in many applications that require a deeper understanding of semantic scenes. Current HOI detection networks typically consist of a feature extracto...

Bibliographic Details
Main Authors:	Thomhert S. Siadari, Mikyong Han, Hyunjin Yoon
Format:	Article
Language:	English
Published:	Electronics and Telecommunications Research Institute (ETRI) 2020-02-01
Series:	ETRI Journal
Subjects:	context convolution module; deep learning; hoi detection; human–object interactions; three‐stream network
Online Access:	https://doi.org/10.4218/etrij.2019-0230

_version_	1819026309293015040
author	Thomhert S. Siadari, Mikyong Han, Hyunjin Yoon
collection	DOAJ
description	Human–object interaction (HOI) detection is a popular computer vision task that detects interactions between humans and objects. This task can be useful in many applications that require a deeper understanding of semantic scenes. Current HOI detection networks typically consist of a feature extractor followed by detection layers comprising small filters (e.g., 1 × 1 or 3 × 3). Although small filters can capture local spatial features with few parameters, their small receptive regions prevent them from capturing the larger context needed to recognize interactions between humans and distant objects. Hence, we herein propose a three‐stream HOI detection network that employs a context convolution module (CCM) in each stream branch. The CCM captures larger contexts from input feature maps by combining large separable convolution layers with residual‐based convolution layers; because it uses fewer large separable filters, it does so without increasing the number of parameters. We evaluate our HOI detection method using two benchmark datasets, V‐COCO and HICO‐DET, and demonstrate its state‐of‐the‐art performance.
first_indexed	2024-12-21T05:24:32Z
format	Article
id	doaj.art-d8d10c649b114fc7a3a7e9ac86029820
institution	Directory Open Access Journal
issn	1225-6463
language	English
last_indexed	2024-12-21T05:24:32Z
publishDate	2020-02-01
publisher	Electronics and Telecommunications Research Institute (ETRI)
record_format	Article
series	ETRI Journal
spelling	doaj.art-d8d10c649b114fc7a3a7e9ac86029820 | 2022-12-21T19:14:44Z | eng | Electronics and Telecommunications Research Institute (ETRI) | ETRI Journal | 1225-6463 | 2020-02-01 | 42 | 2 | 230 | 238 | 10.4218/etrij.2019-0230 | Three‐stream network with context convolution module for human–object interaction detection | Thomhert S. Siadari | Mikyong Han | Hyunjin Yoon | https://doi.org/10.4218/etrij.2019-0230 | context convolution module | deep learning | hoi detection | human–object interactions | three‐stream network
title	Three‐stream network with context convolution module for human–object interaction detection
topic	context convolution module; deep learning; hoi detection; human–object interactions; three‐stream network
url	https://doi.org/10.4218/etrij.2019-0230
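
The description's claim that large separable filters can widen the receptive region without increasing the parameter count can be checked with simple arithmetic. The sketch below is only an illustration, not the authors' CCM: it assumes that "large separable convolution" means a k × 1 convolution followed by a 1 × k convolution, and the channel counts (256 in and out, 64 intermediate) and kernel size (15) are hypothetical values chosen for the example, not figures from the record.

```python
def conv_params(k_h, k_w, c_in, c_out, bias=True):
    """Learnable parameters in one k_h x k_w convolution layer."""
    return k_h * k_w * c_in * c_out + (c_out if bias else 0)


def separable_pair_params(k, c_in, c_mid, c_out, bias=True):
    """Parameters of a k x 1 convolution followed by a 1 x k convolution."""
    vertical = conv_params(k, 1, c_in, c_mid, bias)
    horizontal = conv_params(1, k, c_mid, c_out, bias)
    return vertical + horizontal


def stacked_receptive_field(kernel_sizes):
    """Receptive field along one axis for stride-1, undilated layers in sequence."""
    return 1 + sum(k - 1 for k in kernel_sizes)


# Hypothetical sizes, chosen only for illustration.
C_IN, C_OUT = 256, 256   # assumed channel widths of the detection layer
C_MID = 64               # assumed (smaller) number of large separable filters
K = 15                   # assumed large kernel size

small_3x3 = conv_params(3, 3, C_IN, C_OUT)
full_kxk = conv_params(K, K, C_IN, C_OUT)
separable = separable_pair_params(K, C_IN, C_MID, C_OUT)

# Along the vertical axis the pair has kernel sizes (K, 1); along the
# horizontal axis it has (1, K). Both give a receptive field of K.
rf_separable = stacked_receptive_field([K, 1])
rf_small = stacked_receptive_field([3])

print(f"3 x 3 layer:           {small_3x3:>10,} params, receptive field {rf_small}")
print(f"{K} x {K} dense layer:     {full_kxk:>10,} params, receptive field {K}")
print(f"{K} x 1 then 1 x {K} pair: {separable:>10,} params, receptive field {rf_separable}")
```

Under these assumed sizes the separable pair covers a 15 × 15 region while using fewer parameters than a single 3 × 3 layer, because the intermediate layer has fewer filters; with a different choice of `C_MID` the comparison can go the other way.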

Similar Items