Three‐stream network with context convolution module for human–object interaction detection

Human–object interaction (HOI) detection is a popular computer vision task that detects interactions between humans and objects. This task can be useful in many applications that require a deeper understanding of semantic scenes. Current HOI detection networks typically consist of a feature extracto...

Bibliographic Details
Main Authors: Thomhert S. Siadari, Mikyong Han, Hyunjin Yoon
Format: Article
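Language: English
Published: Electronics and Telecommunications Research Institute (ETRI) 2020-02-01
Series: ETRI Journal
Subjects: context convolution module; deep learning; HOI detection; human–object interactions; three‐stream network
Online Access: https://doi.org/10.4218/etrij.2019-0230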
author Thomhert S. Siadari
Mikyong Han
Hyunjin Yoon
collection DOAJ
description Human–object interaction (HOI) detection is a popular computer vision task that aims to detect interactions between humans and objects. It is useful in many applications that require a deeper semantic understanding of scenes. Current HOI detection networks typically consist of a feature extractor followed by detection layers built from small filters (e.g., 1 × 1 or 3 × 3). Although small filters capture local spatial features with few parameters, their small receptive fields prevent them from capturing the larger context needed to recognize interactions between humans and distant objects. Hence, we propose a three‐stream HOI detection network that employs a context convolution module (CCM) in each stream branch. The CCM captures larger contexts from input feature maps by combining large separable convolution layers with residual‐based convolution layers. Because it uses fewer large separable filters, it does so without increasing the number of parameters. We evaluate our HOI detection method on two benchmark datasets, V‐COCO and HICO‐DET, and demonstrate its state‐of‐the‐art performance.
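The parameter argument in the description can be checked with simple arithmetic. The sketch below is not the authors' implementation: the record does not give the CCM layer configuration, so it assumes that "large separable" means a k × 1 convolution followed by a 1 × k convolution, and the channel widths (256 and 128) and kernel size (7) are hypothetical values chosen only for illustration.

```python
def dense_conv_params(k, c_in, c_out):
    """Weights in a single k x k convolution (biases ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(k, c_in, c_mid, c_out):
    """Weights in a k x 1 convolution followed by a 1 x k convolution (biases ignored)."""
    return k * c_in * c_mid + k * c_mid * c_out


def receptive_field(kernel_sizes):
    """Receptive field along one axis for stacked stride-1 convolutions."""
    return 1 + sum(k - 1 for k in kernel_sizes)


if __name__ == "__main__":
    channels = 256  # hypothetical feature-map width

    # Small dense filter, as in typical detection layers.
    small = dense_conv_params(3, channels, channels)

    # Large separable pair with fewer intermediate filters (assumed: half the width).
    large = separable_conv_params(7, channels, channels // 2, channels)

    print("3x3 dense params:      ", small)
    print("7x1 + 1x7 sep. params: ", large)
    print("3x3 receptive field:   ", receptive_field([3]))
    # Along the height axis the 7x1 layer contributes 7 and the 1x7 layer contributes 1.
    print("7x1 + 1x7 recept. field:", receptive_field([7, 1]))
```

Under these assumed sizes, the separable pair sees a 7 × 7 region while using fewer weights than the 3 × 3 layer, which is the trade-off the description refers to. With the intermediate width left at 256, the pair would need more weights than the 3 × 3 layer, so the reduced filter count matters.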
first_indexed 2024-12-21T05:24:32Z
format Article
id doaj.art-d8d10c649b114fc7a3a7e9ac86029820
institution Directory Open Access Journal
issn 1225-6463
language English
last_indexed 2024-12-21T05:24:32Z
publishDate 2020-02-01
publisher Electronics and Telecommunications Research Institute (ETRI)
record_format Article
series ETRI Journal
spelling doaj.art-d8d10c649b114fc7a3a7e9ac86029820; 2022-12-21T19:14:44Z; eng; Electronics and Telecommunications Research Institute (ETRI); ETRI Journal; 1225-6463; 2020-02-01; 42; 2; 230; 238; 10.4218/etrij.2019-0230; Three‐stream network with context convolution module for human–object interaction detection; Thomhert S. Siadari; Mikyong Han; Hyunjin Yoon; https://doi.org/10.4218/etrij.2019-0230; context convolution module; deep learning; hoi detection; human–object interactions; three‐stream network
title Three‐stream network with context convolution module for human–object interaction detection
topic context convolution module
deep learning
hoi detection
human–object interactions
three‐stream network
url https://doi.org/10.4218/etrij.2019-0230