Three‐stream network with context convolution module for human–object interaction detection
Main Authors: | Thomhert S. Siadari, Mikyong Han, Hyunjin Yoon |
---|---|
Format: | Article |
Language: | English |
Published: | Electronics and Telecommunications Research Institute (ETRI), 2020-02-01 |
Series: | ETRI Journal, vol. 42, no. 2, pp. 230–238 |
Subjects: | |
Online Access: | https://doi.org/10.4218/etrij.2019-0230 |
_version_ | 1819026309293015040 |
---|---|
author | Thomhert S. Siadari; Mikyong Han; Hyunjin Yoon |
collection | DOAJ |
description | Human–object interaction (HOI) detection is a popular computer vision task that detects interactions between humans and objects. It is useful in many applications that require a deeper understanding of scene semantics. Current HOI detection networks typically consist of a feature extractor followed by detection layers built from small filters (e.g., 1 × 1 or 3 × 3). Although small filters capture local spatial features with few parameters, their small receptive fields prevent them from capturing the larger context needed to recognize interactions between humans and distant objects. We therefore propose a three‐stream HOI detection network that employs a context convolution module (CCM) in each stream branch. The CCM captures larger context from the input feature maps by combining large separable convolution layers with residual‐based convolution layers; because it uses fewer large separable filters, it does so without increasing the number of parameters. We evaluate our HOI detection method on two benchmark datasets, V‐COCO and HICO‐DET, and demonstrate state‐of‐the‐art performance. |
first_indexed | 2024-12-21T05:24:32Z |
format | Article |
id | doaj.art-d8d10c649b114fc7a3a7e9ac86029820 |
institution | Directory Open Access Journal |
issn | 1225-6463 |
language | English |
last_indexed | 2024-12-21T05:24:32Z |
publishDate | 2020-02-01 |
publisher | Electronics and Telecommunications Research Institute (ETRI) |
record_format | Article |
series | ETRI Journal |
title | Three‐stream network with context convolution module for human–object interaction detection |
topic | context convolution module; deep learning; HOI detection; human–object interactions; three‐stream network |
url | https://doi.org/10.4218/etrij.2019-0230 |
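The description's claim that large separable filters enlarge the receptive field without adding parameters can be illustrated with a short arithmetic sketch. It assumes the large separable layers are spatially factored (a k × 1 convolution followed by a 1 × k convolution), which is one common way to build a large kernel cheaply. This record does not give the CCM's actual layer arrangement, channel widths, or kernel size, so the sizes used below (1024, 256, 64, 15) and the function names are hypothetical.

```python
# Minimal sketch (assumption): weight counts and receptive field of a dense
# k x k convolution versus a spatially separable pair (k x 1 then 1 x k).
# All sizes are hypothetical; they are not taken from the paper.


def dense_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a single dense k x k convolution (bias omitted)."""
    return c_in * c_out * k * k


def separable_pair_params(c_in: int, c_mid: int, c_out: int, k: int) -> int:
    """Weights in a k x 1 convolution (c_in -> c_mid) followed by a
    1 x k convolution (c_mid -> c_out), bias omitted."""
    return c_in * c_mid * k + c_mid * c_out * k


def receptive_field(kernels: list[tuple[int, int]]) -> tuple[int, int]:
    """(height, width) receptive field of stride-1 convolutions applied in
    sequence, each given as a (kernel_height, kernel_width) pair."""
    h = w = 1
    for kh, kw in kernels:
        h += kh - 1
        w += kw - 1
    return h, w


if __name__ == "__main__":
    c_in, c_out, k = 1024, 256, 15  # hypothetical channel widths and kernel size
    c_mid = 64                      # hypothetical narrow intermediate width

    print("dense 3x3   :", dense_conv_params(c_in, c_out, 3))
    print("dense 15x15 :", dense_conv_params(c_in, c_out, k))
    print("sep 15x1+1x15:", separable_pair_params(c_in, c_mid, c_out, k))
    print("RF dense 3x3:", receptive_field([(3, 3)]))
    print("RF sep pair :", receptive_field([(k, 1), (1, k)]))
```

With these hypothetical sizes, the separable pair needs 1,228,800 weights, roughly half of the 2,359,296 in a single dense 3 × 3 layer, yet it covers a 15 × 15 region instead of 3 × 3. Most of the saving comes from the narrow intermediate width (c_mid = 64), which is one plausible reading of "fewer large separable filters"; the paper itself should be consulted for the CCM's real configuration.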