Domain adaptive crowd counting via dynamic scale aggregation network
Main Authors: | Zhanqiang Huo, Yanan Wang, Yingxu Qiao, Jing Wang, Fen Luo |
---|---|
Format: | Article |
Language: | English |
Published: | Wiley, 2023-10-01 |
Series: | IET Computer Vision |
Subjects: | Computer vision; image processing |
Online Access: | https://doi.org/10.1049/cvi2.12198 |
author | Zhanqiang Huo, Yanan Wang, Yingxu Qiao, Jing Wang, Fen Luo |
collection | DOAJ |
description | Crowd counting is an important research topic in computer vision. Its goal is to estimate the number of people in an image. Researchers have dramatically improved counting accuracy in recent years by regressing density maps. However, because of the inherent domain shift, a model trained on an expensive, manually labelled dataset (source domain) does not perform well on a dataset with scarce labels (target domain). To address this issue, a novel dynamic scale aggregation network (DSANet) is proposed to reduce the gaps in style and the cross‐domain variations in head scale. Specifically, a practical style transfer layer is introduced to reduce the appearance discrepancy between the source and target domains. Then, the translated source and target domain samples are encoded by a generator, consisting of the VGG16 network and dynamic scale aggregation modules (DSA Modules), which produces the corresponding density maps. The DSA module can adaptively adjust its parameters according to the input features and effectively fuse multi‐scale information to overcome cross‐domain head scale variations. Next, a discriminator judges whether an input density map comes from the source or the target domain. Finally, the domain distributions are aligned through adversarial training between the generator and the discriminator. The experiments show that our network outperforms the current state‐of‐the‐art methods and can improve the target domain's performance while maintaining the source domain's performance without significant degradation. |
id | doaj.art-20fefc2314df4618afc4ebcbe53db88c |
institution | Directory of Open Access Journals |
issn | 1751-9632, 1751-9640 |
citation | IET Computer Vision, vol. 17, no. 7, pp. 814-828, 2023-10-01. https://doi.org/10.1049/cvi2.12198 |
affiliations | Zhanqiang Huo, Yanan Wang, Jing Wang, Fen Luo: School of Software, Henan Polytechnic University, Jiaozuo, China. Yingxu Qiao: College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, China |
topic | computer vision; image processing |
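The description characterises the DSA module only at a high level: it adjusts its parameters according to the input features and fuses multi-scale information. One common way to realise that idea is selective-kernel-style gating, in which parallel dilated convolutions cover different receptive fields and a softmax over a globally pooled descriptor of the input decides how much each branch contributes. The NumPy sketch below illustrates that swapped-in technique under stated assumptions; it is not the authors' DSA module. The function names, the depthwise 3x3 kernels, the dilation rates 1, 2 and 3, and the gating parameters `gate_w` and `gate_b` are all hypothetical.

```python
import numpy as np

# Hypothetical sketch: selective-kernel-style gating over dilated branches.
# This illustrates the general idea only; it is NOT the paper's DSA module.


def dilated_depthwise_conv3x3(x, kernels, dilation):
    """Depthwise 3x3 cross-correlation with a given dilation rate.

    x: (C, H, W) feature map; kernels: (C, 3, 3), one kernel per channel.
    Zero padding of `dilation` pixels keeps the spatial size unchanged.
    """
    _, h, w = x.shape
    d = dilation
    padded = np.pad(x, ((0, 0), (d, d), (d, d)))  # constant (zero) padding
    out = np.zeros(x.shape, dtype=float)
    for i in range(3):
        for j in range(3):
            # Input pixels seen by kernel tap (i, j), spaced `d` apart.
            window = padded[:, i * d:i * d + h, j * d:j * d + w]
            out += kernels[:, i, j][:, None, None] * window
    return out


def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()


def dynamic_scale_aggregation(x, branch_kernels, dilations, gate_w, gate_b):
    """Fuse K dilated branches with weights computed from the input itself.

    branch_kernels: K arrays of shape (C, 3, 3); dilations: K ints.
    gate_w: (K, C) and gate_b: (K,) are hypothetical gating parameters.
    Returns the fused (C, H, W) map and the K branch weights.
    """
    branches = [dilated_depthwise_conv3x3(x, k, d)
                for k, d in zip(branch_kernels, dilations)]
    descriptor = x.mean(axis=(1, 2))                 # global average pool -> (C,)
    weights = softmax(gate_w @ descriptor + gate_b)  # (K,), non-negative, sums to 1
    fused = sum(a * b for a, b in zip(weights, branches))
    return fused, weights


# Usage with random features; the three dilations stand in for three head scales.
rng = np.random.default_rng(0)
C, H, W = 4, 16, 16
x = rng.standard_normal((C, H, W))
dilations = [1, 2, 3]
kernels = [0.1 * rng.standard_normal((C, 3, 3)) for _ in dilations]
gate_w = rng.standard_normal((len(dilations), C))
gate_b = np.zeros(len(dilations))
y, a = dynamic_scale_aggregation(x, kernels, dilations, gate_w, gate_b)
print(y.shape, np.round(a, 3))
```

Because the branch weights come from a softmax, the fused output is always a convex combination of the branch outputs, and the mix changes with the input rather than being fixed after training. In a trained network the kernels and gating parameters would be learned; the random values here only demonstrate the data flow and the shapes involved.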