Summary: | Crowd density estimation is crucial for applications such as autonomous driving, visual surveillance, crowd control, public space planning, and warning visually distracted drivers before an accident. Models that exploit strong translational, reflective, and scale symmetry have produced encouraging results in crowd density estimation. However, dynamic scenes with perspective distortion and rapid changes in the spatial and temporal domains remain challenging. The main reasons are the dynamic nature of such scenes and the difficulty of representing objects of widely varying sizes and incorporating their features into a prediction model. To address these issues, this paper proposes a framework of parallel multi-size receptive field units that leverages features from most layers of the CNN, so that features of objects of all sizes are represented in, and contribute to, the model's prediction. Because the method uses features from lower through higher layers, different object scales can be handled at different depths of the framework, and scenes with varying crowd densities can be estimated. However, including features from most layers in the prediction model also has several negative effects on the prediction. To counter them, we propose an asymmetric non-local attention module, which suppresses noise and background detail, and a channel weighting module, which re-weights each channel of a feature map so that the model becomes more sensitive to important features and ignores irrelevant ones. Predictions based on some layers have high bias and low variance, whereas those based on other layers have low bias and high variance. Using stacked-ensemble meta-learning, we combine the individual predictions made from lower-layer and higher-layer features to improve accuracy while balancing the bias-variance tradeoff.
The proposed method was extensively evaluated on the UCF CC 50 and ShanghaiTech datasets. The experimental results indicate that it is effective for dense crowd distributions and for objects of various sizes.
|
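The summary states that a stacked-ensemble meta-learner combines predictions from lower-layer and higher-layer features, but it does not specify the meta-learner. The sketch below assumes the simplest common choice: a linear regression with an intercept, fitted by least squares on held-out per-image counts from two base branches. The function names `fit_linear_meta_learner` and `meta_predict`, and all the numbers, are hypothetical and only illustrate the stacking step. They are not the paper's implementation.

```python
from typing import List, Sequence


def fit_linear_meta_learner(
    base_preds: Sequence[Sequence[float]], targets: Sequence[float]
) -> List[float]:
    """Least-squares fit of w so that w[0] + sum_j w[j+1] * base_preds[i][j] ~ targets[i]."""
    # Design matrix: a leading 1 for the intercept, then one column per base model.
    rows = [[1.0] + [float(p) for p in preds] for preds in base_preds]
    k = len(rows[0])
    # Normal equations (X^T X) w = X^T y.
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(a[r][col]))
        if abs(a[piv][col]) < 1e-12:
            raise ValueError("singular system: base predictions are collinear")
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, k):
            f = a[r][col] / a[col][col]
            for c in range(col, k):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    w = [0.0] * k
    for i in reversed(range(k)):
        w[i] = (b[i] - sum(a[i][j] * w[j] for j in range(i + 1, k))) / a[i][i]
    return w


def meta_predict(w: Sequence[float], preds: Sequence[float]) -> float:
    """Combine one image's base predictions with the learned meta-weights."""
    return w[0] + sum(wj * pj for wj, pj in zip(w[1:], preds))


# Hypothetical held-out counts from a lower-layer branch and a higher-layer branch.
low = [10.0, 20.0, 30.0, 40.0]
high = [12.0, 18.0, 35.0, 41.0]
truth = [2.0 + 0.3 * l + 0.7 * h for l, h in zip(low, high)]
w = fit_linear_meta_learner(list(zip(low, high)), truth)
```

In stacking, the meta-learner should be trained on predictions that the base branches did not see during their own training (for example, out-of-fold predictions). Otherwise, it learns to trust whichever branch overfits most. A linear meta-learner keeps the combination interpretable: each weight shows how much the final count relies on the high-bias branch versus the high-variance branch.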
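The channel weighting module is described only as re-weighting each channel of a feature map so that important features are emphasized and irrelevant ones suppressed. One well-known way to do this is squeeze-and-excitation (SE) style gating, and the following sketch assumes that design. It is not necessarily the paper's module. Each channel is pooled to a scalar, passed through a small bottleneck (FC, ReLU, FC, sigmoid), and then multiplied by its resulting weight. The weights `w1` and `w2` are hypothetical stand-ins for learned parameters, and bias terms are omitted for brevity.

```python
import math
from typing import List

Matrix = List[List[float]]


def _matvec(w: Matrix, x: List[float]) -> List[float]:
    """Dense matrix-vector product (bias terms omitted for brevity)."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def channel_reweight(feature_map: List[Matrix], w1: Matrix, w2: Matrix) -> List[Matrix]:
    """SE-style channel weighting of a C x H x W feature map.

    w1 (R x C) and w2 (C x R) stand in for learned weights; R < C is the bottleneck size.
    """
    # Squeeze: global average pooling turns each channel into a single descriptor.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_map]
    # Excitation: FC -> ReLU -> FC -> sigmoid gives one weight in (0, 1) per channel.
    hidden = [max(0.0, v) for v in _matvec(w1, z)]
    s = [1.0 / (1.0 + math.exp(-v)) for v in _matvec(w2, hidden)]
    # Scale: every activation in channel c is multiplied by s[c].
    return [[[s_c * v for v in row] for row in ch] for s_c, ch in zip(s, feature_map)]


# Toy 2-channel, 2x2 feature map with a 1-unit bottleneck.
fmap = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]
out = channel_reweight(fmap, w1=[[1.0, 0.0]], w2=[[1.0], [-1.0]])
```

With these toy weights, channel 0 receives a gate of sigmoid(1) and channel 1 receives sigmoid(-1). The first channel is therefore kept mostly intact, while the second is attenuated. In a trained network, the gates would be learned so that channels dominated by background or noise receive small weights.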