Summary: | Autonomous unmanned aerial vehicles (UAVs) have attracted growing attention because of their use in many fields. Human action recognition (HAR) in UAV videos plays an important role in various real-life applications. HAR on UAV frames has so far received little attention from researchers, yet it merits further study because of its relevance to the development of efficient algorithms for autonomous drone surveillance. Current deep-learning models for HAR have limitations, such as large numbers of weight parameters and slow inference, which make them unsuitable for practical applications that require fast and accurate detection of unusual human actions. To address this problem, this paper presents HarNet, a new lightweight deep-learning model based on depthwise separable convolutions. The other parts of the HarNet model comprise convolutional, rectified linear unit (ReLU), dropout, pooling, padding, and dense blocks. Each frame was pre-processed individually with several computer vision methods before being fed into the HarNet model. The effectiveness of the model was tested on the publicly available UCF-ARG dataset. With a compact architecture of just 2.2 million parameters, HarNet obtained a classification success rate of 96.15%, outperforming the MobileNet, Xception, DenseNet201, Inception-ResNetV2, VGG-16, and VGG-19 models on the same dataset. Its key advantages are low complexity, a small number of parameters, and high classification performance. The results show that HarNet performs better than other models that used the UCF-ARG dataset.
|
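The parameter savings behind depthwise separable convolutions can be illustrated with a short calculation. The sketch below is not taken from HarNet: the kernel size and channel counts are assumed values chosen only for illustration, and the helper names are hypothetical. It compares the number of learnable parameters in a standard convolution with the depthwise plus pointwise pair that replaces it.

```python
def standard_conv_params(k, c_in, c_out, bias=True):
    """Parameters of a standard k x k convolution: one k*k*c_in filter per output channel."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def depthwise_separable_params(k, c_in, c_out, bias=True):
    """Parameters of a depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in + (c_in if bias else 0)   # one k x k filter per input channel
    pointwise = c_in * c_out + (c_out if bias else 0)  # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Illustrative layer sizes (assumed, not HarNet's actual configuration).
k, c_in, c_out = 3, 64, 128
std = standard_conv_params(k, c_in, c_out)
sep = depthwise_separable_params(k, c_in, c_out)
print(f"standard: {std}, depthwise separable: {sep}, reduction: {std / sep:.1f}x")
```

For these assumed sizes the separable layer needs roughly one eighth of the parameters of the standard layer, which is the kind of reduction that makes a compact model feasible for on-board inference.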