Summary: | Early recognition and understanding of the actions performed by pedestrians in traffic scenes makes it possible to anticipate pedestrian intentions and supports collision warning and avoidance in autonomous vehicles. Low-visibility conditions such as night-time, fog, heavy rain, or smoke increase the number of difficult traffic situations. This paper proposes a complete and original model for assessing whether a pedestrian is engaged in a street-crossing action using only monocular infrared scene perception. The assessment is performed by time series analysis of features such as pedestrian motion, the position of pedestrians with respect to the drivable area, and their distance from the ego-vehicle. The pipeline combines a deep-learning-based pedestrian detector with an original tracking algorithm, a semantic segmentation of the road surface, and time series action recognition based on a long short-term memory (LSTM) network. To validate the proposed method, we introduce a new dataset named CROSSIR, which consists of pedestrian annotations, action annotations, and semantic labels for the road. The CROSSIR dataset is suitable for several common computer vision tasks: (1) pedestrian detection and tracking, because each pedestrian has a unique identifier across the frames in which it appears; (2) pedestrian action recognition; (3) semantic segmentation of the road pixels in the infrared image.
|