Summary: | Naturalistic driving data (NDD) are valuable for testing autonomous driving systems under varied driving conditions, but automatically identifying scenarios in high-dimensional, unlabeled NDD remains challenging. This paper presents a novel approach for automatically identifying test scenarios for autonomous driving through deep unsupervised learning. First, variables representing the vehicle state and the surrounding environment are selected from the US DAS2 NDD to formulate the segmentation criterion, and the isolation forest (IF) algorithm is used to segment the data into two datasets: typical scenarios and extreme scenarios. Second, a one-dimensional residual convolutional autoencoder (1D-RCAE) is developed to extract scenario features from the two datasets. In a comparison with four other autoencoders, the 1D-RCAE shows the best feature extraction capability, effectively extracting crucial information from the high-dimensional data. Third, because features differ in importance, an information entropy (IE)-optimized K-means algorithm is used to cluster the features extracted by the 1D-RCAE. Finally, statistical analysis of the parameters of each scenario cluster is performed to explore their distribution characteristics within each class, and four typical scenarios and five extreme scenarios are identified. The proposed unsupervised framework, which combines the IF, 1D-RCAE, and IE-improved K-means algorithms, can automatically identify typical and extreme scenarios from NDD. The identified scenarios can then be used to test the performance of autonomous driving systems, enriching the library of automated driving test scenarios.
|
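The clustering step in the summary weights features by importance before running K-means. The summary does not say how information entropy enters the algorithm, so the sketch below assumes one common reading: the entropy weight method assigns each feature a weight, and K-means uses a weighted squared Euclidean distance. The function names `entropy_weights` and `weighted_kmeans` and the toy feature rows are hypothetical. The rows stand in for features that an autoencoder such as the 1D-RCAE might produce and are not data from the paper.

```python
import math
import random


def entropy_weights(rows):
    """Entropy weight method (assumed here, not confirmed by the source).

    Features whose min-max scaled values are spread less uniformly across
    samples have lower entropy and receive a larger weight.
    """
    n, d = len(rows), len(rows[0])
    divergence = []
    for j in range(d):
        col = [r[j] for r in rows]
        lo, hi = min(col), max(col)
        span = (hi - lo) or 1.0          # avoid division by zero
        scaled = [(v - lo) / span for v in col]
        total = sum(scaled)
        if total == 0:                   # constant feature carries no information
            divergence.append(0.0)
            continue
        probs = [s / total for s in scaled]
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(n)
        divergence.append(1.0 - entropy)
    norm = sum(divergence)
    if norm == 0:                        # fall back to equal weights
        return [1.0 / d] * d
    return [g / norm for g in divergence]


def weighted_kmeans(rows, weights, k, iters=50, seed=0):
    """Lloyd's K-means with a feature-weighted squared Euclidean distance."""
    rng = random.Random(seed)
    centers = [list(r) for r in rng.sample(rows, k)]

    def dist(a, b):
        return sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b))

    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: dist(r, centers[c])) for r in rows]
        for c in range(k):
            members = [r for r, lab in zip(rows, new_labels) if lab == c]
            if members:                  # keep the old center if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:         # assignments stopped changing
            break
        labels = new_labels
    return labels, centers


# Hypothetical latent features: two well-separated groups along feature 0.
features = [
    [0.0, 0.5], [0.1, 0.4], [0.2, 0.6],
    [5.0, 0.5], [5.1, 0.4], [5.2, 0.6],
]
weights = entropy_weights(features)
labels, centers = weighted_kmeans(features, weights, k=2)
print("weights:", weights)
print("labels:", labels)
```

On real features, the number of clusters `k` would have to be chosen separately, for example with a cluster validity index. The summary reports four typical and five extreme scenarios but does not state how those counts were selected.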