Summary: | Classification of encrypted network data has long received considerable attention from industry and the research community. However, the emergence of new private applications and encryption protocols has introduced new challenges. The primary task of classification is to determine whether a sample is encrypted. When the specification of a private protocol is unpublished, only the whole sample can be processed, so unencrypted fields that coexist with encrypted fields seriously degrade classification performance. To tackle this problem, an algorithm based on data reconstruction and moment eigenvectors is proposed, which can not only estimate whether a sample is encrypted but also locate the encrypted field in each data sample. In the algorithm, an encryption probability sequence is first calculated by data reconstruction and CNN (Convolutional Neural Network) model transfer. Then, based on the derivative of the encryption probability sequence, a set of suspected encrypted fields is generated. Finally, encrypted field matching is performed based on the similarity of four-dimensional moment eigenvectors, yielding the possible encrypted field of each sample. In an experiment on distinguishing encrypted from unencrypted samples of complex data, the proposed algorithm achieved a recall of 93% and a precision of 72%. In identifying the encrypted field, the forward coverage, reverse coverage, and F1 value reached 89%, 90%, and 90%, respectively. The proposed method exhibited clear advantages over encrypted field matching methods based on the K-Nearest Neighbor algorithm, Dynamic Time Warping, the Runs test, and the Frequency test.
|
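
The last two stages described in the summary (candidate generation from the derivative of the encryption probability sequence, then matching by moment-vector similarity) can be illustrated with a minimal sketch. It rests on several assumptions that the summary does not confirm: the "four-dimensional moment eigenvector" is taken to be (mean, variance, skewness, kurtosis), the derivative is approximated by a first difference with a hypothetical threshold `jump`, similarity is taken to be cosine similarity, and the probability sequence is supplied directly rather than produced by a CNN. All function names are illustrative.

```python
import math

def moment_vector(values):
    """Return (mean, variance, skewness, kurtosis) of a numeric sequence.

    Assumption: these four moments stand in for the four-dimensional
    moment eigenvector; the summary does not define its components.
    """
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    var = sum(c * c for c in centered) / n
    if var == 0:
        return (mean, 0.0, 0.0, 0.0)
    std = math.sqrt(var)
    skew = sum(c ** 3 for c in centered) / (n * std ** 3)
    kurt = sum(c ** 4 for c in centered) / (n * var ** 2)
    return (mean, var, skew, kurt)

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def candidate_fields(probs, jump=0.3, min_len=2):
    """Split a probability sequence into (start, end) ranges.

    A boundary is placed wherever the first difference of the sequence
    has magnitude >= jump (hypothetical threshold). Ranges shorter than
    min_len are dropped. `end` is exclusive.
    """
    cuts = [0]
    for i in range(1, len(probs)):
        if abs(probs[i] - probs[i - 1]) >= jump:
            cuts.append(i)
    cuts.append(len(probs))
    return [(s, e) for s, e in zip(cuts, cuts[1:]) if e - s >= min_len]

def best_field(data, probs, reference, jump=0.3):
    """Pick the candidate range whose moment vector is most similar
    to that of a reference encrypted sample (assumed to be available)."""
    ref_vec = moment_vector(reference)
    fields = candidate_fields(probs, jump)
    return max(
        fields,
        key=lambda f: cosine_similarity(moment_vector(data[f[0]:f[1]]), ref_vec),
        default=None,
    )

# Toy input: bytes 3..6 look high-entropy, the rest look like plain text.
data = [65, 66, 67, 3, 250, 120, 7, 65, 66]
probs = [0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9, 0.2, 0.2]
reference = [0, 255, 128, 17, 200, 90]
print(candidate_fields(probs))
print(best_field(data, probs, reference))
```

Cosine similarity on raw moments is dominated by the variance term for byte data, so a practical implementation would likely normalize each component first; that choice is left out here to keep the sketch short.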