Summary: | In order to solve the problem of low accuracy of malware classification,this paper proposes a research on malware classification based on N-Gram static analysis technology.Firstly,the N-Gram method is used to extract the byte sequence of length 2 from the malware samples.Secondly,according to the extracted features,KNN,logistic regression,random forest and XGBoost are used to train the malware classification model based on machine learning.Thirdly,the confusion matrix and logarithmic loss function are used to evaluate the malware classification model.Finally,the malware classification model is trained and tested in the Kaggle malware data set.Experimental results show that the accuracy rates of the malware classification models of XGBoost and random forest reach 98.43% and 97.93%,and the Log Loss values are 0.022240 and 0.026946,respectively.Compared with the existing methods,the proposed method can classify malware more accurately and protect computer system from malware attack.
|