Feature selection to enhance android malware detection using modified term frequency-inverse document frequency (MTF-IDF)

This research evaluates a feature selection algorithm that uses Term Frequency-Inverse Document Frequency (TF-IDF) as the main algorithm in Android malware detection. The TF-IDF algorithm is used to filter Android features before the detection process. However, IDF is unawar...


Bibliographic Details
Main Author: Mazlan, Nurul Hidayah
Format: Thesis
Language: English
Published: 2019
Subjects: QA76 Computer software
Online Access: http://eprints.uthm.edu.my/651/1/24p%20NURUL%20HIDAYAH%20MAZLAN.pdf
http://eprints.uthm.edu.my/651/2/NURUL%20HIDAYAH%20MAZLAN%20COPYRIGHT%20DECLARATION.pdf
http://eprints.uthm.edu.my/651/3/NURUL%20HIDAYAH%20MAZLAN%20WATERMARK.pdf
author Mazlan, Nurul Hidayah
collection UTHM
description This research evaluates a feature selection algorithm that uses Term Frequency-Inverse Document Frequency (TF-IDF) as the main algorithm in Android malware detection. The TF-IDF algorithm is used to filter Android features before the detection process. However, IDF is unaware of the training class labels and gives incorrect weight values to some features. Therefore, the proposed approach, the Modified Term Frequency-Inverse Document Frequency (MTF-IDF) algorithm, focuses on both samples and features in order to assign correct weight values to those features. The proposed algorithm considers features according to their level of importance, with weights assigned based on the number of features involved in the sample. The most relevant features in the sample are then selected through a weighting and priority ranking process using K-means. This ensures that only important malware features are selected from the Android application sample. The experiments were conducted on samples collected from DREBIN. The existing TF-IDF algorithm and the MTF-IDF algorithm were compared under various conditions: different sample sizes, different numbers of features, and the integration of different types of features. The results showed that feature selection using MTF-IDF can improve Android malware detection. The thesis reports that MTF-IDF is effective for Android malware detection regardless of the kinds of features or the sample sizes used, and that it gives appropriate scaling for all features.
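The label-blindness of IDF mentioned in the description can be illustrated with a minimal sketch. It implements only the standard TF-IDF weighting, not MTF-IDF, whose formula is not given in this record. The toy apps, feature names, and the helper `high_weight_features` (a tiny two-cluster 1-D K-means used to separate high-weight from low-weight features) are assumptions made for illustration and are not taken from the thesis or from DREBIN.

```python
import math
from collections import Counter

# Hypothetical toy data: (feature tokens, class label) per app.
# Names and labels are illustrative only, not DREBIN samples.
apps = [
    (["SEND_SMS", "INTERNET", "READ_CONTACTS"], "malware"),
    (["SEND_SMS", "INTERNET"], "malware"),
    (["INTERNET", "CAMERA"], "benign"),
    (["INTERNET", "ACCESS_FINE_LOCATION"], "benign"),
    (["INTERNET"], "benign"),
]


def idf(docs):
    """Standard IDF, log(N / df). Class labels are never consulted."""
    n = len(docs)
    df = Counter(f for doc in docs for f in set(doc))
    return {f: math.log(n / count) for f, count in df.items()}


def tf_idf(doc, idf_weights):
    """Relative term frequency times IDF for a single app."""
    counts = Counter(doc)
    return {f: (c / len(doc)) * idf_weights[f] for f, c in counts.items()}


def high_weight_features(weights, iterations=20):
    """Assumed ranking step: 1-D K-means with k=2, keeping the high cluster."""
    values = list(weights.values())
    lo, hi = min(values), max(values)
    for _ in range(iterations):
        low_group = [v for v in values if abs(v - lo) <= abs(v - hi)]
        high_group = [v for v in values if abs(v - lo) > abs(v - hi)]
        if low_group:
            lo = sum(low_group) / len(low_group)
        if high_group:
            hi = sum(high_group) / len(high_group)
    return {f for f, w in weights.items() if abs(w - lo) > abs(w - hi)}


docs = [features for features, _label in apps]  # labels are dropped here
idf_weights = idf(docs)
first_app_weights = tf_idf(docs[0], idf_weights)
selected = high_weight_features(idf_weights)
```

In this toy set, `READ_CONTACTS` occurs only in a malware app and `CAMERA` only in a benign app, yet both receive the same IDF because the labels never enter the computation. This is the kind of weighting problem the description attributes to plain TF-IDF.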
first_indexed 2024-03-05T21:37:57Z
format Thesis
id uthm.eprints-651
institution Universiti Tun Hussein Onn Malaysia
language English
last_indexed 2024-03-05T21:37:57Z
publishDate 2019
record_format dspace
spelling uthm.eprints-651 2021-08-17T06:27:39Z http://eprints.uthm.edu.my/651/ 2019-02 Thesis NonPeerReviewed Mazlan, Nurul Hidayah (2019) Feature selection to enhance android malware detection using modified term frequency-inverse document frequency (MTF-IDF). Masters thesis, Universiti Tun Hussein Onn Malaysia.
title Feature selection to enhance android malware detection using modified term frequency-inverse document frequency (MTF-IDF)
topic QA76 Computer software
url http://eprints.uthm.edu.my/651/1/24p%20NURUL%20HIDAYAH%20MAZLAN.pdf
http://eprints.uthm.edu.my/651/2/NURUL%20HIDAYAH%20MAZLAN%20COPYRIGHT%20DECLARATION.pdf
http://eprints.uthm.edu.my/651/3/NURUL%20HIDAYAH%20MAZLAN%20WATERMARK.pdf