Summary: | The rapid growth and reach of the Internet have amplified a fake news problem that predates it. The problem is of particular concern when the news is health-related. To address this issue, this research proposes Content Based Models (CBM) and Feature Based Models (FBM). The two differ in the input they receive: the CBM takes only the news content as input, whereas the FBM takes the content together with two readability features. Under each category, the performance of five traditional machine learning techniques (Decision Tree, Random Forest, Support Vector Machine, AdaBoost-Decision Tree and AdaBoost-Random Forest) is compared with that of two hybrid deep learning approaches, namely CNN-LSTM and CNN-BiLSTM. The study uses the Fake News Healthcare dataset, which comprises 9581 articles and is highly imbalanced; the Easy Data Augmentation technique is used to balance it. The experimental results demonstrate that Feature Based Models perform better than Content Based Models. Among the proposed FBM, the hybrid CNN-LSTM model achieved an F1 score of 97.09% and AdaBoost-Random Forest achieved an F1 score of 98.9%. Thus, AdaBoost-Random Forest under FBM is the best-performing model for the classification of fake news.
|
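The difference between the two model families lies only in the input vector, which the following minimal sketch tries to illustrate. The summary does not say which two readability features are used, so the ones computed here (average sentence length and average word length) are assumptions chosen for illustration, as are the bag-of-words encoding, the toy vocabulary, and all function names.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercase the text and keep alphabetic tokens (apostrophes allowed).
    return re.findall(r"[a-z']+", text.lower())


def content_vector(text, vocabulary):
    # Bag-of-words counts over a fixed vocabulary (assumed encoding).
    counts = Counter(tokenize(text))
    return [float(counts[word]) for word in vocabulary]


def readability_features(text):
    # Two illustrative readability features (assumptions, not from the source):
    # average words per sentence and average characters per word.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = tokenize(text)
    if not sentences or not words:
        return [0.0, 0.0]
    avg_sentence_len = len(words) / len(sentences)
    avg_word_len = sum(len(w) for w in words) / len(words)
    return [avg_sentence_len, avg_word_len]


def cbm_input(text, vocabulary):
    # Content Based Model input: news content only.
    return content_vector(text, vocabulary)


def fbm_input(text, vocabulary):
    # Feature Based Model input: content plus two readability features.
    return content_vector(text, vocabulary) + readability_features(text)


# Hypothetical vocabulary and article, used only to show the shapes.
vocabulary = ["cure", "vaccine", "study"]
article = "This miracle cure works. No study needed!"

cbm = cbm_input(article, vocabulary)
fbm = fbm_input(article, vocabulary)
print(len(cbm), len(fbm))
```

Either vector could then be passed to any of the classifiers named in the summary; the FBM vector is simply two entries longer than the CBM vector.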