Summary: | In the age of digital information, countless news articles are published on the internet every day. These articles, released in digital form by many different publishers, cover a wide range of topics, such as sports, technology, politics, and natural disasters. Readers differ in which topics they want to follow. Text classification can be used to categorize news by topic, so that readers can easily select the articles on the topics they want to follow. However, there is no platform that provides news articles from multiple trusted publishers and filters the list by news topic. This project aims to perform news topic categorization and to develop an application that provides users with news articles based on their topic.
This project explores text classification algorithms for news topic categorization. Three approaches are tested to find the best-performing classification model: a classical approach using machine learning, a modern approach using deep learning, and a Transformer-based approach. An end-to-end text classification workflow is carried out, from exploratory data analysis, through cleaning and normalizing the text data, to training and testing the classification models. The project also builds a machine learning pipeline that is used by the news application. The final phase of the project is full-stack application development, which produces a news application that provides articles from various publishers based on news topic.
|