Hierarchical multi-label news article classification with distributed semantic model based features
Automatic news categorization is essential for handling the classification of multi-label news articles in online portals. This research applies several methods to improve the performance of a hierarchical multi-label classifier for Indonesian news articles. The first method uses...
Main Authors: | Ivana Clairine Irsan, Masayu Leylia Khodra |
---|---|
Format: | Article |
Language: | English |
Published: | Universitas Ahmad Dahlan 2019-03-01 |
Series: | IJAIN (International Journal of Advances in Intelligent Informatics) |
Subjects: | Multi-label classification, Hierarchical multi-label classification, CNN, Word embedding, News |
Online Access: | http://ijain.org/index.php/IJAIN/article/view/168 |
_version_ | 1811324051191234560 |
---|---|
author | Ivana Clairine Irsan, Masayu Leylia Khodra |
author_facet | Ivana Clairine Irsan, Masayu Leylia Khodra |
author_sort | Ivana Clairine Irsan |
collection | DOAJ |
description | Automatic news categorization is essential for handling the classification of multi-label news articles in online portals. This research applies several methods to improve the performance of a hierarchical multi-label classifier for Indonesian news articles. The first method uses a Convolutional Neural Network (CNN) to build the top-level classifier. The second method aims to improve classification performance by averaging the word vectors obtained from a distributed semantic model. The third method combines lexical and semantic approaches to extract document features by multiplying word term frequency (lexical) with the word vector average (semantic). The model built using Calibrated Label Ranking as the multi-label classification method and trained with the Naïve Bayes algorithm achieved the best F1-measure, 0.7531. This classifier also used the product of word term frequency and the word vector average as features. This configuration improved multi-label classification performance by 4.25% compared to the baseline. The best-performing distributed semantic model in this experiment was a 300-dimensional word2vec model built from Wikipedia articles. The performance of the multi-label classification model is also influenced by the news release date: a difference between the periods covered by the training and testing data decreases model performance. |
first_indexed | 2024-04-13T14:06:15Z |
format | Article |
id | doaj.art-3040b887a8fa418b92bfa28bbc9eb48b |
institution | Directory Open Access Journal |
issn | 2442-6571 2548-3161 |
language | English |
last_indexed | 2024-04-13T14:06:15Z |
publishDate | 2019-03-01 |
publisher | Universitas Ahmad Dahlan |
record_format | Article |
series | IJAIN (International Journal of Advances in Intelligent Informatics) |
spelling | doaj.art-3040b887a8fa418b92bfa28bbc9eb48b 2022-12-22T02:43:53Z eng Universitas Ahmad Dahlan IJAIN (International Journal of Advances in Intelligent Informatics) 2442-6571 2548-3161 2019-03-01 514047 10.26555/ijain.v5i1.168108 Hierarchical multi-label news article classification with distributed semantic model based features Ivana Clairine Irsan 0 Masayu Leylia Khodra 1 Institut Teknologi Bandung Institut Teknologi Bandung Automatic news categorization is essential for handling the classification of multi-label news articles in online portals. This research applies several methods to improve the performance of a hierarchical multi-label classifier for Indonesian news articles. The first method uses a Convolutional Neural Network (CNN) to build the top-level classifier. The second method aims to improve classification performance by averaging the word vectors obtained from a distributed semantic model. The third method combines lexical and semantic approaches to extract document features by multiplying word term frequency (lexical) with the word vector average (semantic). The model built using Calibrated Label Ranking as the multi-label classification method and trained with the Naïve Bayes algorithm achieved the best F1-measure, 0.7531. This classifier also used the product of word term frequency and the word vector average as features. This configuration improved multi-label classification performance by 4.25% compared to the baseline. The best-performing distributed semantic model in this experiment was a 300-dimensional word2vec model built from Wikipedia articles. The performance of the multi-label classification model is also influenced by the news release date: a difference between the periods covered by the training and testing data decreases model performance. http://ijain.org/index.php/IJAIN/article/view/168 Multi-label classification Hierarchical multi-label classification CNN Word embedding News |
spellingShingle | Ivana Clairine Irsan; Masayu Leylia Khodra; Hierarchical multi-label news article classification with distributed semantic model based features; IJAIN (International Journal of Advances in Intelligent Informatics); Multi-label classification; Hierarchical multi-label classification; CNN; Word embedding; News |
title | Hierarchical multi-label news article classification with distributed semantic model based features |
title_full | Hierarchical multi-label news article classification with distributed semantic model based features |
title_fullStr | Hierarchical multi-label news article classification with distributed semantic model based features |
title_full_unstemmed | Hierarchical multi-label news article classification with distributed semantic model based features |
title_short | Hierarchical multi-label news article classification with distributed semantic model based features |
title_sort | hierarchical multi label news article classification with distributed semantic model based features |
topic | Multi-label classification, Hierarchical multi-label classification, CNN, Word embedding, News |
url | http://ijain.org/index.php/IJAIN/article/view/168 |
work_keys_str_mv | AT ivanaclairineirsan hierarchicalmultilabelnewsarticleclassificationwithdistributedsemanticmodelbasedfeatures AT masayuleyliakhodra hierarchicalmultilabelnewsarticleclassificationwithdistributedsemanticmodelbasedfeatures |
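
The two feature schemes named in the description (the word vector average, and term frequency multiplied with the word vector average) can be illustrated with a minimal sketch. The record does not give the exact formula, so the combination rule below is only one possible reading and an assumption: each distinct word's vector is scaled by its term frequency before averaging. The function names, the toy two-dimensional vectors, and the choice to skip out-of-vocabulary words are hypothetical and not taken from the article, which reports using 300-dimensional word2vec vectors.

```python
from collections import Counter


def average_vector(tokens, vectors):
    """Semantic feature: mean of the vectors of the distinct
    in-vocabulary words of a document (None if no word is known)."""
    words = sorted({t for t in tokens if t in vectors})
    if not words:
        return None
    dim = len(vectors[words[0]])
    return [sum(vectors[w][i] for w in words) / len(words) for i in range(dim)]


def tf_weighted_average(tokens, vectors):
    """Lexical + semantic feature (assumed reading): each distinct word's
    vector is multiplied by its term frequency, then the scaled vectors
    are averaged over the distinct in-vocabulary words."""
    counts = Counter(t for t in tokens if t in vectors)
    if not counts:
        return None
    dim = len(next(iter(vectors.values())))
    return [
        sum(tf * vectors[w][i] for w, tf in counts.items()) / len(counts)
        for i in range(dim)
    ]


# Hypothetical toy embeddings standing in for a trained word2vec model.
toy_vectors = {"bola": [1.0, 0.0], "gol": [0.0, 1.0]}
# "wasit" is out of vocabulary here and is skipped.
document = ["bola", "bola", "gol", "wasit"]

print(average_vector(document, toy_vectors))       # [0.5, 0.5]
print(tf_weighted_average(document, toy_vectors))  # [1.0, 0.5]
```

In this toy case the repeated word "bola" pulls the combined feature toward its own vector, which is the effect a term-frequency weighting is meant to have. Either feature vector would then be passed to the multi-label classifier.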