Summary: Many text classification tasks face the challenge of insufficient labelled data. The co-training algorithm is a candidate solution: it learns from both labelled and unlabelled data to improve classification accuracy. However, the two sufficient and redundant views of an instance that co-training requires have often not been available in the past. With the recent development of deep learning, we now have both the traditional TFIDF representation and deep representations for documents. In this paper, we conduct experiments to evaluate the effectiveness of co-training with different combinations of document representations (e.g., TFIDF, Doc2vec, ELMo, BERT) and classifiers (e.g., SVM, Random Forest, XGBoost, MLP, and CNN) on two benchmark datasets (20 Newsgroups and Ohsumed). Our results show that co-training with TFIDF and deep contextualised representations improves classification accuracy.
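To make the two-view setup concrete, below is a minimal sketch of the classic co-training loop the abstract refers to, assuming scikit-learn. The function name `co_train`, the choice of logistic regression as the base classifier, and the confidence-based selection rule are illustrative assumptions, not the paper's actual implementation; in the paper's setting `X1` and `X2` would be, e.g., the TFIDF and BERT representations of the same documents.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def co_train(X1, X2, y, n_rounds=10, k=2):
    """Minimal co-training sketch (illustrative, not the paper's code).

    X1, X2 : two feature views of the same instances (e.g. TFIDF vs. BERT).
    y      : labels, with -1 marking unlabelled instances.
    Each round, a classifier trained on one view labels its k most
    confident unlabelled instances; those pseudo-labels are then visible
    to the classifier on the other view in the next fit.
    """
    y = y.copy()
    clf1 = clf2 = None
    for _ in range(n_rounds):
        labelled = y != -1
        if (y == -1).sum() == 0:
            break
        # retrain both view-specific classifiers on all labels so far
        clf1 = LogisticRegression().fit(X1[labelled], y[labelled])
        clf2 = LogisticRegression().fit(X2[labelled], y[labelled])
        for clf, X in ((clf1, X1), (clf2, X2)):
            unlab = np.where(y == -1)[0]
            if len(unlab) == 0:
                break
            # pick the k unlabelled instances this classifier is most sure about
            conf = clf.predict_proba(X[unlab]).max(axis=1)
            pick = unlab[np.argsort(conf)[-k:]]
            y[pick] = clf.predict(X[pick])
    return clf1, clf2, y
```

The key design point, following the standard co-training assumption, is that each view must be sufficient on its own to train a reasonable classifier, so that confident pseudo-labels from one view add genuinely new information for the other.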