PaperNet: A Dataset and Benchmark for Fine-Grained Paper Classification

Document classification is an important area of Natural Language Processing (NLP). Because a huge number of scientific papers have been published at an accelerating rate, intelligent paper classification, especially fine-grained classification, is beneficial to researchers. However,...

Bibliographic Details
Main Authors:	Tan Yue, Yong Li, Xuzhao Shi, Jiedong Qin, Zijiao Fan, Zonghai Hu
Format:	Article
Language:	English
Published:	MDPI AG 2022-04-01
Series:	Applied Sciences
Subjects:	artificial intelligence application; dataset; multi-modal information processing; machine learning; paper classification
Online Access:	https://www.mdpi.com/2076-3417/12/9/4554

_version_	1827673184716980224
author	Tan Yue, Yong Li, Xuzhao Shi, Jiedong Qin, Zijiao Fan, Zonghai Hu
author_facet	Tan Yue, Yong Li, Xuzhao Shi, Jiedong Qin, Zijiao Fan, Zonghai Hu
author_sort	Tan Yue
collection	DOAJ
description	Document classification is an important area of Natural Language Processing (NLP). Because a huge number of scientific papers have been published at an accelerating rate, intelligent paper classification, especially fine-grained classification, is beneficial to researchers. However, a public scientific paper dataset for fine-grained classification is still lacking, so existing document classification methods have not been put to the test. To fill this gap, we designed and collected the PaperNet-Dataset, which consists of multi-modal data (texts and figures). Version 1.0 of PaperNet contains hierarchical categories of papers in the fields of computer vision (CV) and NLP: 2 coarse-grained and 20 fine-grained (7 in CV and 13 in NLP). We ran current mainstream models on the PaperNet-Dataset, along with a multi-modal method that we propose. Interestingly, none of these methods reaches an accuracy of 80% in fine-grained classification, showing plenty of room for improvement. We hope that the PaperNet-Dataset will inspire more work in this challenging area.
first_indexed	2024-03-10T04:21:21Z
format	Article
id	doaj.art-d6001215992d4dbdadc2c7221208fcb5
institution	Directory of Open Access Journals
issn	2076-3417
language	English
last_indexed	2024-03-10T04:21:21Z
publishDate	2022-04-01
publisher	MDPI AG
record_format	Article
series	Applied Sciences
spelling	doaj.art-d6001215992d4dbdadc2c7221208fcb5; 2023-11-23T07:50:40Z; eng; MDPI AG; Applied Sciences; ISSN 2076-3417; 2022-04-01; vol. 12, no. 9, article 4554; doi:10.3390/app12094554; PaperNet: A Dataset and Benchmark for Fine-Grained Paper Classification; Tan Yue, Yong Li, Xuzhao Shi, Jiedong Qin, Zijiao Fan, Zonghai Hu (all: School of Electronic Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China); https://www.mdpi.com/2076-3417/12/9/4554; artificial intelligence application; dataset; multi-modal information processing; machine learning; paper classification
title	PaperNet: A Dataset and Benchmark for Fine-Grained Paper Classification
title_sort	papernet a dataset and benchmark for fine grained paper classification
topic	artificial intelligence application; dataset; multi-modal information processing; machine learning; paper classification
url	https://www.mdpi.com/2076-3417/12/9/4554

Similar Items