PaperNet: A Dataset and Benchmark for Fine-Grained Paper Classification

Document classification is an important area in Natural Language Processing (NLP). Because a huge number of scientific papers are being published at an accelerating rate, intelligent paper classification, especially fine-grained classification, is beneficial for researchers. However,...

Bibliographic Details
Main Authors: Tan Yue, Yong Li, Xuzhao Shi, Jiedong Qin, Zijiao Fan, Zonghai Hu
Format: Article
Language: English
Published: MDPI AG 2022-04-01
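Series: Applied Sciences
Subjects: artificial intelligence application; dataset; multi-modal information processing; machine learning; paper classification
Online Access: https://www.mdpi.com/2076-3417/12/9/4554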
author Tan Yue
Yong Li
Xuzhao Shi
Jiedong Qin
Zijiao Fan
Zonghai Hu
author_sort Tan Yue
collection DOAJ
description Document classification is an important area in Natural Language Processing (NLP). Because a huge number of scientific papers are being published at an accelerating rate, intelligent paper classification, especially fine-grained classification, is beneficial for researchers. However, a public scientific paper dataset for fine-grained classification is still lacking, so existing document classification methods have not been tested on this task. To fill this gap, we designed and collected the PaperNet-Dataset, which consists of multi-modal data (texts and figures). PaperNet version 1.0 contains hierarchical categories of papers in the fields of computer vision (CV) and NLP: 2 coarse-grained and 20 fine-grained (7 in CV and 13 in NLP). We ran current mainstream models on the PaperNet-Dataset, along with a multi-modal method that we propose. Interestingly, none of these methods reaches an accuracy of 80% in fine-grained classification, showing plenty of room for improvement. We hope that PaperNet-Dataset will inspire more work in this challenging area.
first_indexed 2024-03-10T04:21:21Z
format Article
id doaj.art-d6001215992d4dbdadc2c7221208fcb5
institution Directory Open Access Journal
issn 2076-3417
language English
last_indexed 2024-03-10T04:21:21Z
publishDate 2022-04-01
publisher MDPI AG
record_format Article
series Applied Sciences
spelling doaj.art-d6001215992d4dbdadc2c7221208fcb5
2023-11-23T07:50:40Z
eng
MDPI AG
Applied Sciences
2076-3417
2022-04-01
12(9), 4554
10.3390/app12094554
PaperNet: A Dataset and Benchmark for Fine-Grained Paper Classification
Tan Yue, Yong Li, Xuzhao Shi, Jiedong Qin, Zijiao Fan, Zonghai Hu (all: School of Electronic Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China)
https://www.mdpi.com/2076-3417/12/9/4554
artificial intelligence application; dataset; multi-modal information processing; machine learning; paper classification
title PaperNet: A Dataset and Benchmark for Fine-Grained Paper Classification
title_sort papernet a dataset and benchmark for fine grained paper classification
topic artificial intelligence application
dataset
multi-modal information processing
machine learning
paper classification
url https://www.mdpi.com/2076-3417/12/9/4554