PaperNet: A Dataset and Benchmark for Fine-Grained Paper Classification
Document classification is an important area in Natural Language Processing (NLP). Because a huge amount of scientific papers have been published at an accelerating rate, it is beneficial to carry out intelligent paper classifications, especially fine-grained classification for researchers. However,...
Main Authors: | Tan Yue, Yong Li, Xuzhao Shi, Jiedong Qin, Zijiao Fan, Zonghai Hu |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2022-04-01 |
Series: | Applied Sciences |
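Subjects: | artificial intelligence application; dataset; multi-modal information processing; machine learning; paper classification |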
Online Access: | https://www.mdpi.com/2076-3417/12/9/4554 |
_version_ | 1827673184716980224 |
---|---|
author | Tan Yue, Yong Li, Xuzhao Shi, Jiedong Qin, Zijiao Fan, Zonghai Hu |
author_facet | Tan Yue, Yong Li, Xuzhao Shi, Jiedong Qin, Zijiao Fan, Zonghai Hu |
author_sort | Tan Yue |
collection | DOAJ |
description | Document classification is an important area in Natural Language Processing (NLP). Because a huge amount of scientific papers have been published at an accelerating rate, it is beneficial to carry out intelligent paper classifications, especially fine-grained classification for researchers. However, a public scientific paper dataset for fine-grained classification is still lacking, so the existing document classification methods have not been put to the test. To fill this vacancy, we designed and collected the PaperNet-Dataset that consists of multi-modal data (texts and figures). PaperNet 1.0 version contains hierarchical categories of papers in the fields of computer vision (CV) and NLP, 2 coarse-grained and 20 fine-grained (7 in CV and 13 in NLP). We ran current mainstream models on the PaperNet-Dataset, along with a multi-modal method that we propose. Interestingly, none of these methods reaches an accuracy of 80% in fine-grained classification, showing plenty of room for improvement. We hope that PaperNet-Dataset will inspire more work in this challenging area. |
first_indexed | 2024-03-10T04:21:21Z |
format | Article |
id | doaj.art-d6001215992d4dbdadc2c7221208fcb5 |
institution | Directory of Open Access Journals |
issn | 2076-3417 |
language | English |
last_indexed | 2024-03-10T04:21:21Z |
publishDate | 2022-04-01 |
publisher | MDPI AG |
record_format | Article |
series | Applied Sciences |
spelling | doaj.art-d6001215992d4dbdadc2c7221208fcb5; 2023-11-23T07:50:40Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2022-04-01; 12; 9; 4554; 10.3390/app12094554; PaperNet: A Dataset and Benchmark for Fine-Grained Paper Classification; Tan Yue [0], Yong Li [1], Xuzhao Shi [2], Jiedong Qin [3], Zijiao Fan [4], Zonghai Hu [5]; [0] to [5]: School of Electronic Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China; Document classification is an important area in Natural Language Processing (NLP). Because a huge amount of scientific papers have been published at an accelerating rate, it is beneficial to carry out intelligent paper classifications, especially fine-grained classification for researchers. However, a public scientific paper dataset for fine-grained classification is still lacking, so the existing document classification methods have not been put to the test. To fill this vacancy, we designed and collected the PaperNet-Dataset that consists of multi-modal data (texts and figures). PaperNet 1.0 version contains hierarchical categories of papers in the fields of computer vision (CV) and NLP, 2 coarse-grained and 20 fine-grained (7 in CV and 13 in NLP). We ran current mainstream models on the PaperNet-Dataset, along with a multi-modal method that we propose. Interestingly, none of these methods reaches an accuracy of 80% in fine-grained classification, showing plenty of room for improvement. We hope that PaperNet-Dataset will inspire more work in this challenging area.; https://www.mdpi.com/2076-3417/12/9/4554; artificial intelligence application; dataset; multi-modal information processing; machine learning; paper classification |
spellingShingle | Tan Yue, Yong Li, Xuzhao Shi, Jiedong Qin, Zijiao Fan, Zonghai Hu; PaperNet: A Dataset and Benchmark for Fine-Grained Paper Classification; Applied Sciences; artificial intelligence application; dataset; multi-modal information processing; machine learning; paper classification |
title | PaperNet: A Dataset and Benchmark for Fine-Grained Paper Classification |
title_full | PaperNet: A Dataset and Benchmark for Fine-Grained Paper Classification |
title_fullStr | PaperNet: A Dataset and Benchmark for Fine-Grained Paper Classification |
title_full_unstemmed | PaperNet: A Dataset and Benchmark for Fine-Grained Paper Classification |
title_short | PaperNet: A Dataset and Benchmark for Fine-Grained Paper Classification |
title_sort | papernet a dataset and benchmark for fine grained paper classification |
topic | artificial intelligence application; dataset; multi-modal information processing; machine learning; paper classification |
url | https://www.mdpi.com/2076-3417/12/9/4554 |
work_keys_str_mv | AT tanyue papernetadatasetandbenchmarkforfinegrainedpaperclassification AT yongli papernetadatasetandbenchmarkforfinegrainedpaperclassification AT xuzhaoshi papernetadatasetandbenchmarkforfinegrainedpaperclassification AT jiedongqin papernetadatasetandbenchmarkforfinegrainedpaperclassification AT zijiaofan papernetadatasetandbenchmarkforfinegrainedpaperclassification AT zonghaihu papernetadatasetandbenchmarkforfinegrainedpaperclassification |