Survey of BERT-Base Models for Scientific Text Classification: COVID-19 Case Study
Main Authors: | Mayara Khadhraoui, Hatem Bellaaj, Mehdi Ben Ammar, Habib Hamam, Mohamed Jmaiel |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2022-03-01 |
Series: | Applied Sciences |
Subjects: | |
Online Access: | https://www.mdpi.com/2076-3417/12/6/2891 |
author | Mayara Khadhraoui, Hatem Bellaaj, Mehdi Ben Ammar, Habib Hamam, Mohamed Jmaiel |
collection | DOAJ |
description | On 30 January 2020, the World Health Organization announced a new coronavirus, which later proved to be very dangerous. Since then, COVID-19 has spread into a pandemic that now affects practically every region of the world, and many medical researchers have contributed to fighting it. Given the rapid growth of scientific publications related to this global pandemic, manual text and data retrieval has become a challenging task. To address this challenge, we propose CovBERT, a pre-trained language model based on BERT, to automate the literature review process. CovBERT is pre-trained on a large corpus of biomedical and COVID-19-related scientific publications to improve its performance on the literature review task. We evaluate CovBERT on short-text classification using COV-Dat-20, our scientific dataset of biomedical articles on COVID-19. We demonstrate statistically significant improvements by using BERT. |
first_indexed | 2024-03-09T20:09:25Z |
id | doaj.art-1745836d9d094e6aaaf2688a3ff07f8c |
institution | Directory of Open Access Journals |
issn | 2076-3417 |
last_indexed | 2024-03-09T20:09:25Z |
doi | 10.3390/app12062891 |
volume | 12 |
issue | 6 |
article_number | 2891 |
affiliations | Mayara Khadhraoui: National Engineering School of Sfax (ENIS), University of Sfax, Sfax 3038, Tunisia; Hatem Bellaaj: ReDCAD Laboratory, Department of Computer Engineering and Applied Mathematics, University of Sfax, Sfax 3029, Tunisia; Mehdi Ben Ammar: Solutions Galore Inc., Moncton, NB E1C 5Y1, Canada; Habib Hamam: Faculty of Engineering, Université de Moncton, Moncton, NB E1A 3E9, Canada; Mohamed Jmaiel: ReDCAD Laboratory, Department of Computer Engineering and Applied Mathematics, University of Sfax, Sfax 3029, Tunisia |
topic | BERT; COVID-19; scientific text classification; transfer learning; scientific publications; deep learning |