Survey of BERT-Base Models for Scientific Text Classification: COVID-19 Case Study

Bibliographic Details
Main Authors: Mayara Khadhraoui, Hatem Bellaaj, Mehdi Ben Ammar, Habib Hamam, Mohamed Jmaiel
Format: Article
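Language: English
Published: MDPI AG 2022-03-01
Series: Applied Sciences
Subjects: BERT; COVID-19; scientific text classification; transfer learning; scientific publications; deep learning
Online Access: https://www.mdpi.com/2076-3417/12/6/2891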
_version_ 1797473030768689152
author Mayara Khadhraoui
Hatem Bellaaj
Mehdi Ben Ammar
Habib Hamam
Mohamed Jmaiel
collection DOAJ
description On 30 January 2020, the World Health Organization announced a new coronavirus, which later proved to be very dangerous. Since then, COVID-19 has spread into a pandemic affecting practically every region of the world, and many medical researchers have contributed to fighting it. Given the rapid growth of scientific publications related to the pandemic, manual text and data retrieval has become a challenging task. To address this challenge, we propose CovBERT, a pre-trained language model based on BERT, to automate the literature review process. CovBERT relies on prior training on a large corpus of scientific publications in the biomedical domain and related to COVID-19 to improve its performance on the literature review task. We evaluate CovBERT on short-text classification using COV-Dat-20, our scientific dataset of biomedical articles on COVID-19. We demonstrate statistically significant improvements by using BERT.
first_indexed 2024-03-09T20:09:25Z
format Article
id doaj.art-1745836d9d094e6aaaf2688a3ff07f8c
institution Directory Open Access Journal
issn 2076-3417
language English
last_indexed 2024-03-09T20:09:25Z
publishDate 2022-03-01
publisher MDPI AG
record_format Article
series Applied Sciences
spelling doaj.art-1745836d9d094e6aaaf2688a3ff07f8c
2023-11-24T00:20:53Z
eng
MDPI AG
Applied Sciences
2076-3417
2022-03-01
12
6
2891
10.3390/app12062891
Survey of BERT-Base Models for Scientific Text Classification: COVID-19 Case Study
Mayara Khadhraoui (0)
Hatem Bellaaj (1)
Mehdi Ben Ammar (2)
Habib Hamam (3)
Mohamed Jmaiel (4)
(0) National Engineering School of Sfax (ENIS), University of Sfax, Sfax 3038, Tunisia
(1) ReDCAD Laboratory, Department of Computer Engineering and Applied Mathematics, University of Sfax, Sfax 3029, Tunisia
(2) Solutions Galore Inc., Moncton, NB E1C 5Y1, Canada
(3) Faculty of Engineering, Université de Moncton, Moncton, NB E1A 3E9, Canada
(4) ReDCAD Laboratory, Department of Computer Engineering and Applied Mathematics, University of Sfax, Sfax 3029, Tunisia
https://www.mdpi.com/2076-3417/12/6/2891
BERT
COVID-19
scientific text classification
transfer learning
scientific publications
deep learning
title Survey of BERT-Base Models for Scientific Text Classification: COVID-19 Case Study
topic BERT
COVID-19
scientific text classification
transfer learning
scientific publications
deep learning
url https://www.mdpi.com/2076-3417/12/6/2891