Fine-Tuning BERT-Based Pre-Trained Models for Arabic Dependency Parsing

With the advent of pre-trained language models, many natural language processing tasks in various languages have achieved great success. Although some research has been conducted on fine-tuning BERT-based models for syntactic parsing, and several Arabic pre-trained models have been developed, no attention has been paid to Arabic dependency parsing. In this study, we attempt to fill this gap and compare nine Arabic models, fine-tuning strategies, and encoding methods for dependency parsing. We evaluated three treebanks to highlight the best options and methods for fine-tuning Arabic BERT-based models to capture syntactic dependencies in the data. Our exploratory results show that the AraBERTv2 model provides the best scores for all treebanks and confirm that fine-tuning the higher layers of the pre-trained models is required; however, adding extra neural network layers on top of those models reduces accuracy. We also found that the treebanks differ in which encoding technique yields the highest scores. An analysis of the errors on the test examples highlights four issues with an important effect on the results: parse tree post-processing, contextualized embeddings, erroneous tokenization, and erroneous annotation. This study points to directions for future research toward enhanced Arabic BERT-based syntactic parsing.

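The abstract describes fine-tuning Arabic BERT-based encoders for dependency parsing. As a rough, illustrative sketch only (not the authors' implementation), the snippet below loads an AraBERTv2 checkpoint from Hugging Face and attaches a minimal biaffine-style arc scorer; the model ID, layer sizes, and scoring head are assumptions for demonstration.

```python
# Illustrative sketch, not the paper's code: an AraBERTv2 encoder with a minimal
# biaffine-style head-selection scorer. Model ID and hyperparameters are assumptions.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "aubmindlab/bert-base-arabertv2"  # assumed AraBERTv2 checkpoint

class SimpleBiaffineParser(nn.Module):
    def __init__(self, encoder_name: str = MODEL_ID, arc_dim: int = 256):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        self.head_mlp = nn.Linear(hidden, arc_dim)  # token viewed as a head
        self.dep_mlp = nn.Linear(hidden, arc_dim)   # token viewed as a dependent
        self.biaffine = nn.Parameter(torch.randn(arc_dim, arc_dim) * 0.01)

    def forward(self, input_ids, attention_mask):
        states = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        deps = torch.relu(self.dep_mlp(states))     # (batch, seq, arc_dim)
        heads = torch.relu(self.head_mlp(states))   # (batch, seq, arc_dim)
        # score[b, i, j]: unnormalized score that token j is the head of token i
        return torch.einsum("bid,de,bje->bij", deps, self.biaffine, heads)

# Usage sketch (raw text; AraBERT's recommended Farasa pre-segmentation omitted here)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
batch = tokenizer(["كتب الطالب الدرس"], return_tensors="pt")
scores = SimpleBiaffineParser()(batch["input_ids"], batch["attention_mask"])
print(scores.shape)  # (1, seq_len, seq_len) arc-score matrix
```

In practice, such a scorer would be trained on treebank head/label annotations and decoded into a tree; this sketch only shows how a fine-tunable Arabic encoder can be wired to a parsing head.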

Bibliographic Details
Main Authors: Sharefah Al-Ghamdi, Hend Al-Khalifa, Abdulmalik Al-Salman (College of Computer and Information Sciences, King Saud University, P.O. Box 2614, Riyadh 13312, Saudi Arabia)
Format: Article
Language: English
Published: MDPI AG, 2023-03-01
Series: Applied Sciences, vol. 13, no. 7, article 4225
ISSN: 2076-3417
DOI: 10.3390/app13074225
Subjects: syntactic parsing; dependency parsing; fine-tuning methods; machine learning; neural networks; deep learning
Online Access: https://www.mdpi.com/2076-3417/13/7/4225