Classifying European Court of Human Rights Cases Using Transformer-Based Techniques

In the field of text classification, researchers have repeatedly shown the value of transformer-based models such as Bidirectional Encoder Representation from Transformers (BERT) and its variants. Nonetheless, these models are expensive in terms of memory and computational power but have not been ut...

Bibliographic Details
Main Authors:	Ali Shariq Imran, Henrik Hodnefjeld, Zenun Kastrati, Noureen Fatima, Sher Muhammad Daudpota, Mudasir Ahmad Wani
Format:	Article
Language:	English
Published:	IEEE 2023-01-01
Series:	IEEE Access
Subjects:	Legal documents classification European court of human rights (ECHR) dataset natural language processing transformers BERT BigBird
Online Access:	https://ieeexplore.ieee.org/document/10130544/

_version_	1797808492553175040
collection	DOAJ
description	In the field of text classification, researchers have repeatedly shown the value of transformer-based models such as Bidirectional Encoder Representations from Transformers (BERT) and its variants. Nonetheless, these models are expensive in terms of memory and computational power, and they have not been utilized to classify long documents in several domains. In addition, transformer models are often pre-trained on general-purpose language, making them less effective in language-specific domains such as legal documents. In the natural language processing (NLP) domain, there is growing interest in creating newer models that can handle more complex input sequences and domain-specific languages. With this in mind, this study proposes a legal document classifier that uses the sliding window approach to increase the maximum sequence length of the model. We used the publicly available ECHR (European Court of Human Rights) dataset, which is to a large extent imbalanced. Therefore, to balance the dataset, we scraped the case articles from the web and extracted the data. Then, we employed conventional machine learning techniques such as SVM, DT, NB, and AdaBoost, and transformer-based neural network models including BERT, Legal-BERT, RoBERTa, BigBird, ELECTRA, and XLNet for the classification task. The experimental findings show that RoBERTa outperformed all the other BERT variants mentioned, obtaining precision, recall, and F1-score of 89.1%, 86.2%, and 86.7%, respectively. Among the conventional machine learning techniques, AdaBoost outperformed SVM, DT, and NB, achieving 81.9%, 81.5%, and 81.7% for precision, recall, and F1-score, respectively.
first_indexed	2024-03-13T06:38:19Z
format	Article
id	doaj.art-a0c6b8e1ddd14376abe51245de4a1334
institution	Directory Open Access Journal
issn	2169-3536
language	English
last_indexed	2024-03-13T06:38:19Z
publishDate	2023-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
spelling	doaj.art-a0c6b8e1ddd14376abe51245de4a1334; 2023-06-08T23:01:28Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2023-01-01; vol. 11, pp. 55664-55676; DOI 10.1109/ACCESS.2023.3279034; document 10130544
	Ali Shariq Imran (https://orcid.org/0000-0002-2416-2878), Department of Computer Science, Norwegian University of Science and Technology (NTNU), Gjøvik, Norway
	Henrik Hodnefjeld, Department of Computer Science, Norwegian University of Science and Technology (NTNU), Gjøvik, Norway
	Zenun Kastrati (https://orcid.org/0000-0002-0199-2377), Department of Informatics, Linnaeus University, Växjö, Sweden
	Noureen Fatima (https://orcid.org/0000-0001-7423-9346), Department of Computer Science, Sukkur IBA University, Sukkur, Pakistan
	Sher Muhammad Daudpota (https://orcid.org/0000-0001-6684-751X), Department of Computer Science, Sukkur IBA University, Sukkur, Pakistan
	Mudasir Ahmad Wani (https://orcid.org/0000-0002-6947-3717), EIAS Data Science Laboratory, College of Computer and Information Sciences, Prince Sultan University, Riyadh, Saudi Arabia
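
The description above mentions a sliding window approach for handling documents longer than a model's maximum sequence length. The record does not give the authors' implementation, so the following is only a minimal sketch of the general idea under stated assumptions: the function names, the window size and stride values, and the mean-probability aggregation step are all hypothetical and are not taken from the article.

```python
from typing import List, Sequence


def sliding_windows(tokens: Sequence[str], window_size: int, stride: int) -> List[List[str]]:
    """Split a token sequence into overlapping windows.

    Hypothetical helper: window_size and stride are illustrative
    parameters, not values reported in the article.
    """
    if window_size <= 0 or stride <= 0:
        raise ValueError("window_size and stride must be positive")
    if stride > window_size:
        raise ValueError("stride larger than window_size would skip tokens")
    if not tokens:
        return []
    windows = []
    start = 0
    while True:
        windows.append(list(tokens[start:start + window_size]))
        # Stop once the current window reaches the end of the document.
        if start + window_size >= len(tokens):
            break
        start += stride
    return windows


def aggregate_mean(window_probs: Sequence[Sequence[float]]) -> int:
    """Pick the class with the highest mean probability across windows.

    Assumption: averaging is one common way to combine per-window
    predictions; the article may combine them differently.
    """
    n_classes = len(window_probs[0])
    means = [
        sum(probs[c] for probs in window_probs) / len(window_probs)
        for c in range(n_classes)
    ]
    return max(range(n_classes), key=means.__getitem__)


if __name__ == "__main__":
    doc = "the applicant complained under article six of the convention".split()
    for window in sliding_windows(doc, window_size=4, stride=2):
        print(window)
```

Each window would be passed to the classifier separately, and the per-window class probabilities combined into a single document-level label. A stride smaller than the window size keeps some context shared between neighbouring windows.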

Similar Items