An Empirical Comparison of Portuguese and Multilingual BERT Models for Auto-Classification of NCM Codes in International Trade
Classification problems are common activities in many different domains and supervised learning algorithms have shown great promise in these areas. The classification of goods in international trade in Brazil represents a real challenge due to the complexity involved in assigning the correct categor...
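Main Authors: | Roberta Rodrigues de Lima, Anita M. R. Fernandes, James Roberto Bombasar, Bruno Alves da Silva, Paul Crocker, Valderi Reis Quietinho Leithardt |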
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2022-01-01 |
Series: | Big Data and Cognitive Computing |
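Subjects: | NCM classification; natural language processing; multilingual BERT; Portuguese BERT; transformers; NLP |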
Online Access: | https://www.mdpi.com/2504-2289/6/1/8 |
_version_ | 1797447135143133184 |
---|---|
author | Roberta Rodrigues de Lima; Anita M. R. Fernandes; James Roberto Bombasar; Bruno Alves da Silva; Paul Crocker; Valderi Reis Quietinho Leithardt |
author_sort | Roberta Rodrigues de Lima |
collection | DOAJ |
description | Classification problems are common in many domains, and supervised learning algorithms have shown great promise in these areas. The classification of goods in international trade in Brazil is a real challenge because of the complexity involved in assigning the correct category codes to a good, especially considering the tax penalties and legal implications of a misclassification. This work focuses on the training process of a classifier based on bidirectional encoder representations from transformers (BERT) for the tax classification of goods with NCM codes, the official classification system for import and export products in Brazil. In particular, this article presents results from using a BERT model pretrained specifically on Portuguese, as well as results from using a multilingual pretrained BERT model. Experimental results show that the Portuguese model performed slightly better than the multilingual model, achieving a Matthews correlation coefficient (MCC) of 0.8491, and confirm that the classifiers could be used to improve specialists’ performance in the classification of goods. |
first_indexed | 2024-03-09T13:51:28Z |
format | Article |
id | doaj.art-ca2538a0973d43b89c89d1ba184dcf01 |
institution | Directory Open Access Journal |
issn | 2504-2289 |
language | English |
last_indexed | 2024-03-09T13:51:28Z |
publishDate | 2022-01-01 |
publisher | MDPI AG |
record_format | Article |
series | Big Data and Cognitive Computing |
spelling | doaj.art-ca2538a0973d43b89c89d1ba184dcf01; 2023-11-30T20:50:45Z; eng; MDPI AG; Big Data and Cognitive Computing; ISSN 2504-2289; 2022-01-01; volume 6, issue 1, article 8; DOI 10.3390/bdcc6010008. Title: An Empirical Comparison of Portuguese and Multilingual BERT Models for Auto-Classification of NCM Codes in International Trade. Authors and affiliations: Roberta Rodrigues de Lima (Laboratory of Applied Intelligence, School of the Sea Science and Technology-University of Vale do Itajaí, Itajaí 88302-901, Brazil); Anita M. R. Fernandes (Laboratory of Applied Intelligence, School of the Sea Science and Technology-University of Vale do Itajaí, Itajaí 88302-901, Brazil); James Roberto Bombasar (Analysis and Systems Development Course, Centro Universitário Avantis, Balneário Camboriú 88339-125, Brazil); Bruno Alves da Silva (Laboratory of Applied Intelligence, School of the Sea Science and Technology-University of Vale do Itajaí, Itajaí 88302-901, Brazil); Paul Crocker (Instituto de Telecomunicações and Departamento de Informática, Universidade da Beira Interior, 6201-001 Covilhã, Portugal); Valderi Reis Quietinho Leithardt (COPELABS, Lusófona University of Humanities and Technologies, Campo Grande 376, 1749-024 Lisboa, Portugal). Abstract: Classification problems are common in many domains, and supervised learning algorithms have shown great promise in these areas. The classification of goods in international trade in Brazil is a real challenge because of the complexity involved in assigning the correct category codes to a good, especially considering the tax penalties and legal implications of a misclassification. This work focuses on the training process of a classifier based on bidirectional encoder representations from transformers (BERT) for the tax classification of goods with NCM codes, the official classification system for import and export products in Brazil. In particular, this article presents results from using a BERT model pretrained specifically on Portuguese, as well as results from using a multilingual pretrained BERT model. Experimental results show that the Portuguese model performed slightly better than the multilingual model, achieving a Matthews correlation coefficient (MCC) of 0.8491, and confirm that the classifiers could be used to improve specialists’ performance in the classification of goods. URL: https://www.mdpi.com/2504-2289/6/1/8. Keywords: NCM classification; natural language processing; multilingual BERT; Portuguese BERT; transformers; NLP |
spellingShingle | Roberta Rodrigues de Lima; Anita M. R. Fernandes; James Roberto Bombasar; Bruno Alves da Silva; Paul Crocker; Valderi Reis Quietinho Leithardt; An Empirical Comparison of Portuguese and Multilingual BERT Models for Auto-Classification of NCM Codes in International Trade; Big Data and Cognitive Computing; NCM classification; natural language processing; multilingual BERT; Portuguese BERT; transformers; NLP |
title | An Empirical Comparison of Portuguese and Multilingual BERT Models for Auto-Classification of NCM Codes in International Trade |
title_sort | empirical comparison of portuguese and multilingual bert models for auto classification of ncm codes in international trade |
topic | NCM classification; natural language processing; multilingual BERT; Portuguese BERT; transformers; NLP |
url | https://www.mdpi.com/2504-2289/6/1/8 |