An Empirical Comparison of Portuguese and Multilingual BERT Models for Auto-Classification of NCM Codes in International Trade

Classification problems are common in many different domains, and supervised learning algorithms have shown great promise in these areas. The classification of goods in international trade in Brazil represents a real challenge due to the complexity involved in assigning the correct category codes to a good, especially considering the tax penalties and legal implications of a misclassification. This work focuses on the training process of a classifier based on bidirectional encoder representations from transformers (BERT) for the tax classification of goods with NCM codes, the official classification system for import and export products in Brazil. In particular, this article presents results from using a BERT model pretrained specifically on Portuguese, as well as results from using a multilingual pretrained BERT model. Experimental results show that the Portuguese model performed slightly better than the multilingual model, achieving an MCC of 0.8491, and confirm that such classifiers could be used to improve specialists' performance in the classification of goods.
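To make the comparison described in the abstract concrete, the sketch below shows how a Portuguese-pretrained BERT and a multilingual BERT could be loaded as sequence classifiers and scored with the Matthews correlation coefficient (MCC) using the Hugging Face transformers library. The model checkpoints (neuralmind/bert-base-portuguese-cased, bert-base-multilingual-cased), the toy product descriptions, and the label set are illustrative assumptions, not the data or configuration used in the article.

```python
# Minimal illustrative sketch (not the authors' actual pipeline): compare a
# Portuguese-pretrained BERT and a multilingual BERT on NCM-style product
# descriptions, scoring each with the Matthews correlation coefficient (MCC).
# Checkpoints, example texts, and labels are assumptions for illustration only.
import torch
from sklearn.metrics import matthews_corrcoef
from transformers import AutoModelForSequenceClassification, AutoTokenizer

CHECKPOINTS = {
    "portuguese": "neuralmind/bert-base-portuguese-cased",  # BERTimbau (assumed)
    "multilingual": "bert-base-multilingual-cased",         # mBERT (assumed)
}

# Toy product descriptions mapped to hypothetical NCM class indices.
texts = [
    "parafuso de aço inoxidável",
    "camiseta de algodão",
    "pneu de borracha para automóvel",
]
labels = [0, 1, 2]

for name, checkpoint in CHECKPOINTS.items():
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=3)
    model.eval()

    # Tokenize the descriptions and run a forward pass; a real experiment would
    # fine-tune the classification head on labelled NCM data before evaluating.
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        predictions = model(**batch).logits.argmax(dim=-1).tolist()

    print(f"{name}: MCC = {matthews_corrcoef(labels, predictions):.4f}")
```

In the article's setting, the same evaluation loop would be run after fine-tuning each model on product descriptions labelled with NCM codes, so that the MCC reflects the trained classifier rather than an untrained classification head.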

Bibliographic Details
Main Authors: Roberta Rodrigues de Lima, Anita M. R. Fernandes, James Roberto Bombasar, Bruno Alves da Silva, Paul Crocker, Valderi Reis Quietinho Leithardt
Format: Article
Language: English
Published: MDPI AG, 2022-01-01
Series: Big Data and Cognitive Computing, Vol. 6, No. 1, Article 8
ISSN: 2504-2289
DOI: 10.3390/bdcc6010008
Subjects: NCM classification; natural language processing; multilingual BERT; Portuguese BERT; transformers; NLP
Online Access: https://www.mdpi.com/2504-2289/6/1/8
Author Affiliations:
Roberta Rodrigues de Lima: Laboratory of Applied Intelligence, School of the Sea Science and Technology, University of Vale do Itajaí, Itajaí 88302-901, Brazil
Anita M. R. Fernandes: Laboratory of Applied Intelligence, School of the Sea Science and Technology, University of Vale do Itajaí, Itajaí 88302-901, Brazil
James Roberto Bombasar: Analysis and Systems Development Course, Centro Universitário Avantis, Balneário Camboriú 88339-125, Brazil
Bruno Alves da Silva: Laboratory of Applied Intelligence, School of the Sea Science and Technology, University of Vale do Itajaí, Itajaí 88302-901, Brazil
Paul Crocker: Instituto de Telecomunicações and Departamento de Informática, Universidade da Beira Interior, 6201-001 Covilhã, Portugal
Valderi Reis Quietinho Leithardt: COPELABS, Lusófona University of Humanities and Technologies, Campo Grande 376, 1749-024 Lisboa, Portugal