Towards solving NLP tasks with optimal transport loss

Loss functions are essential to computing the divergence of a model’s predicted distribution from the ground truth. Such functions play a vital role in machine learning algorithms as they steer the learning process. Most common loss functions in natural language processing (NLP), such as Kullback–Leibler (KL) and Jensen–Shannon (JS) divergences, do not base their computations on the properties of label coordinates. Label coordinates can help encode inter-label relationships: for the sentiment classification task, strongly positive sentiment is closer to positive than to strongly negative sentiment. Incorporating such information in the computation of the probability divergence can facilitate the model’s learning dynamics. In this work, we study an under-explored loss function in NLP — Wasserstein Optimal Transport (OT) — which takes label coordinates into account and thus allows the learning algorithm to incorporate inter-label relations. However, the limited application of OT-based losses stems from the challenge of defining quality label coordinates. We explore the current limitations of learning with OT and provide an algorithm that jointly learns label coordinates with the model parameters. We show the efficacy of OT on several text classification tasks, such as sentiment analysis and emotion recognition in conversation, and discuss the limitations of the approach. The source code for this work is publicly available at: https://github.com/declare-lab/NLP-OT.
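
For orientation, the following is a minimal sketch of the core idea described in the abstract: an entropic (Sinkhorn-style) OT loss whose inter-label cost matrix is derived from label coordinates stored as learnable parameters, so the coordinates are trained jointly with the model. This is not the authors' released implementation (see the GitHub link above for that); the class name SinkhornLoss and the hyperparameters eps and n_iters are illustrative assumptions.

import torch
import torch.nn as nn

class SinkhornLoss(nn.Module):
    """Entropic-regularized OT loss with jointly learned label coordinates.

    Illustrative sketch: the label embeddings are nn.Parameters, so the
    inter-label cost matrix is updated together with the classifier.
    """

    def __init__(self, n_labels, dim=8, eps=0.1, n_iters=50):
        super().__init__()
        # Label coordinates are parameters; gradients from the loss update them.
        self.coords = nn.Parameter(torch.randn(n_labels, dim))
        self.eps, self.n_iters = eps, n_iters

    def forward(self, pred, target):
        # pred, target: (batch, n_labels) probability vectors.
        cost = torch.cdist(self.coords, self.coords)   # pairwise label distances
        K = torch.exp(-cost / self.eps)                # Gibbs kernel
        u = torch.ones_like(pred)
        for _ in range(self.n_iters):                  # Sinkhorn fixed-point updates
            v = target / (u @ K).clamp_min(1e-9)       # v = b / (K^T u)
            u = pred / (v @ K.T).clamp_min(1e-9)       # u = a / (K v)
        # Transport plan P = diag(u) K diag(v); loss is <P, cost> per sample.
        plan = u.unsqueeze(2) * K.unsqueeze(0) * v.unsqueeze(1)
        return (plan * cost.unsqueeze(0)).sum(dim=(1, 2)).mean()

Usage would look like loss = SinkhornLoss(n_labels=5)(logits.softmax(-1), one_hot_targets); registering the loss module's parameters with the same optimizer as the model is what realizes the joint learning of label coordinates, so that, for example, confusing "strongly positive" with "positive" incurs a smaller transport cost than confusing it with "strongly negative".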

Bibliographic Details
Main Authors: Rishabh Bhardwaj, Tushar Vaidya, Soujanya Poria (corresponding author); all at Information Systems Technology and Design, Singapore University of Technology and Design, Singapore
Format: Article
Language: English
Published: Elsevier, 2022-11-01
Series: Journal of King Saud University: Computer and Information Sciences, Vol. 34, No. 10, pp. 10434-10443
ISSN: 1319-1578
Subjects: NLP; Optimal transport; Neural networks
Online Access: http://www.sciencedirect.com/science/article/pii/S1319157822003986
DOAJ Record: doaj.art-8c3a095dcffb4ce884d6360fcf3952bf