Towards solving NLP tasks with optimal transport loss
Loss functions are essential for computing the divergence of a model’s predicted distribution from the ground truth. Such functions play a vital role in machine learning algorithms as they steer the learning process. Most common loss functions in natural language processing (NLP), such as Kullback–Leibler (KL) and Jensen–Shannon (JS) divergences...
Main Authors: | Rishabh Bhardwaj, Tushar Vaidya, Soujanya Poria |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier, 2022-11-01 |
Series: | Journal of King Saud University: Computer and Information Sciences |
Subjects: | NLP; Optimal transport; Neural networks |
Online Access: | http://www.sciencedirect.com/science/article/pii/S1319157822003986 |
_version_ | 1828163706387693568 |
---|---|
author | Rishabh Bhardwaj Tushar Vaidya Soujanya Poria |
author_facet | Rishabh Bhardwaj Tushar Vaidya Soujanya Poria |
author_sort | Rishabh Bhardwaj |
collection | DOAJ |
description | Loss functions are essential for computing the divergence of a model’s predicted distribution from the ground truth. Such functions play a vital role in machine learning algorithms as they steer the learning process. Most common loss functions in natural language processing (NLP), such as Kullback–Leibler (KL) and Jensen–Shannon (JS) divergences, do not base their computations on the properties of label coordinates. Label coordinates can help encode inter-label relationships: in sentiment classification, for example, strongly positive sentiment is closer to positive than to strongly negative sentiment. Incorporating such information into the computation of the probability divergence can improve the model’s learning dynamics. In this work, we study an under-explored loss function in NLP, Wasserstein Optimal Transport (OT), which takes label coordinates into account and thus allows the learning algorithm to incorporate inter-label relations. However, OT-based losses have seen limited application because defining quality label coordinates is challenging. We examine the current limitations of learning with OT and provide an algorithm that jointly learns label coordinates with the model parameters. We show the efficacy of OT on several text classification tasks, such as sentiment analysis and emotion recognition in conversation, and discuss the limitations of the approach. The source code for this work is publicly available at: https://github.com/declare-lab/NLP-OT. |
first_indexed | 2024-04-12T01:14:34Z |
format | Article |
id | doaj.art-8c3a095dcffb4ce884d6360fcf3952bf |
institution | Directory Open Access Journal |
issn | 1319-1578 |
language | English |
last_indexed | 2024-04-12T01:14:34Z |
publishDate | 2022-11-01 |
publisher | Elsevier |
record_format | Article |
series | Journal of King Saud University: Computer and Information Sciences |
spelling | doaj.art-8c3a095dcffb4ce884d6360fcf3952bf; 2022-12-22T03:54:01Z; eng; Elsevier; Journal of King Saud University: Computer and Information Sciences; 1319-1578; 2022-11-01; Vol. 34, No. 10, pp. 10434–10443; Towards solving NLP tasks with optimal transport loss; Rishabh Bhardwaj, Tushar Vaidya, Soujanya Poria (corresponding author), all affiliated with Information Systems Technology and Design, Singapore University of Technology and Design, Singapore; http://www.sciencedirect.com/science/article/pii/S1319157822003986; NLP; Optimal transport; Neural networks |
spellingShingle | Rishabh Bhardwaj Tushar Vaidya Soujanya Poria Towards solving NLP tasks with optimal transport loss Journal of King Saud University: Computer and Information Sciences NLP Optimal transport Neural networks |
title | Towards solving NLP tasks with optimal transport loss |
title_full | Towards solving NLP tasks with optimal transport loss |
title_fullStr | Towards solving NLP tasks with optimal transport loss |
title_full_unstemmed | Towards solving NLP tasks with optimal transport loss |
title_short | Towards solving NLP tasks with optimal transport loss |
title_sort | towards solving nlp tasks with optimal transport loss |
topic | NLP Optimal transport Neural networks |
url | http://www.sciencedirect.com/science/article/pii/S1319157822003986 |
work_keys_str_mv | AT rishabhbhardwaj towardssolvingnlptaskswithoptimaltransportloss AT tusharvaidya towardssolvingnlptaskswithoptimaltransportloss AT soujanyaporia towardssolvingnlptaskswithoptimaltransportloss |
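The abstract above describes an optimal transport loss whose ground cost is derived from label coordinates learned jointly with the model parameters. As a rough illustration of the idea, here is a minimal PyTorch sketch of an entropically regularized (Sinkhorn) OT loss over class labels. It is not the authors' implementation (see the linked repository for that); the `SinkhornOTLoss` name, coordinate dimension, regularization strength, and iteration count are all illustrative assumptions.

```python
# Minimal, illustrative sketch of a Sinkhorn-regularized optimal transport
# loss over class labels. NOT the paper's implementation (see the linked
# repository); coord_dim, eps, and n_iters are illustrative assumptions.
import torch
import torch.nn as nn

class SinkhornOTLoss(nn.Module):
    def __init__(self, num_labels: int, coord_dim: int = 8,
                 eps: float = 0.1, n_iters: int = 50):
        super().__init__()
        # Label coordinates are trainable, so the label geometry can be
        # learned jointly with the rest of the model, as the abstract says.
        self.label_coords = nn.Parameter(torch.randn(num_labels, coord_dim))
        self.eps = eps          # entropic regularization strength (assumed)
        self.n_iters = n_iters  # number of Sinkhorn iterations (assumed)

    def forward(self, pred_probs: torch.Tensor, target_probs: torch.Tensor):
        # Ground cost: squared pairwise distances between label coordinates,
        # so moving probability mass between "nearby" labels is cheap
        # (e.g., strongly positive -> positive costs less than -> negative).
        C = torch.cdist(self.label_coords, self.label_coords, p=2) ** 2
        K = torch.exp(-C / self.eps)  # Gibbs kernel of the cost matrix

        # Sinkhorn iterations: rescale u and v so the transport plan's
        # marginals match pred_probs and target_probs (both batch x labels).
        u = torch.ones_like(pred_probs)
        for _ in range(self.n_iters):
            v = target_probs / (u @ K + 1e-9)   # v = b / (K^T u)
            u = pred_probs / (v @ K.T + 1e-9)   # u = a / (K v)

        # Transport plan P = diag(u) K diag(v); the loss is <P, C>, the
        # total cost of moving predicted mass onto the target labels.
        P = u.unsqueeze(2) * K.unsqueeze(0) * v.unsqueeze(1)
        return (P * C.unsqueeze(0)).sum(dim=(1, 2)).mean()
```

In a hypothetical training loop, one would pass `softmax(logits)` as `pred_probs` and one-hot (or label-smoothed) targets as `target_probs`. Because `label_coords` is an `nn.Parameter`, gradients flow through the cost matrix, which is one way to realize the joint learning of label coordinates and model parameters that the abstract describes.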