Transient chaos in bidirectional encoder representations from transformers

Language is an outcome of complex and dynamic human interactions, and natural language processing (NLP) is accordingly built on human linguistic activity. Along with the generative pretrained transformer (GPT), bidirectional encoder representations from transformers (BERT) has recently g...

Bibliographic Details
Main Authors:	Katsuma Inoue, Soh Ohara, Yasuo Kuniyoshi, Kohei Nakajima
Format:	Article
Language:	English
Published:	American Physical Society 2022-03-01
Series:	Physical Review Research
Online Access:	http://doi.org/10.1103/PhysRevResearch.4.013204

collection	DOAJ
description	Language is an outcome of complex and dynamic human interactions, and natural language processing (NLP) is accordingly built on human linguistic activity. Along with the generative pretrained transformer (GPT), bidirectional encoder representations from transformers (BERT) has recently gained popularity owing to its outstanding NLP capabilities, establishing state-of-the-art scores on several NLP benchmarks. A lite BERT (ALBERT) is, as the name suggests, a lightweight version of BERT in which the number of parameters is reduced by repeatedly applying the same neural network, the transformer's encoder layer. By pretraining its parameters on a massive amount of natural language data, ALBERT can convert input sentences into versatile high-dimensional vectors potentially capable of solving multiple NLP tasks. In that sense, ALBERT can be regarded as a well-designed high-dimensional dynamical system whose operator is the transformer's encoder, and essential structures of human language are thus expected to be encapsulated in its dynamics. In this study, we investigated the embedded properties of pretrained ALBERT to reveal how NLP tasks are effectively solved by exploiting its dynamics. We thereby aimed to explore the nature of human language from the dynamical expressions of the NLP model. Our analysis consists of two parts, short-term and long-term, distinguished by the timescale at which the dynamics are captured. The short-term analysis clarified that the pretrained model stably yields trajectories with higher dimensionality in a certain time range, which would enhance the expressive capacity required for NLP tasks. The long-term analysis revealed that ALBERT intrinsically shows transient chaos, a typical nonlinear phenomenon in which chaotic dynamics appear only during the transient, and that the pretrained ALBERT model tends to produce a chaotic trajectory for a significantly longer time period than a randomly initialized one. Our results imply that local chaoticity would contribute to improving NLP performance, uncovering a novel aspect of the role of chaotic dynamics in human language behaviors.
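The description treats a weight-shared encoder layer, applied repeatedly, as the operator of a dynamical system. The following is a minimal sketch of that framing only, not the authors' method and not ALBERT itself: the layer is replaced by a hypothetical toy map x -> tanh(W x) with a fixed random matrix W, and the names `make_shared_layer`, `local_expansion_rates`, `gain`, and `eps` are all assumptions introduced for illustration. It iterates the same map many times and records how fast a tiny perturbation grows or shrinks at each step, which is one common way to probe whether a trajectory is locally chaotic.

```python
import math
import random


def make_shared_layer(dim, gain, seed=0):
    """Build one fixed map x -> tanh(W x); a toy stand-in for a shared layer."""
    rng = random.Random(seed)
    scale = gain / math.sqrt(dim)
    weights = [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(dim)]

    def layer(state):
        return [math.tanh(sum(w * s for w, s in zip(row, state))) for row in weights]

    return layer


def distance(a, b):
    """Euclidean distance between two state vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def local_expansion_rates(layer, start, steps, eps=1e-8, seed=1):
    """Log growth per step of a perturbation of size eps, renormalised each step."""
    rng = random.Random(seed)
    direction = [rng.gauss(0.0, 1.0) for _ in start]
    norm = math.sqrt(sum(d * d for d in direction))
    base = list(start)
    twin = [b + eps * d / norm for b, d in zip(base, direction)]
    rates = []
    for _ in range(steps):
        # The same layer is applied at every step (weight sharing).
        base, twin = layer(base), layer(twin)
        gap = distance(base, twin)
        if gap == 0.0:
            break  # trajectories merged exactly; no rate can be measured
        rates.append(math.log(gap / eps))
        # Pull the twin back to distance eps along the current separation.
        twin = [b + eps * (t - b) / gap for b, t in zip(base, twin)]
    return rates


def mean(values):
    return sum(values) / len(values)


if __name__ == "__main__":
    dim, steps = 32, 200
    start_rng = random.Random(42)
    start = [start_rng.uniform(-1.0, 1.0) for _ in range(dim)]
    for gain in (0.5, 3.0):
        rates = local_expansion_rates(make_shared_layer(dim, gain), start, steps)
        print(f"gain={gain}: mean log expansion per step = {mean(rates):.3f}")
```

A positive mean rate indicates that nearby trajectories separate on average, and a negative one that they converge. In a transient-chaos setting one would expect the rates to stay positive for a finite stretch of steps and then turn negative; measuring how long that stretch lasts for a real pretrained model would require loading the model itself, which this sketch does not attempt.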
id	doaj.art-5e72814901ef4af2acb13c1cf0b327c9
institution	Directory Open Access Journal
issn	2643-1564

Similar Items