Attentive embedding for document representation

As NLP reaches new heights in many real-world applications, researchers continue to look for better ways for a model to learn document representations. Most state-of-the-art NLP models use an encoder-decoder architecture, which closely resembles an autoencoder. KATE is an autoencoder that introduces a competition layer between the encoder and decoder. This project applies KATE to more complex models to determine whether it provides a more attentive representation of documents. To investigate whether KATE improves document representation, training was carried out in two phases. The first phase trains the encoder-decoder models on a sentence reconstruction task, allowing the model to learn document representations. The second phase, a classification task, validates whether the encoder and the KATE layer from the first phase learned a good document representation. The two models used in this study are a 2-layer LSTM and ALBERT; both were implemented and trained with and without KATE for comparison. The experimental results show that KATE helps document representation for the 2-layer LSTM but not for ALBERT. It is therefore concluded that KATE has the potential to improve document representation for a simpler model such as an LSTM at minimal implementation cost, but not for a more complex model such as ALBERT.

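For readers unfamiliar with KATE, the competition layer mentioned above can be pictured with a small sketch. The snippet below is a hypothetical, minimal PyTorch rendering of a k-competitive step in the spirit of KATE (Chen & Zaki, 2017); it is not the implementation used in this project, and the function name, k, and the amplification factor alpha are illustrative assumptions.

    # Illustrative sketch (not the project's code) of the k-competitive step
    # that KATE inserts between encoder and decoder. k and alpha are assumed
    # values; the exact energy reallocation in KATE differs in detail.
    import torch

    def k_competitive(z: torch.Tensor, k: int = 6, alpha: float = 1.0) -> torch.Tensor:
        """Keep the k strongest activations per example; boost them with the losers' energy."""
        out = torch.zeros_like(z)
        for i, row in enumerate(z):
            pos, neg = row.clamp(min=0), row.clamp(max=0)
            # Winners: the k/2 largest positive and k/2 most negative activations.
            p_idx = pos.topk(k // 2).indices
            n_idx = (-neg).topk(k // 2).indices
            # Energy of the losing neurons (everything outside the winners).
            p_energy = pos.sum() - pos[p_idx].sum()
            n_energy = -(neg.sum() - neg[n_idx].sum())
            # Losers are zeroed; winners are amplified by the losers' energy.
            out[i, p_idx] = pos[p_idx] + alpha * p_energy
            out[i, n_idx] = neg[n_idx] - alpha * n_energy
        return out

    # Example: apply competition to a batch of 2 encoder outputs of width 8.
    print(k_competitive(torch.randn(2, 8), k=4))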

Bibliographic Details
Main Author: Tang, Kok Foon
Other Authors: Lihui CHEN
Format: Final Year Project (FYP)
Language: English
Published: Nanyang Technological University, 2021
Subjects: Engineering::Electrical and electronic engineering
Online Access: https://hdl.handle.net/10356/149102
School: School of Electrical and Electronic Engineering
Supervisor: Lihui Chen (ELHCHEN@ntu.edu.sg)
Degree: Bachelor of Engineering (Information Engineering and Media)
Citation: Tang, K. F. (2021). Attentive embedding for document representation. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/149102