Learning distributed sentence representations for story segmentation

Traditional sentence representations such as bag-of-words (BOW) and term frequency-inverse document frequency (tf-idf) face the problem of data sparsity and may not generalize well. Neural network based representations such as word/sentence vectors are usually trained in an unsupervised way and lack...


Bibliographic Details
Main Authors:	Yu, Jia; Xie, Lei; Xiao, Xiong; Chng, Eng Siong
Format:	Journal Article
Language:	English
Published:	2020
Subjects:	Engineering::Computer science and engineering; Distributed Representation; Deep Neural Network
Online Access:	https://hdl.handle.net/10356/141962

author	Yu, Jia; Xie, Lei; Xiao, Xiong; Chng, Eng Siong
author_facet	Yu, Jia; Xie, Lei; Xiao, Xiong; Chng, Eng Siong
author_sort	Yu, Jia
collection	NTU
description	Traditional sentence representations such as bag-of-words (BOW) and term frequency-inverse document frequency (tf-idf) suffer from data sparsity and may not generalize well. Neural network based representations such as word and sentence vectors are usually trained in an unsupervised way and lack the topic information that is important for story segmentation. In this paper, we propose to learn sentence representations by using a deep neural network (DNN) to directly predict the topic class of the input sentence. With supervised training, the learned vector representation of a sentence contains more topic information and is more suitable for the story segmentation task. The input of the DNN is a BOW vector computed from a context window. Multiple time resolution BOW and bottleneck features (BNF) are also introduced to enhance story segmentation performance. As text data labeled with topic information is limited, we cluster stories into classes and use the class ID as the topic label of the stories for DNN training. We evaluated the proposed sentence representation with the TextTiling and normalized cuts (NCuts) based story segmentation methods on the topic detection and tracking (TDT2) task. Experimental results show that the proposed topical sentence representation outperforms both the BOW baseline and the recently proposed neural network based representations, i.e., word and sentence vectors.
first_indexed	2024-10-01T06:40:51Z
format	Journal Article
id	ntu-10356/141962
institution	Nanyang Technological University
language	English
last_indexed	2024-10-01T06:40:51Z
publishDate	2020
record_format	dspace
spelling	ntu-10356/141962 2020-06-12T04:42:01Z Journal Article Yu, J., Xie, L., Xiao, X., & Chng, E. S. (2018). Learning distributed sentence representations for story segmentation. Signal Processing, 142, 403-411. doi:10.1016/j.sigpro.2017.07.026 0165-1684 https://hdl.handle.net/10356/141962 2-s2.0-85026886515 en © 2017 Elsevier B.V. All rights reserved.
title	Learning distributed sentence representations for story segmentation
topic	Engineering::Computer science and engineering; Distributed Representation; Deep Neural Network
url	https://hdl.handle.net/10356/141962
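
The description above mentions a BOW baseline computed from a context window and a TextTiling-based segmenter. A minimal sketch of that baseline idea follows, assuming pre-tokenized sentences. It is not the paper's DNN method: the function names, the window size, and the similarity cutoff are hypothetical, and boundary selection is simplified to "local minimum below a threshold" rather than TextTiling's full depth scoring.

```python
from collections import Counter
from math import sqrt


def bow(sentences):
    """Bag-of-words counts pooled over a list of tokenized sentences."""
    counts = Counter()
    for tokens in sentences:
        counts.update(tokens)
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def gap_similarities(sentences, window=2):
    """Similarity across each gap i (between sentence i and sentence i + 1).

    Each side of the gap is a BOW vector pooled over up to `window` sentences.
    """
    sims = []
    for i in range(len(sentences) - 1):
        left = bow(sentences[max(0, i - window + 1): i + 1])
        right = bow(sentences[i + 1: i + 1 + window])
        sims.append(cosine(left, right))
    return sims


def boundaries(sims, threshold=0.1):
    """Gaps whose similarity is a local minimum below `threshold`.

    The threshold is a hypothetical cutoff chosen for illustration only.
    """
    found = []
    for i, s in enumerate(sims):
        left_ok = i == 0 or s <= sims[i - 1]
        right_ok = i == len(sims) - 1 or s <= sims[i + 1]
        if s < threshold and left_ok and right_ok:
            found.append(i)
    return found


# Toy input: two finance sentences followed by two weather sentences.
sentences = [
    ["stocks", "fell", "market"],
    ["market", "traders", "stocks"],
    ["storm", "rain", "coast"],
    ["rain", "flood", "coast"],
]
story_breaks = boundaries(gap_similarities(sentences, window=2))
```

On the toy input, the only gap with no shared vocabulary across its windows is the one between the second and third sentences, so that is where the sketch places a boundary. Sparse count vectors like these are what the description identifies as a weakness of BOW: two sentences on the same topic with no words in common score zero similarity.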


Similar Items