Topic modeling for conversations for mental health helplines with utterance embedding

Bibliographic Details
Main Authors:	Salim Salmi, Rob van der Mei, Saskia Mérelle, Sandjai Bhulai
Format:	Article
Language:	English
Published:	Elsevier 2024-03-01
Series:	Telematics and Informatics Reports
Subjects:	Topic modeling; Sentence embedding; Conversations; Mental health; BERT
Online Access:	http://www.sciencedirect.com/science/article/pii/S2772503024000124

author	Salim Salmi; Rob van der Mei; Saskia Mérelle; Sandjai Bhulai
collection	DOAJ
description	Conversations with topics that are locally contextual often produce incoherent results when standard topic modeling methods are used. Splitting a conversation into its individual utterances makes it possible to avoid this problem. However, because this increases data sparsity, different methods need to be considered. Baseline bag-of-words topic modeling methods for regular and short texts, as well as topic modeling methods using transformer-based sentence embeddings, were implemented. These models were evaluated on topic coherence and word embedding similarity. Each method was trained on single utterances, on segments of the conversation, and on the full conversation. The results showed that utterance-level and segment-level data combined with sentence embedding methods perform better than non-sentence-embedding methods or conversation-level data. Among the sentence embedding methods, clustering using HDBSCAN showed the best performance. We suspect that ignoring noisy utterances is the reason for the better topic coherence and the relatively large improvement in topic word similarity.
first_indexed	2024-03-07T14:27:48Z
format	Article
id	doaj.art-28a8b1272d1148d9931ae850cf9461d4
institution	Directory of Open Access Journals
issn	2772-5030
language	English
last_indexed	2024-04-24T23:22:29Z
publishDate	2024-03-01
publisher	Elsevier
record_format	Article
series	Telematics and Informatics Reports
citation	Telematics and Informatics Reports, vol. 13, article 100126 (2024-03-01), ISSN 2772-5030
affiliation	Salim Salmi: Centrum Wiskunde & Informatica, Netherlands (correspondence to: P.O. Box 94079, 1090 GB Amsterdam, Netherlands)
affiliation	Rob van der Mei: Centrum Wiskunde & Informatica, Netherlands
affiliation	Saskia Mérelle: 113 Suicide Prevention, Netherlands
affiliation	Sandjai Bhulai: Vrije Universiteit Amsterdam, Netherlands
title	Topic modeling for conversations for mental health helplines with utterance embedding
topic	Topic modeling; Sentence embedding; Conversations; Mental health; BERT
url	http://www.sciencedirect.com/science/article/pii/S2772503024000124
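
The description field of this record says each method was trained on single utterances, on conversation segments, and on full conversations, and that models were compared on topic coherence. The record does not say how segments were formed or which coherence measure was used, so the sketch below is only an illustration under stated assumptions: segments are hypothetical non-overlapping windows of consecutive utterances, tokens come from lowercase whitespace splitting, and coherence is the mean document-level NPMI over pairs of a topic's top words (a common choice, not confirmed as the authors'). The function names `make_documents` and `npmi_coherence` are hypothetical, and the sentence-embedding and HDBSCAN clustering steps are not shown.

```python
"""Illustrative sketch only (not the authors' code).

Assumptions: 'segments' are non-overlapping windows of consecutive
utterances, tokenization is lowercase whitespace splitting, and topic
coherence is measured as mean document-level NPMI.
"""
from itertools import combinations
from math import log


def make_documents(utterances: list[str], level: str, window: int = 3) -> list[str]:
    """Return topic-model documents at 'utterance', 'segment' or 'conversation' level."""
    if level == "utterance":
        # Each utterance is its own (short, sparse) document.
        return list(utterances)
    if level == "segment":
        # Hypothetical segmentation: fixed windows of `window` utterances.
        return [" ".join(utterances[i:i + window])
                for i in range(0, len(utterances), window)]
    if level == "conversation":
        # The whole conversation is a single document.
        return [" ".join(utterances)]
    raise ValueError(f"unknown level: {level!r}")


def npmi_coherence(top_words: list[str], documents: list[str]) -> float:
    """Mean normalized PMI over all pairs of a topic's top words.

    Probabilities are document frequencies: the fraction of documents
    that contain a word (or both words of a pair).
    """
    doc_sets = [set(doc.lower().split()) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words: str) -> float:
        # Fraction of documents containing every word in `words`.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(top_words, 2):
        p1, p2, p12 = prob(w1), prob(w2), prob(w1, w2)
        if p12 == 0.0:
            scores.append(-1.0)  # never co-occur: NPMI lower bound
        elif p12 == 1.0:
            scores.append(1.0)   # co-occur in every document: upper bound by convention
        else:
            scores.append(log(p12 / (p1 * p2)) / -log(p12))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy, invented conversation used only to exercise the functions.
    conversation = [
        "I feel alone since my divorce",
        "How long have you felt this way",
        "Since the divorce I barely sleep",
        "Sleep problems can make everything heavier",
        "My sister helps sometimes",
        "It sounds like your sister is important to you",
    ]
    topic = ["divorce", "sleep", "alone"]
    for level in ("utterance", "segment"):
        docs = make_documents(conversation, level)
        print(level, len(docs), round(npmi_coherence(topic, docs), 4))
```

With only one document at conversation level, every word pair that occurs at all receives the convention value 1.0, so document-level coherence in this sketch is only informative when computed over many documents (in practice, a large reference corpus).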