Topical analysis of text streams

Topic detection (TD) is an important area of research whose primary goal is to detect retrospective or new topics from a stream of news articles. It could be extremely useful in many applications, including news aggregation portals, news alert systems, event search engines, terrorist activity tracking...

Bibliographic Details
Main Author: He, Qi
Other Authors: Lim Ee Peng
Format: Thesis
Language: English
Published: 2009
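Subjects: DRNTU::Engineering::Computer science and engineering::Computing methodologies::Document and text processing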
Online Access: https://hdl.handle.net/10356/17764
collection NTU
description Topic detection (TD) is an important area of research whose primary goal is to detect retrospective or new topics from a stream of news articles. It could be extremely useful in many applications, including news aggregation portals, news alert systems, event search engines, and terrorist activity tracking. However, specialists who analyze news articles have a hard time separating the wheat from the chaff, due to the overwhelming number of news streams (over 10,000 as of 2008). For many years, topic detection has been tackled as a clustering task by the TDT (Topic Detection and Tracking) research community. However, time, which plays a pivotal role in news articles, has never been given due consideration in the past. In this research we present a thorough study of various temporal topic detection models that explicitly incorporate the element of time. We further discovered that bursty temporal word features play an important role in improving topic detection performance, and we provide an in-depth analysis and systematic categorization of all word features into five general types using techniques from signal processing. Armed with a small set of bursty features extracted from historical or online news streams, we propose a number of effective algorithms to detect topics from a news stream in both offline and online modes. Our algorithms are mathematically elegant, simple, and extremely practical when benchmarked against some of the best topic detection models, including spherical k-means, Latent Dirichlet Allocation (LDA), and von Mises-Fisher mixtures. Finally, we present a case study of a personalized news alert application, where subscribers can specify anticipated events of interest, and show how a simple supervised event transition classifier can be used to effectively identify user-anticipated events.
Our research is one of the most comprehensive studies of both offline and online topic detection, of which the latter has been an open research problem for many years. In fact, our online topic detection model can be viewed as a significant advancement in the field, one that paves the way for further improvements by other TDT experts.
first_indexed 2024-10-01T04:02:07Z
format Thesis
id ntu-10356/17764
institution Nanyang Technological University
language English
last_indexed 2024-10-01T04:02:07Z
publishDate 2009
record_format dspace
spelling ntu-10356/17764 2023-03-04T00:42:40Z Topical analysis of text streams He, Qi Lim Ee Peng Chang Kuiyu School of Computer Engineering DRNTU::Engineering::Computer science and engineering::Computing methodologies::Document and text processing DOCTOR OF PHILOSOPHY (SCE) 2009-06-15T01:27:56Z 2009-06-15T01:27:56Z 2009 2009 Thesis He, Q. (2009). Topical analysis of text streams. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/17764 10.32657/10356/17764 en 200 p. application/pdf
title Topical analysis of text streams
topic DRNTU::Engineering::Computer science and engineering::Computing methodologies::Document and text processing