Topical analysis of text streams
Main Author: | He, Qi |
---|---|
Other Authors: | Lim Ee Peng |
Format: | Thesis |
Language: | English |
Published: | 2009 |
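Subjects: | DRNTU::Engineering::Computer science and engineering::Computing methodologies::Document and text processing |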
Online Access: | https://hdl.handle.net/10356/17764 |
_version_ | 1811682769826217984 |
---|---|
author | He, Qi |
author2 | Lim Ee Peng |
author_facet | Lim Ee Peng He, Qi |
author_sort | He, Qi |
collection | NTU |
description | Topic detection (TD) is an important area of research whose primary goal is to detect retrospective or new topics from a stream of news articles. It could be extremely useful in many applications, including news aggregation portals, news alert systems, event search engines, and terrorist activity tracking. However, specialists who analyze news articles have a hard time separating the wheat from the chaff, due to the overwhelming number of news streams (over 10,000 as of 2008). For many years, topic detection has been tackled as a clustering task by the TDT (Topic Detection and Tracking) research community. However, time, which plays a pivotal role in news articles, has never been given due consideration in the past. In this research we present a thorough study of various temporal topic detection models that explicitly incorporate the element of time. We further discovered that bursty temporal word features play an important role in improving topic detection performance, and ventured to provide an in-depth analysis and systematic categorization of all word features into 5 general types using techniques from signal processing. Armed with a small set of bursty features extracted from historical or online news streams, we proposed a number of effective algorithms to detect topics from a news stream in both offline and online modes. Our algorithms are mathematically elegant, simple, and extremely practical when benchmarked against some of the best topic detection models, including spherical k-means, Latent Dirichlet Allocation (LDA), and von Mises-Fisher mixtures. Finally, we present a case study of a personalized news alert application, where subscribers can specify interesting anticipatory events, and show how a simple supervised event transition classifier can be used to effectively identify user-anticipated events. Our research is one of the most comprehensive studies on both offline and online topic detection, of which the latter has been an open research problem for many years. In fact, our online topic detection model can be viewed as a significant advancement in the field, which paves the way for further improvements by other TDT experts. |
first_indexed | 2024-10-01T04:02:07Z |
format | Thesis |
id | ntu-10356/17764 |
institution | Nanyang Technological University |
language | English |
last_indexed | 2024-10-01T04:02:07Z |
publishDate | 2009 |
record_format | dspace |
spelling | ntu-10356/17764 2023-03-04T00:42:40Z Topical analysis of text streams He, Qi Lim Ee Peng Chang Kuiyu School of Computer Engineering DRNTU::Engineering::Computer science and engineering::Computing methodologies::Document and text processing DOCTOR OF PHILOSOPHY (SCE) 2009-06-15T01:27:56Z 2009-06-15T01:27:56Z 2009 2009 Thesis He, Q. (2009). Topical analysis of text streams. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/17764 10.32657/10356/17764 en 200 p. application/pdf |
spellingShingle | DRNTU::Engineering::Computer science and engineering::Computing methodologies::Document and text processing He, Qi Topical analysis of text streams |
title | Topical analysis of text streams |
title_full | Topical analysis of text streams |
title_fullStr | Topical analysis of text streams |
title_full_unstemmed | Topical analysis of text streams |
title_short | Topical analysis of text streams |
title_sort | topical analysis of text streams |
topic | DRNTU::Engineering::Computer science and engineering::Computing methodologies::Document and text processing |
url | https://hdl.handle.net/10356/17764 |
work_keys_str_mv | AT heqi topicalanalysisoftextstreams |