Discovery of interesting phrases from text streams

Bibliographic Details
Main Author: Pang, Jeffrey Jian Hao
Other Authors: Sun Aixin
Format: Final Year Project (FYP)
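Language: English
Published: 2011
Subjects: DRNTU::Engineering::Computer science and engineering::Computing methodologies::Document and text processing
Online Access: http://hdl.handle.net/10356/46465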
collection NTU
description The rapid adoption of blogs and tweets in recent years generates a large and diverse volume of information feeds every day. To take advantage of this knowledge, these timely data need to be organized automatically and efficiently into useful information. Such text streams usually contain interesting phrases that summarize the content of the text. This project extracts interesting phrases, consolidates them, and uses temporal information such as “date published” to transform them into meaningful statistics, such as the amount of media coverage an event receives during a specific time period. This report explores the methodologies and algorithms used in keyphrase extraction. It also documents the development and implementation of a search engine, the “Interesting Phrases Analysis Program (IPAP)”, designed for this project. IPAP retrieves interesting phrases from a large collection of blog entries. It builds an index and lets users run a series of analyses on the search results. These analyses reveal the trend of phrases, the relationships between phrases, the niche of each blog, and other useful information. IPAP can also be extended to work with tweets. The report also discusses applications and potential future development. IPAP demonstrates that analyzing interesting phrases from text streams such as blogs can yield an unexpectedly large amount of useful information.
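The coverage statistic mentioned in the description can be illustrated with a minimal Python sketch. This is not IPAP's implementation, which the record does not describe: the sample entries, the function name `monthly_coverage`, and the use of case-insensitive substring matching in place of real keyphrase extraction are all assumptions made for illustration.

```python
from collections import Counter
from datetime import date

# Hypothetical sample entries as (date published, text) pairs.
# These are invented for illustration and are not data from the report.
ENTRIES = [
    (date(2011, 3, 11), "Reports of the earthquake in Japan are coming in."),
    (date(2011, 3, 12), "More on the earthquake in Japan and the tsunami warning."),
    (date(2011, 4, 2), "Relief efforts continue after the earthquake in Japan."),
    (date(2011, 4, 29), "Crowds gather for the royal wedding."),
]


def monthly_coverage(entries, phrase):
    """Count entries per (year, month) whose text contains the phrase.

    Matching is a plain case-insensitive substring test, a stand-in
    for whatever phrase extraction a real system would use.
    """
    needle = phrase.lower()
    counts = Counter()
    for published, text in entries:
        if needle in text.lower():
            counts[(published.year, published.month)] += 1
    # Return the months in chronological order.
    return dict(sorted(counts.items()))


coverage = monthly_coverage(ENTRIES, "earthquake in Japan")
```

Grouping by the publication date is what turns a list of matching entries into a trend over time; a finer or coarser bucket (day, week, year) would only change the key built inside the loop.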
id ntu-10356/46465
institution Nanyang Technological University
spelling ntu-10356/46465; 2023-03-03T20:29:13Z; School of Computer Engineering; Bachelor of Engineering (Computer Science); 2011-12-06T04:55:23Z; 2011; Final Year Project (FYP); 63 p.; application/pdf
topic DRNTU::Engineering::Computer science and engineering::Computing methodologies::Document and text processing