Multimedia indexing and retrieval with Lucene and concept annotations processing

Bibliographic Details
Main Author: Kyaw, Zin Tun
Other Authors: Chng Eng Siong
Format: Final Year Project (FYP)
Language: English
Published: 2015
Subjects:
Online Access: http://hdl.handle.net/10356/64750
Description
Summary: Rising interest in automated speech recognition technology has paved the way for various new applications in recent years. Speech from seminars, conferences and lectures can now be converted into text automatically. Once the speech data has been retrieved from the multimedia lecture files in the form of transcriptions, it has to be processed further so that it can be displayed in a human-friendly format. This calls for the development of content-based search on lecture videos and a streaming web interface to enhance the experience of higher education and research study. The goal of this project is to develop a client-server web interface (LECTS SEARCH) that facilitates simultaneous viewing of lecture videos and searching for keywords or concepts within their content. A single collection of raw speech data can contain up to millions of words, so storing and retrieving the relevant data can be challenging. Hence, an efficient indexing mechanism to maintain the data is required. This thesis focuses on the archiving and retrieval of the speech data by performing inverted indexing on keywords, so that the data is readily available for further uses such as keyword searching. It also covers the storage and retrieval of concept-keywords using a tree-structured data representation, Extensible Markup Language (XML). Inverted indexing is a widely used multimedia indexing technique that identifies the unique terms within the sentences of the documents. Each unique term can then be used to determine which documents contain it, which greatly improves the speed of information retrieval. Currently, the speech data to be indexed come primarily from the MIT lectures in the Aerospace (27 MB / ~330,616 words) and Signal Processing (12 MB / ~150,280 words) domains. In the experiments, the time taken to search for keywords in each domain ranged from 0.4 to 0.9 seconds. However, retrieval time increases when a keyword search is made across multiple collections.
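
The inverted indexing idea described in the summary can be illustrated with a minimal sketch. This is not the project's implementation and does not use Lucene's API; it is plain Python, and the function names, segment ids and sample sentences are all hypothetical, chosen only to show how each unique term maps to the documents that contain it.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(documents):
    """Map each unique term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Look up the posting set of each term, then keep only the shared ids.
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


# Hypothetical transcript segments, keyed by invented segment ids.
segments = {
    "aero-01": "Lift depends on the shape of the wing",
    "aero-02": "Drag increases with the square of velocity",
    "sig-01": "The Fourier transform maps a signal to its frequency spectrum",
}

index = build_inverted_index(segments)
print(sorted(search(index, "wing lift")))
print(sorted(search(index, "the")))
```

Because a query only touches the posting sets of its own terms, lookup cost depends on the number of matching documents rather than on the total number of words in the collection. A production index such as Lucene's additionally stores term positions and frequencies and keeps its postings on disk.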