Visualization assisted enterprise search engine

Bibliographic Details
Main Author: Khan, S
Other Authors: Chen, M
Format: Thesis
Published: 2015
Description
Summary: In most organizations, the number of files increases at a rate similar to the growth of data. As one of the big data challenges, many enterprises encounter a common difficulty in a routine operation: finding files in a large-scale file system that is typically distributed across several physical sites and accessed by thousands of users. This thesis addresses a central question: whether visualization techniques can improve the effectiveness and efficiency of the numerous file-searching operations performed at an industrial scale. All work in this research was conducted in partnership with Laing O’Rourke as an industrial collaborator.

The main technical approaches to supporting file-searching operations are (a) the use of a database to manage searchable records of files and (b) the use of a search engine to aid the exploration of a less-structured file repository. With the rapid increase in the number of files, the former approach incurs a huge cost in entering file records into the database, while the latter suffers from unreliable search results (false positives and false negatives) and difficulties in collaborative search. This thesis focuses on the second approach, that is, developing a visualization-assisted enterprise search engine.

In this thesis, we propose two novel visualization techniques in conjunction with an experimental enterprise search engine. The first technique provides users with a focus+context visualization of search results (focus) in relation to the search space (context). This helps users identify false positives rapidly, hypothesize potential false negatives, and investigate them by refining the search criteria. A number of methods for depicting the multivariate information associated with search results were designed, implemented and compared. Empirical studies were conducted to discover the visual attributes for glyph-based and animation-based methods, and to evaluate different visual designs.

The second technique supports search activities carried out over a period of time and in collaboration. We developed the novel concept of the Search Provenance Graph (SPG), and a method for connecting semantically similar queries in SPGs. Methods and software for visualizing SPGs were designed and implemented, enabling collaborating users to acquire provenance information efficiently and to formulate and reformulate queries effectively.

In conjunction with the research on visualization techniques, we developed an experimental enterprise search engine that allows visualization components to be integrated. The search engine is knowledge-based, and is supported by multiple ontologies and crawler agents for exploring the search space. We used query expansion and results ranking to reduce false positives and false negatives, active learning to enable dynamic learning during search operations, and history-based indexing to facilitate real-time return of search results.

This research is the first step towards the development of visualization-assisted enterprise search engines as a new technology that can address a major big data challenge in industry and make everyday operations significantly more cost-effective.