Visual data analysis supported by eye-tracking, multi-touch displays, and machine learning

Bibliographic Details
Main Author: Mohammad Chegini
Other Authors: Alexei Sourin
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2021
Online Access: https://hdl.handle.net/10356/146744
Description
Summary: In recent years, data analysts have been confronted with increasing amounts of data, often in the form of multivariate datasets. A multivariate dataset can be thought of as a table in which dimensions are columns and records are rows. Machine learning and data mining algorithms can help an analyst build machine learning (ML) models that find structures in a dataset algorithmically. Alternatively, visualisation techniques such as scatterplots, scatterplot matrices, and parallel coordinates can help an analyst explore and find structures in a dataset visually. Although extensive research has been done on building and visualising ML models, less research links ML models and visualisations through human-centred interactions. Such a connection could help an analyst build better ML models by interactively steering the process. However, designing and evaluating such interaction techniques is challenging. This thesis proposes visual analytics techniques for building and modifying an ML model of a multivariate dataset using machine learning, visualisation, and interaction. It also explores novel interaction modalities and devices such as large multi-touch displays, handheld devices, and eye-trackers. As a first step, a novel approach for selecting, searching for, and comparing local patterns within multivariate datasets using scatterplots is presented. An analyst can select part of a scatterplot in a scatterplot matrix and search for similar patterns using both model-based (ML regression) descriptors and shape-based descriptors. A relevance feedback module enables the analyst to improve the regression analysis and find relevant patterns more effectively. The second part of the thesis goes beyond simple interaction and exploration with an ML model and focuses on ML model creation and modification.
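The summary does not say how the model-based and shape-based descriptors are defined. The following Python sketch of the first part's pattern search is therefore only an illustration under stated assumptions. It uses a least-squares line (slope, intercept) as a hypothetical model-based descriptor. It uses the fraction of points in each cell of a grid over the selection's bounding box as a hypothetical shape-based descriptor. Candidate scatterplot regions are then ranked by a weighted sum of the two distances. The function names, grid size, and weighting are illustrative choices, not the thesis's method, and the relevance feedback module is not modelled.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def regression_descriptor(points: Sequence[Point]) -> Tuple[float, float]:
    """Hypothetical model-based descriptor: (slope, intercept) of the least-squares line."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx if sxx else 0.0  # degenerate case: all x equal, fall back to slope 0
    return (slope, mean_y - slope * mean_x)


def shape_descriptor(points: Sequence[Point], bins: int = 4) -> List[float]:
    """Hypothetical shape-based descriptor: share of points per cell of a bins x bins grid
    laid over the selection's bounding box (so position and scale are ignored)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    min_x, min_y = min(xs), min(ys)
    width = (max(xs) - min_x) or 1.0
    height = (max(ys) - min_y) or 1.0
    grid = [0.0] * (bins * bins)
    for x, y in points:
        i = min(int((x - min_x) / width * bins), bins - 1)
        j = min(int((y - min_y) / height * bins), bins - 1)
        grid[j * bins + i] += 1.0 / len(points)
    return grid


def rank_candidates(query: Sequence[Point],
                    candidates: Dict[str, Sequence[Point]],
                    weight: float = 0.5) -> List[str]:
    """Rank candidates by weight * regression distance + (1 - weight) * shape distance.
    Smaller combined distance means more similar to the query selection."""
    q_reg, q_shape = regression_descriptor(query), shape_descriptor(query)
    scored = []
    for name, pts in candidates.items():
        d_reg = math.dist(q_reg, regression_descriptor(pts))
        d_shape = math.dist(q_shape, shape_descriptor(pts))
        scored.append((weight * d_reg + (1.0 - weight) * d_shape, name))
    return [name for _, name in sorted(scored)]


query = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
candidates = {
    "rising": [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0), (3.0, 4.0)],
    "falling": [(0.0, 3.0), (1.0, 2.0), (2.0, 1.0), (3.0, 0.0)],
}
print(rank_candidates(query, candidates))  # ['rising', 'falling']
```

The shape descriptor is computed relative to the selection's bounding box, so it is insensitive to position and scale, whereas the regression descriptor is not. The sketch therefore assumes the compared axes have comparable ranges. Changing `weight` shifts the ranking between the two views of similarity.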
Specifically, for model creation, an interactive visual labelling technique is presented that allows an analyst to build and interactively improve an ML classification model for multivariate datasets. The technique combines linked visualisations, clustering, and active learning to help an analyst interactively label a multivariate dataset. In the third step, a user study was conducted, which showed that such an interactive labelling technique could surpass common active learning algorithms for building an effective ML model. Finally, the fourth part of the thesis explores several novel interaction modalities. It is shown that large multi-touch displays are effective for collaborative analysis of scatterplots. Extending these interactions, analysts can use a secondary handheld device to interact with a linked-view information visualisation application to label multivariate datasets. In addition, the system can use the analyst's eye gaze to help rearrange the axes in a parallel coordinates visualisation. In summary, this thesis uses human-centred interactions to bridge the gap between ML techniques and visualisation techniques. It shows how to (1) interactively search and explore local regression models in a scatterplot space, (2) interactively build and improve an ML model of a multivariate dataset using linked visualisations, clustering, and active learning, and (3) use eye-tracking and multi-touch displays to investigate regression ML models collaboratively, and use eye gaze as input for interacting with visualisations of a multivariate dataset.
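The user study compares interactive labelling with common active learning algorithms. For reference, one widely used family of such baselines is uncertainty sampling. The Python sketch below shows margin sampling with a simple nearest-centroid classifier, whose class probabilities come from a softmax over negative centroid distances. The classifier, the function names, and the margin criterion are assumptions chosen for brevity. This is a generic baseline, not the interactive labelling technique described in the thesis.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


def class_centroids(labelled: Sequence[Tuple[Vector, str]]) -> Dict[str, Vector]:
    """Mean feature vector of each class among the labelled records."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for x, label in labelled:
        acc = sums.setdefault(label, [0.0] * len(x))
        for k, value in enumerate(x):
            acc[k] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(s / counts[label] for s in acc) for label, acc in sums.items()}


def class_probabilities(x: Vector, centroids: Dict[str, Vector]) -> Dict[str, float]:
    """Softmax over negative Euclidean distances to each class centroid."""
    scores = {label: -math.dist(x, c) for label, c in centroids.items()}
    top = max(scores.values())  # subtract the maximum for numerical stability
    weights = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


def margin_query(unlabelled: Sequence[Vector],
                 labelled: Sequence[Tuple[Vector, str]]) -> Vector:
    """Return the unlabelled record whose two most likely classes are closest in probability.

    Assumes at least two classes are present in `labelled`.
    """
    centroids = class_centroids(labelled)

    def margin(x: Vector) -> float:
        probs = sorted(class_probabilities(x, centroids).values(), reverse=True)
        return probs[0] - probs[1]

    return min(unlabelled, key=margin)


labelled = [((0.0, 0.0), "A"), ((4.0, 0.0), "B")]
unlabelled = [(0.5, 0.0), (2.1, 0.0), (3.8, 0.0)]
print(margin_query(unlabelled, labelled))  # (2.1, 0.0): nearly equidistant from both centroids
```

In a full loop, the queried record would be shown to the analyst, labelled, and moved into `labelled` before the next query. An interactive technique instead lets the analyst choose which records to label with the help of linked views and clusters.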