Collusion-resistant spatial phenomena crowdsourcing


Bibliographic Details
Main Author: Xiang, Qikun
Other Authors: Zhang Jie
Format: Final Year Project (FYP)
Language: English
Published: 2017
Subjects:
Online Access: http://hdl.handle.net/10356/70228
Description
Summary: Data trustworthiness is a crucial issue in real-world crowdsourcing and participatory sensing applications. If this issue is not considered, different types of worker misbehavior, especially the challenging collusion attacks, can result in biased and inaccurate estimation and decision making. Previous work has mostly focused on object-labelling crowdsourcing, rating-based opinion crowdsourcing, and estimation of continuous-valued quantities, while little attention has been paid to a more challenging type of task in participatory sensing: spatial field regression. In this project, we constructed a novel trust-based mixture of Gaussian processes (GP) model for spatial field regression that jointly detects worker misbehavior and accurately reconstructs the spatial field. The model can represent both stationary and non-stationary spatial fields while incorporating complex malicious attacks. We developed a Markov chain Monte Carlo (MCMC)-based inference algorithm to efficiently perform Bayesian inference for the proposed model, and implemented it in MATLAB. To evaluate the predictive accuracy of the proposed model, we performed experiments on two real-world datasets of spatial phenomena and compared the model with three baseline models. The experimental results show that the proposed model achieves higher predictive accuracy when untrustworthy data is present. The experiments also highlighted the high computational cost and memory usage of GP regression, especially non-stationary GP regression. Hence, future work will focus on optimizing memory usage and applying reduced-rank approximation methods to the model.
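
As a rough illustration of why trust matters in GP-based spatial field regression, the sketch below fits a plain one-dimensional GP to synthetic reports, one of which is corrupted. It is not the trust-based mixture model or the MCMC inference described in the summary (which the project implemented in MATLAB). It assumes a squared-exponential kernel, a hand-picked length scale, and trust encoded simply as a per-report noise variance. All function and variable names are hypothetical.

```python
import numpy as np

def sq_exp_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance between 1-D location arrays a and b."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return signal_var * np.exp(-0.5 * d2 / length_scale**2)

def gp_posterior_mean(x_train, y_train, noise_var, x_test, length_scale=1.0):
    """GP posterior mean with a separate noise variance for each report.

    A large noise variance marks a report as untrusted, so it has little
    influence on the reconstructed field (an assumption of this sketch,
    not the model described in the summary).
    """
    K = sq_exp_kernel(x_train, x_train, length_scale) + np.diag(noise_var)
    K_star = sq_exp_kernel(x_test, x_train, length_scale)
    # The Cholesky factorization costs O(n^3) time and O(n^2) memory.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    return K_star @ alpha

# Synthetic field sampled at 11 locations; the report at x = 2.5 is corrupted.
x = np.linspace(0.0, 5.0, 11)
y = np.sin(x)
y[5] += 3.0
x_test = np.array([2.5])
true_value = np.sin(2.5)

# Baseline: every report is given the same small noise variance.
uniform_noise = np.full(x.shape, 0.01)
# Trust-aware variant: the corrupted report gets a very large noise variance.
trust_noise = uniform_noise.copy()
trust_noise[5] = 100.0

err_naive = abs(gp_posterior_mean(x, y, uniform_noise, x_test)[0] - true_value)
err_trust = abs(gp_posterior_mean(x, y, trust_noise, x_test)[0] - true_value)
print(f"error with uniform noise: {err_naive:.3f}")
print(f"error with trust-weighted noise: {err_trust:.3f}")
```

In this sketch the corrupted report is known in advance. The model described in the summary instead infers trustworthiness jointly with the field. The Cholesky step also shows where the computational and memory cost noted in the summary comes from: the full covariance matrix over all reports has to be stored and factorized.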