Sequential crowdsourcing and recommendation strategies

Crowdsourcing has emerged as an online, distributed problem-solving and production model where many participating workers are involved in performing tasks collaboratively in domains such as gathering language-related data, labeling the content of an image, or voting. In modern online platforms, incl...

Full description

Bibliographic Details
Main Author: Kang, Qiyu
Other Authors: Tay Wee Peng
Format: Thesis-Doctor of Philosophy
Language:English
Published: Nanyang Technological University 2020
Subjects:
Online Access:https://hdl.handle.net/10356/137199
Description
Summary:Crowdsourcing has emerged as an online, distributed problem-solving and production model where many participating workers are involved in performing tasks collaboratively in domains such as gathering language-related data, labeling the content of an image, or voting. In modern online platforms, including crowdsourcing, recommendation systems play an essential role and have become indispensable. To provide better service in crowdsourcing and recommendation, in this thesis, we study sequential crowdsourcing and recommendation strategies where historical data will be explored when making decisions at the current time step. Firstly, we consider a multi-class labeling problem in crowdsourcing. As workers may be unreliable, we propose to perform sequential questioning in which the questions posed to the workers are designed based on previous questions and answers. We propose a partially-observable Markov decision process (POMDP) framework to determine the best questioning strategy, subject to the crowdsourcer's budget constraint. As this POMDP formulation is in general intractable, we develop a suboptimal approach based on a $q$-ary Ulam-R\'enyi game. We also propose a sampling heuristic, which can be used in tandem with standard POMDP solvers, using our Ulam-R\'enyi strategy. We demonstrate through simulations that our approaches outperform a non-sequential strategy based on error correction coding and which does not utilize workers' previous responses. Secondly, we consider a sequential task recommendation problem in a crowdsourcing platform where assigning tasks more likely to be accepted by a worker who is more likely to complete it reliably results in better performance for the crowdsourcer. Without prior information about a worker, his preferences and reliabilities need to be learned over time. We propose a multi-armed bandit (MAB) framework to sequentially learn a worker's preferences and reliabilities for different categories of tasks. However, unlike the classical MAB problem, the reward from the worker's completion of a task is unobservable. We therefore include the use of gold tasks (i.e., tasks whose solutions are known \emph{a priori} and which do not produce any rewards) in our task recommendation procedure. Our model could be viewed as a new variant of MAB, in which the random rewards can only be observed at those time steps where gold tasks are used, and the accuracy of estimating the expected reward of recommending a task to a worker depends on the number of gold tasks used. We show that the optimal regret is $O(\sqrt{n})$, where $n$ is the number of time steps. We develop three task recommendation strategies to determine the number of gold tasks for different task categories, and show that they are order optimal. Simulations verify the efficiency of our approaches. Finally, we propose a linear stochastic bandit formulation to model sequential non-discriminatory recommendations. Recommendation systems targeting at finding the best matching between items and users without considering ``fairness'' have been reported as generating discriminatory recommendations from users’ perspectives. In principle, to avoid discrimination, protected attributes such as race or gender should not be used in recommendation algorithms. However, directly discarding the protected attributes will potentially leave lingering discrimination in the large bias error of underfitted modelling without enough parameters. We propose to exclude the individual’s biases and maximize the (expected) cumulative reward like users’ clicks or ratings over a subspace of item attributes, based on the reward observed for the full space. We call this orthogonal projection in linear bandits. We propose sequential non-discriminatory recommendation strategies and prove that they achieve sublinear cumulative projection regret.