Reliability of crowdsourcing as a method for collecting emotions labels on pictures
Abstract. Objective: In this paper we study if and under what conditions crowdsourcing can be used as a reliable method for collecting high-quality emotion labels on pictures. To this end, we ran a set of crowdsourcing experiments on the widely used IAPS dataset, using the Self-Assessment Manikin (SAM) emotion collection instrument to rate pictures on valence, arousal and dominance, and explored the consistency of crowdsourced results across multiple runs (reliability) and the level of agreement with the gold labels (quality). In doing so, we explored the impact of targeting populations of different levels of reputation (and cost) and of collecting varying numbers of ratings per picture. Results: The results show that crowdsourcing can be a reliable method, reaching excellent levels of reliability and agreement with only 3 ratings per picture for valence and 8 for arousal, with only marginal differences between target populations. Results for dominance were very poor, echoing previous studies on the data collection instrument used. We also observed that specific types of content generate diverging opinions among participants (leading to higher variability or multimodal distributions), which remain consistent across pictures of the same theme. These findings can inform the collection and exploitation of crowdsourced emotion datasets.
Main Authors: | Olga Korovina, Marcos Baez, Fabio Casati |
---|---|
Format: | Article |
Language: | English |
Published: | BMC, 2019-10-01 |
Series: | BMC Research Notes |
ISSN: | 1756-0500 |
DOI: | 10.1186/s13104-019-4764-4 |
Author Affiliation: | University of Trento |
Subjects: | Crowdsourcing emotions; Empirical study; Rating behavior; Reliability |
Collection: | DOAJ (Directory of Open Access Journals) |
Online Access: | http://link.springer.com/article/10.1186/s13104-019-4764-4 |
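The abstract hinges on two measures computed as a function of how many ratings are collected per picture: run-to-run consistency of the crowdsourced ratings (reliability) and their agreement with the IAPS gold labels (quality). The abstract does not specify the exact statistics used, so the following is only a minimal illustrative sketch on synthetic data of how such curves might be estimated with Pearson correlation; the column names, rating counts and noise model are assumptions for illustration, not the authors' procedure.

```python
# Minimal sketch (synthetic data, NOT the paper's code or results): how run-to-run
# reliability and agreement with gold labels might be estimated as a function of
# the number of crowdsourced ratings per picture.
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Hypothetical dataset: 50 pictures, 20 crowd ratings each, on a SAM-like 1-9 scale,
# plus a "gold" mean per picture standing in for the IAPS normative ratings.
pictures = [f"pic_{i:02d}" for i in range(50)]
gold = pd.Series(rng.uniform(1, 9, len(pictures)), index=pictures)
ratings = pd.DataFrame({
    "picture_id": np.repeat(pictures, 20),
    "rating": np.clip(np.repeat(gold.to_numpy(), 20) + rng.normal(0, 1.5, 50 * 20), 1, 9),
})


def simulate(k, n_rep=100):
    """For k ratings per picture: mean correlation between two disjoint runs
    (reliability) and mean correlation of run means with the gold means (agreement)."""
    rel, agr = [], []
    for _ in range(n_rep):
        run_a, run_b, gold_vals = [], [], []
        for pic, group in ratings.groupby("picture_id"):
            shuffled = group["rating"].sample(frac=1.0, random_state=rng)
            run_a.append(shuffled.iloc[:k].mean())        # run A: first k shuffled ratings
            run_b.append(shuffled.iloc[k:2 * k].mean())   # run B: the next k (disjoint raters)
            gold_vals.append(gold[pic])
        rel.append(np.corrcoef(run_a, run_b)[0, 1])
        agr.append(np.corrcoef(run_a, gold_vals)[0, 1])
    return np.mean(rel), np.mean(agr)


for k in (1, 3, 8):
    r, a = simulate(k)
    print(f"k={k} ratings/picture: reliability r={r:.2f}, agreement with gold r={a:.2f}")
```

Under this toy model both curves rise quickly with k, which is the qualitative pattern the abstract reports (3 ratings per picture sufficing for valence and 8 for arousal); the actual thresholds and metrics are those of the paper, not of this sketch.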