Polling India via regression and post-stratification of non-probability online samples.

Recent technological advances have facilitated the collection of large-scale administrative data and the online surveying of the Indian population. Building on these we propose a strategy for more robust, frequent and transparent projections of the Indian vote during the campaign. We execute a modif...

Bibliographic Details
Main Authors: Roberto Cerina, Raymond Duch
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2021-01-01
Series: PLoS ONE
Online Access: https://doi.org/10.1371/journal.pone.0260092
collection DOAJ
description Recent technological advances have facilitated the collection of large-scale administrative data and the online surveying of the Indian population. Building on these, we propose a strategy for more robust, frequent and transparent projections of the Indian vote during the campaign. We execute a modified MrP (multilevel regression and post-stratification) model of Indian vote preferences that proposes innovations to each of its three core components: stratification frame, training data, and a learner. For the post-stratification frame we propose a novel Data Integration approach that allows the simultaneous estimation of counts from multiple complementary sources, such as census tables and auxiliary surveys. For the training data we assemble panels of respondents from two unorthodox online populations: Amazon Mechanical Turk workers and Facebook users. And as a modeling tool, we replace the Bayesian multilevel regression learner with Random Forests. Our 2019 pre-election forecasts for the two largest Lok Sabha coalitions were very close to actual outcomes: we predicted 41.8% for the NDA, against an observed value of 45.0%, and 30.8% for the UPA, against an observed vote share of just under 31.3%. Our uniform-swing seat projection outperforms other pollsters: we had the lowest absolute error of 89 seats (along with a poll from 'Jan Ki Baat') and the lowest error on the NDA-UPA lead (a mere 8 seats), and we are the only pollster that can capture real-time preference shifts due to salient campaign events.
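The post-stratification step named in the description can be illustrated with a minimal sketch. It is not the authors' implementation: the frame, the cell labels, the population counts and the predicted support values below are all invented for illustration, and the cell-level predictions are simply assumed to come from some fitted learner (the article uses Random Forests). The sketch only shows how cell predictions are weighted by cell counts to give area-level and national vote shares.

```python
from collections import defaultdict

# Hypothetical post-stratification frame (all values invented for illustration):
# one row per state-by-cell combination, with a population count N and a
# predicted NDA support p_nda assumed to come from an already-fitted learner.
frame = [
    {"state": "A", "cell": "urban_18_34", "N": 200, "p_nda": 0.50},
    {"state": "A", "cell": "rural_35_plus", "N": 300, "p_nda": 0.40},
    {"state": "B", "cell": "urban_18_34", "N": 100, "p_nda": 0.30},
    {"state": "B", "cell": "rural_35_plus", "N": 400, "p_nda": 0.45},
]


def poststratify(rows, key=None):
    """Population-weighted mean of cell predictions, grouped by `key`.

    With key=None every row falls into a single group named "all".
    """
    weighted_sum = defaultdict(float)
    total_count = defaultdict(float)
    for row in rows:
        group = row[key] if key is not None else "all"
        weighted_sum[group] += row["N"] * row["p_nda"]
        total_count[group] += row["N"]
    return {group: weighted_sum[group] / total_count[group] for group in weighted_sum}


by_state = poststratify(frame, key="state")
national = poststratify(frame)["all"]
print({state: round(share, 3) for state, share in by_state.items()})
print(round(national, 3))
```

Because the estimate is a count-weighted average, its quality depends on both the cell predictions and the accuracy of the counts in the frame, which is why the description treats the frame as a component in its own right.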
first_indexed 2024-04-11T18:11:15Z
id doaj.art-d317267674784031be75b7e5aaaad89c
institution Directory Open Access Journal
issn 1932-6203
last_indexed 2024-04-11T18:11:15Z