Probabilistic Models for Competence Assessment in Education

Bibliographic Details
Main Authors: Alejandra López de Aberasturi Gómez, Jordi Sabater-Mir, Carles Sierra
Format: Article
Language: English
Published: MDPI AG 2022-02-01
Series: Applied Sciences
Online Access: https://www.mdpi.com/2076-3417/12/5/2368
Description
Summary: Probabilistic models of competence assessment combine the benefits of automation with human judgment. We start this paper by replicating two preexisting probabilistic models of peer assessment ($PG_1$-bias and PAAS). Although both rely on probability theory, their approaches are radically different. While $PG_1$-bias is purely Bayesian, PAAS models the evaluation process in a classroom as a multiagent system, where each actor relies on the judgment of others as long as their opinions coincide. To reconcile the benefits of Bayesian inference with the concept of trust posed in PAAS, we propose a third peer evaluation model that considers the correlations between any pair of peers who have evaluated someone in common: $PG$-bivariate. The rest of the paper is devoted to comparing these three models on synthetic data. We show that $PG_1$-bias produces predictions with lower root mean squared error (RMSE) than $PG$-bivariate.
However, both models behave similarly when choosing the next assignment to be graded by a peer, with an "RMSE-decreasing policy" reporting better results than a random policy. Fair comparisons among the three models show that $PG_1$-bias yields the lowest error when ground truths are scarce. Nevertheless, once nearly 20% of the teacher's assessments are introduced, PAAS sometimes exceeds the quality of the $PG_1$-bias predictions by following an entropy minimization heuristic. $PG$-bivariate, our new proposal to reconcile the trust-based approach of PAAS with the theoretical background of $PG_1$-bias, achieves error percentages similar to those of the original models. Future work includes applying the models to real experimental data and exploring new heuristics to determine which teacher's grade should be obtained next to minimize the overall error.
ISSN: 2076-3417