Is GPT-4 a reliable rater? Evaluating consistency in GPT-4's text ratings

This study reports the Intraclass Correlation Coefficients of feedback ratings produced by OpenAI's GPT-4, a large language model (LLM), across various iterations, time frames, and stylistic variations. The model was used to rate responses to tasks related to macroeconomics in higher education...


Bibliographic Details
Main Authors: Veronika Hackl, Alexandra Elena Müller, Michael Granitzer, Maximilian Sailer
Format: Article
Language: English
Published: Frontiers Media S.A., 2023-12-01
Series: Frontiers in Education
Online Access: https://www.frontiersin.org/articles/10.3389/feduc.2023.1272229/full