Is GPT-4 a reliable rater? Evaluating consistency in GPT-4's text ratings

This study reports the Intraclass Correlation Coefficients of feedback ratings produced by OpenAI's GPT-4, a large language model (LLM), across various iterations, time frames, and stylistic variations. The model was used to rate responses to tasks related to macroeconomics in higher education...

Bibliographic Details
Main Authors:	Veronika Hackl, Alexandra Elena Müller, Michael Granitzer, Maximilian Sailer
Format:	Article
Language:	English
Published:	Frontiers Media S.A. 2023-12-01
Series:	Frontiers in Education
Subjects:	artificial intelligence; GPT-4; large language model; prompt engineering; feedback; higher education
Online Access:	https://www.frontiersin.org/articles/10.3389/feduc.2023.1272229/full

Similar Items