Is GPT-4 a reliable rater? Evaluating consistency in GPT-4's text ratings
This study reports the Intraclass Correlation Coefficients of feedback ratings produced by OpenAI's GPT-4, a large language model (LLM), across various iterations, time frames, and stylistic variations. The model was used to rate responses to tasks related to macroeconomics in higher education...
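This record does not say which ICC form the study computed. The sketch below assumes the ratings form a texts × runs matrix, with each repeated GPT-4 rating run treated as a "rater". Under that assumption, the two-way ANOVA forms from Shrout and Fleiss can be computed directly: ICC(2,1) measures absolute agreement and ICC(3,1) measures consistency. The function name `icc_two_way` and the example data are hypothetical illustrations, not taken from the article.

```python
from typing import Sequence


def icc_two_way(ratings: Sequence[Sequence[float]]) -> tuple[float, float]:
    """Return (ICC(2,1), ICC(3,1)) for an n-texts x k-runs rating matrix.

    Hypothetical helper: rows are rated texts, columns are repeated rating
    runs (treated as raters). Uses two-way ANOVA mean squares.
    """
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least two rated texts")
    k = len(ratings[0])
    if k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a rectangular matrix with at least two runs")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares: between texts (rows), between runs (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    # ICC(2,1): absolute agreement; systematic run-to-run offsets lower it.
    icc2 = (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )
    # ICC(3,1): consistency; a constant offset between runs is ignored.
    icc3 = (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)
    return icc2, icc3


if __name__ == "__main__":
    # Hypothetical data: 3 texts rated in 2 runs; run 2 is always one point higher.
    icc2, icc3 = icc_two_way([[1, 2], [2, 3], [3, 4]])
    print(round(icc2, 3), round(icc3, 3))  # prints "0.667 1.0"
```

In this toy example, ICC(2,1) falls to 0.667 because the second run is systematically one point higher. ICC(3,1) stays at 1.0 because it ignores that offset. The appropriate form depends on whether the absolute score levels must be stable across runs, or only the relative ordering of texts.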
Main Authors: | , , , |
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A., 2023-12-01 |
Series: | Frontiers in Education |
Subjects: | |
Online Access: | https://www.frontiersin.org/articles/10.3389/feduc.2023.1272229/full |