Framework-based qualitative analysis of free responses of Large Language Models: Algorithmic fidelity.

Bibliographic Details
Main Authors:	Aliya Amirova, Theodora Fteropoulli, Nafiso Ahmed, Martin R Cowie, Joel Z Leibo
Format:	Article
Language:	English
Published:	Public Library of Science (PLoS) 2024-01-01
Series:	PLoS ONE
Online Access:	https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0300024&type=printable
ISSN:	1932-6203
Volume/Issue:	19(3), e0300024
DOI:	10.1371/journal.pone.0300024
Collection:	Directory of Open Access Journals (DOAJ)

Description

Today, with the advent of Large-scale generative Language Models (LLMs) it is now possible to simulate free responses to interview questions such as those traditionally analyzed using qualitative research methods. Qualitative methodology encompasses a broad family of techniques involving manual analysis of open-ended interviews or conversations conducted freely in natural language. Here we consider whether artificial "silicon participants" generated by LLMs may be productively studied using qualitative analysis methods in such a way as to generate insights that could generalize to real human populations. The key concept in our analysis is algorithmic fidelity, a validity concept capturing the degree to which LLM-generated outputs mirror human sub-populations' beliefs and attitudes. By definition, high algorithmic fidelity suggests that latent beliefs elicited from LLMs may generalize to real humans, whereas low algorithmic fidelity renders such research invalid. Here we used an LLM to generate interviews with "silicon participants" matching specific demographic characteristics one-for-one with a set of human participants. Using framework-based qualitative analysis, we showed the key themes obtained from both human and silicon participants were strikingly similar. However, when we analyzed the structure and tone of the interviews, we found even more striking differences. We also found evidence of a hyper-accuracy distortion. We conclude that the LLM we tested (GPT-3.5) does not have sufficient algorithmic fidelity to expect in silico research on it to generalize to real human populations. However, rapid advances in artificial intelligence raise the possibility that algorithmic fidelity may improve in the future. Thus we stress the need to establish epistemic norms now around how to assess the validity of LLM-based qualitative research, especially concerning the need to ensure the representation of heterogeneous lived experiences.