Measuring an artificial intelligence language model’s trust in humans using machine incentives

Will advanced artificial intelligence (AI) language models exhibit trust toward humans? Gauging an AI model’s trust in humans is challenging because—absent costs for dishonesty—models might respond falsely about trusting humans. Accordingly, we devise a method for incentivizing machine decisions wit...

Bibliographic Details
Main Authors: Tim Johnson, Nick Obradovich
Format: Article
Language: English
Published: IOP Publishing 2024-01-01
Series: Journal of Physics: Complexity
Online Access: https://doi.org/10.1088/2632-072X/ad1c69