AI alignment and human reward
According to a prominent approach to AI alignment, AI agents should be built to learn and promote human values. However, humans value things in several different ways: we have desires and preferences of various kinds, and if we engage in reinforcement learning, we also have reward functions. One res...
| Main Author | Butlin, P |
|---|---|
| Format | Conference item |
| Language | English |
| Published | Association for Computing Machinery, 2021 |
Similar Items
- AI assertion
  by: Butlin, P, et al.
  Published: (2024)
- Transparent Value Alignment: Foundations for Human-Centered Explainable AI in Alignment
  by: Sanneman, Lindsay
  Published: (2023)
- Building Blocks for Human-AI Alignment: Specify, Inspect, Model, and Revise
  by: Booth, Serena Lynn
  Published: (2024)
- AI Ethics and Value Alignment for Nonhuman Animals
  by: Soenke Ziesche
  Published: (2021-04-01)
- Analyzing the Alignment between AI Curriculum and AI Textbooks through Text Mining
  by: Hyeji Yang, et al.
  Published: (2023-09-01)