The benefits, risks and bounds of personalizing the alignment of large language models to individuals

Large language models (LLMs) undergo ‘alignment’ so that they better reflect human values or preferences, and are safer or more useful. However, alignment is intrinsically difficult because the hundreds of millions of people who now interact with LLMs have different preferences for language and conv...

Bibliographic details
Main authors:	Kirk, HR, Vidgen, B, Röttger, P, Hale, SA
Format:	Journal article
Language:	English
Published:	Springer Nature 2024

Similar titles

Hatemoji: A test suite and adversarially-generated dataset for benchmarking and detecting emoji-based hate
by: Kirk, HR, et al.
Published: (2022)

Is more data better? re-thinking the importance of efficiency in abusive language detection with transformers-based active learning
by: Kirk, HR, et al.
Published: (2022)

Hatemoji: A test suite and adversarially-generated dataset for benchmarking and detecting emoji-based hate
by: Kirk, H, et al.
Published: (2021)

Exploring large language models for ontology alignment
by: He, Y, et al.
Published: (2023)

Survey on large language models alignment research
by: LIU Kunlin, et al.
Published: (2024-06-01)

Two contrasting data annotation paradigms for subjective NLP tasks
by: Röttger, P, et al.
Published: (2022)

Tackling racial bias in automated online hate detection: Towards fair and accurate detection of hateful users with geometric deep learning
by: Ahmed, Z, et al.
Published: (2022)

Auditing large language models: a three-layered approach
by: Mökander, J, et al.
Published: (2023)

Strong and weak alignment of large language models with human values
by: Mehdi Khamassi, et al.
Published: (2024-08-01)

Aligning, autoencoding and prompting large language models for novel disease reporting
by: Liu, F, et al.
Published: (2025)

HateCheck: functional tests for hate speech detection models
by: Röttger, P, et al.
Published: (2021)

Deciphering implicit hate: evaluating automated detection algorithms for multimodal hate
by: Botelho, A, et al.
Published: (2021)

Lower bounds on multiple sequence alignment using exact 3-way alignment
by: Colbourn Charles J, et al.
Published: (2007-04-01)

Improvements in viral gene annotation using large language models and soft alignments
by: William L. Harrigan, et al.
Published: (2024-04-01)

The Health Star Rating system – is its reductionist (nutrient) approach a benefit or risk for tackling dietary risk factors?
by: Mark A Lawrence, et al.
Published: (2019-03-01)

Personality prediction based on large language models
by: Wee, Jewel Xin Yu
Published: (2024)

Dyslexia application using axis align bounding boxes (AABB) /
by: Awatif Baharuddin, 1993-, et al.
Published: (2016)

Evaluating the ability of large language models to emulate personality
by: Yilei Wang, et al.
Published: (2025-01-01)

Directions in abusive language training data, a systematic review: Garbage in, garbage out.
by: Bertie Vidgen, et al.
Published: (2020-01-01)

Individualizing the risks and benefits of postmenopausal hormone therapy.
by: van Staa, T, et al.
Published: (2008)

Large Language Model Augmentation and Feature Alignment Method for Few-Shot Continual Relation Extraction
by: LI Yifei, ZHANG Lingling, DONG Yuxuan, WANG Jiaxin, ZHONG Yujie, WEI Bifan
Published: (2024-09-01)

Reflection in Speech of the Individual-Typological Features of Language Personality
by: Наталія Фоміна
Published: (2019-11-01)

Benefit-Risk Analysis of Buprenorphine for Pain Management
by: Hale M, et al.
Published: (2021-05-01)

Personalized prediction of lifetime benefits with statin therapy for asymptomatic individuals: a modeling study.
by: Bart S Ferket, et al.
Published: (2012-01-01)

Person re-identification based on large vision-language model
by: Ding, Songyu
Published: (2024)

Aligning Large Language Models for Enhancing Psychiatric Interviews Through Symptom Delineation and Summarization: Pilot Study
by: Jae-hee So, et al.
Published: (2024-10-01)

DrugReAlign: a multisource prompt framework for drug repurposing based on large language models
by: Jinhang Wei, et al.
Published: (2024-10-01)

Benefits of spontaneous confidence alignment between dyad members
by: Pescetelli, N, et al.
Published: (2022)

Hedgerow benefits align with food production and sustainability goals
by: R Long, et al.
Published: (2017-09-01)

Hedgerow benefits align with food production and sustainability goals
by: Rachael F. Long, et al.
Published: (2017-09-01)

Alignment-free sequence comparison: benefits, applications, and tools
by: Andrzej Zielezinski, et al.
Published: (2017-10-01)

Tackling racial bias in automated online hate detection: Towards fair and accurate detection of hateful users with geometric deep learning
by: Zo Ahmed, et al.
Published: (2022-02-01)

Cross-language Wikipedia editing of Okinawa, Japan
by: Hale, SA
Published: (2015)

Constructive alignment in a graduate-level project management course: an innovative framework using large language models
by: Estacio Pereira, et al.
Published: (2024-04-01)

Affine Matching with Bounded Sensor Error: A Study of Geometric Hashing and Alignment
by: Grimson W. Eric L., et al.
Published: (2004)

An upper bound on the convergence rate of a second functional in optimal sequence alignment
by: Hauser, R, et al.
Published: (2017)

Dyslexia application using axis align bounding boxes (AABB) [electronic resource] /
by: Awatif Baharuddin, 1993-, et al.
Published: (2016)

Outward Bound as education for personal growth
by: Katz, Richard, et al.
Published: (2009)

The Existential Truth: On Personal beyond Bounds
by: K. S. Golikov
Published: (2020-11-01)