The benefits, risks and bounds of personalizing the alignment of large language models to individuals

Large language models (LLMs) undergo ‘alignment’ so that they better reflect human values or preferences, and are safer or more useful. However, alignment is intrinsically difficult because the hundreds of millions of people who now interact with LLMs have different preferences for language and conv...

Bibliographic details
Main Authors:	Kirk, HR, Vidgen, B, Röttger, P, Hale, SA
Format:	Journal article
Language:	English
Published:	Springer Nature 2024

Related records

Hatemoji: A test suite and adversarially-generated dataset for benchmarking and detecting emoji-based hate
By: Kirk, HR, et al.
Published: (2022)

Is more data better? re-thinking the importance of efficiency in abusive language detection with transformers-based active learning
By: Kirk, HR, et al.
Published: (2022)

Hatemoji: A test suite and adversarially-generated dataset for benchmarking and detecting emoji-based hate
By: Kirk, H, et al.
Published: (2021)

Exploring large language models for ontology alignment
By: He, Y, et al.
Published: (2023)

Survey on large language models alignment research
By: LIU Kunlin, et al.
Published: (2024-06-01)

Two contrasting data annotation paradigms for subjective NLP tasks
By: Röttger, P, et al.
Published: (2022)

Tackling racial bias in automated online hate detection: Towards fair and accurate detection of hateful users with geometric deep learning
By: Ahmed, Z, et al.
Published: (2022)

Auditing large language models: a three-layered approach
By: Mökander, J, et al.
Published: (2023)

Strong and weak alignment of large language models with human values
By: Mehdi Khamassi, et al.
Published: (2024-08-01)

Aligning, autoencoding and prompting large language models for novel disease reporting
By: Liu, F, et al.
Published: (2025)

HateCheck: functional tests for hate speech detection models
By: Röttger, P, et al.
Published: (2021)

Deciphering implicit hate: evaluating automated detection algorithms for multimodal hate
By: Botelho, A, et al.
Published: (2021)

Lower bounds on multiple sequence alignment using exact 3-way alignment
By: Colbourn Charles J, et al.
Published: (2007-04-01)

Improvements in viral gene annotation using large language models and soft alignments
By: William L. Harrigan, et al.
Published: (2024-04-01)

The Health Star Rating system – is its reductionist (nutrient) approach a benefit or risk for tackling dietary risk factors?
By: Mark A Lawrence, et al.
Published: (2019-03-01)

Personality prediction based on large language models
By: Wee, Jewel Xin Yu
Published: (2024)

Dyslexia application using axis align bounding boxes (AABB) /
By: Awatif Baharuddin, 1993-, et al.
Published: (2016)

Evaluating the ability of large language models to emulate personality
By: Yilei Wang, et al.
Published: (2025-01-01)

Directions in abusive language training data, a systematic review: Garbage in, garbage out.
By: Bertie Vidgen, et al.
Published: (2020-01-01)

Individualizing the risks and benefits of postmenopausal hormone therapy.
By: van Staa, T, et al.
Published: (2008)

Large Language Model Augmentation and Feature Alignment Method for Few-Shot Continual Relation Extraction
By: LI Yifei, ZHANG Lingling, DONG Yuxuan, WANG Jiaxin, ZHONG Yujie, WEI Bifan
Published: (2024-09-01)

Reflection in Speech of the Individual-Typological Features of Language Personality
By: Наталія Фоміна
Published: (2019-11-01)

Benefit-Risk Analysis of Buprenorphine for Pain Management
By: Hale M, et al.
Published: (2021-05-01)

Personalized prediction of lifetime benefits with statin therapy for asymptomatic individuals: a modeling study.
By: Bart S Ferket, et al.
Published: (2012-01-01)

Person re-identification based on large vision-language model
By: Ding, Songyu
Published: (2024)

Aligning Large Language Models for Enhancing Psychiatric Interviews Through Symptom Delineation and Summarization: Pilot Study
By: Jae-hee So, et al.
Published: (2024-10-01)

DrugReAlign: a multisource prompt framework for drug repurposing based on large language models
By: Jinhang Wei, et al.
Published: (2024-10-01)

Benefits of spontaneous confidence alignment between dyad members
By: Pescetelli, N, et al.
Published: (2022)

Hedgerow benefits align with food production and sustainability goals
By: R Long, et al.
Published: (2017-09-01)

Hedgerow benefits align with food production and sustainability goals
By: Rachael F. Long, et al.
Published: (2017-09-01)

Alignment-free sequence comparison: benefits, applications, and tools
By: Andrzej Zielezinski, et al.
Published: (2017-10-01)

Tackling racial bias in automated online hate detection: Towards fair and accurate detection of hateful users with geometric deep learning
By: Zo Ahmed, et al.
Published: (2022-02-01)

Cross-language Wikipedia editing of Okinawa, Japan
By: Hale, SA
Published: (2015)

Constructive alignment in a graduate-level project management course: an innovative framework using large language models
By: Estacio Pereira, et al.
Published: (2024-04-01)

Affine Matching with Bounded Sensor Error: A Study of Geometric Hashing and Alignment
By: Grimson W. Eric L., et al.
Published: (2004)

An upper bound on the convergence rate of a second functional in optimal sequence alignment
By: Hauser, R, et al.
Published: (2017)

Dyslexia application using axis align bounding boxes (AABB) [electronic resource] /
By: Awatif Baharuddin, 1993-, et al.
Published: (2016)

Outward Bound as education for personal growth
By: Katz, Richard, et al.
Published: (2009)

The Existential Truth: On Personal beyond Bounds
By: K. S. Golikov
Published: (2020-11-01)