The benefits, risks and bounds of personalizing the alignment of large language models to individuals

Large language models (LLMs) undergo ‘alignment’ so that they better reflect human values or preferences, and are safer or more useful. However, alignment is intrinsically difficult because the hundreds of millions of people who now interact with LLMs have different preferences for language and conv...

Bibliographic details
Main authors:	Kirk, HR, Vidgen, B, Röttger, P, Hale, SA
Format:	Journal article
Language:	English
Published:	Springer Nature 2024

Similar items

Hatemoji: A test suite and adversarially-generated dataset for benchmarking and detecting emoji-based hate
by: Kirk, HR, et al.
Published: (2022)

Is more data better? re-thinking the importance of efficiency in abusive language detection with transformers-based active learning
by: Kirk, HR, et al.
Published: (2022)

Hatemoji: A test suite and adversarially-generated dataset for benchmarking and detecting emoji-based hate
by: Kirk, H, et al.
Published: (2021)

Exploring large language models for ontology alignment
by: He, Y, et al.
Published: (2023)

Survey on large language models alignment research
by: LIU Kunlin, et al.
Published: (2024-06-01)

Two contrasting data annotation paradigms for subjective NLP tasks
by: Röttger, P, et al.
Published: (2022)

Tackling racial bias in automated online hate detection: Towards fair and accurate detection of hateful users with geometric deep learning
by: Ahmed, Z, et al.
Published: (2022)

Auditing large language models: a three-layered approach
by: Mökander, J, et al.
Published: (2023)

Strong and weak alignment of large language models with human values
by: Mehdi Khamassi, et al.
Published: (2024-08-01)

Aligning, autoencoding and prompting large language models for novel disease reporting
by: Liu, F, et al.
Published: (2025)

HateCheck: functional tests for hate speech detection models
by: Röttger, P, et al.
Published: (2021)

Deciphering implicit hate: evaluating automated detection algorithms for multimodal hate
by: Botelho, A, et al.
Published: (2021)

Lower bounds on multiple sequence alignment using exact 3-way alignment
by: Colbourn Charles J, et al.
Published: (2007-04-01)

Improvements in viral gene annotation using large language models and soft alignments
by: William L. Harrigan, et al.
Published: (2024-04-01)

The Health Star Rating system – is its reductionist (nutrient) approach a benefit or risk for tackling dietary risk factors?
by: Mark A Lawrence, et al.
Published: (2019-03-01)

Personality prediction based on large language models
by: Wee, Jewel Xin Yu
Published: (2024)

Dyslexia application using axis align bounding boxes (AABB)
by: Awatif Baharuddin, 1993-, et al.
Published: (2016)

Evaluating the ability of large language models to emulate personality
by: Yilei Wang, et al.
Published: (2025-01-01)

Directions in abusive language training data, a systematic review: Garbage in, garbage out.
by: Bertie Vidgen, et al.
Published: (2020-01-01)

Individualizing the risks and benefits of postmenopausal hormone therapy.
by: van Staa, T, et al.
Published: (2008)

Large Language Model Augmentation and Feature Alignment Method for Few-Shot Continual Relation Extraction
by: LI Yifei, ZHANG Lingling, DONG Yuxuan, WANG Jiaxin, ZHONG Yujie, WEI Bifan
Published: (2024-09-01)

Reflection in Speech of the Individual-Typological Features of Language Personality
by: Наталія Фоміна
Published: (2019-11-01)

Benefit-Risk Analysis of Buprenorphine for Pain Management
by: Hale M, et al.
Published: (2021-05-01)

Personalized prediction of lifetime benefits with statin therapy for asymptomatic individuals: a modeling study.
by: Bart S Ferket, et al.
Published: (2012-01-01)

Person re-identification based on large vision-language model
by: Ding, Songyu
Published: (2024)

Aligning Large Language Models for Enhancing Psychiatric Interviews Through Symptom Delineation and Summarization: Pilot Study
by: Jae-hee So, et al.
Published: (2024-10-01)

DrugReAlign: a multisource prompt framework for drug repurposing based on large language models
by: Jinhang Wei, et al.
Published: (2024-10-01)

Benefits of spontaneous confidence alignment between dyad members
by: Pescetelli, N, et al.
Published: (2022)

Hedgerow benefits align with food production and sustainability goals
by: R Long, et al.
Published: (2017-09-01)

Hedgerow benefits align with food production and sustainability goals
by: Rachael F. Long, et al.
Published: (2017-09-01)

Alignment-free sequence comparison: benefits, applications, and tools
by: Andrzej Zielezinski, et al.
Published: (2017-10-01)

Tackling racial bias in automated online hate detection: Towards fair and accurate detection of hateful users with geometric deep learning
by: Zo Ahmed, et al.
Published: (2022-02-01)

Cross-language Wikipedia editing of Okinawa, Japan
by: Hale, SA
Published: (2015)

Constructive alignment in a graduate-level project management course: an innovative framework using large language models
by: Estacio Pereira, et al.
Published: (2024-04-01)

Affine Matching with Bounded Sensor Error: A Study of Geometric Hashing and Alignment
by: Grimson W. Eric L., et al.
Published: (2004)

An upper bound on the convergence rate of a second functional in optimal sequence alignment
by: Hauser, R, et al.
Published: (2017)

Dyslexia application using axis align bounding boxes (AABB) [electronic resource]
by: Awatif Baharuddin, 1993-, et al.
Published: (2016)

Outward Bound as education for personal growth
by: Katz, Richard, et al.
Published: (2009)

The Existential Truth: On Personal beyond Bounds
by: K. S. Golikov
Published: (2020-11-01)