The benefits, risks and bounds of personalizing the alignment of large language models to individuals

Large language models (LLMs) undergo ‘alignment’ so that they better reflect human values or preferences, and are safer or more useful. However, alignment is intrinsically difficult because the hundreds of millions of people who now interact with LLMs have different preferences for language and conv...

Bibliographic record details
Main authors:	Kirk, HR, Vidgen, B, Röttger, P, Hale, SA
Format:	Journal article
Language:	English
Published:	Springer Nature 2024

Similar items

Hatemoji: A test suite and adversarially-generated dataset for benchmarking and detecting emoji-based hate
by: Kirk, HR, et al.
Published: (2022)

Is more data better? re-thinking the importance of efficiency in abusive language detection with transformers-based active learning
by: Kirk, HR, et al.
Published: (2022)

Hatemoji: A test suite and adversarially-generated dataset for benchmarking and detecting emoji-based hate
by: Kirk, H, et al.
Published: (2021)

Exploring large language models for ontology alignment
by: He, Y, et al.
Published: (2023)

Survey on large language models alignment research
by: LIU Kunlin, et al.
Published: (2024-06-01)

Two contrasting data annotation paradigms for subjective NLP tasks
by: Röttger, P, et al.
Published: (2022)

Tackling racial bias in automated online hate detection: Towards fair and accurate detection of hateful users with geometric deep learning
by: Ahmed, Z, et al.
Published: (2022)

Auditing large language models: a three-layered approach
by: Mökander, J, et al.
Published: (2023)

Strong and weak alignment of large language models with human values
by: Mehdi Khamassi, et al.
Published: (2024-08-01)

Aligning, autoencoding and prompting large language models for novel disease reporting
by: Liu, F, et al.
Published: (2025)

HateCheck: functional tests for hate speech detection models
by: Röttger, P, et al.
Published: (2021)

Deciphering implicit hate: evaluating automated detection algorithms for multimodal hate
by: Botelho, A, et al.
Published: (2021)

Improvements in viral gene annotation using large language models and soft alignments
by: William L. Harrigan, et al.
Published: (2024-04-01)

Personality prediction based on large language models
by: Wee, Jewel Xin Yu
Published: (2024)

Dyslexia application using axis align bounding boxes (AABB)
by: Awatif Baharuddin, 1993-, et al.
Published: (2016)

Evaluating the ability of large language models to emulate personality
by: Yilei Wang, et al.
Published: (2025-01-01)

Individualizing the risks and benefits of postmenopausal hormone therapy.
by: van Staa, T, et al.
Published: (2008)

Directions in abusive language training data, a systematic review: Garbage in, garbage out.
by: Bertie Vidgen, et al.
Published: (2020-01-01)

Large Language Model Augmentation and Feature Alignment Method for Few-Shot Continual Relation Extraction
by: LI Yifei, ZHANG Lingling, DONG Yuxuan, WANG Jiaxin, ZHONG Yujie, WEI Bifan
Published: (2024-09-01)

Personalized prediction of lifetime benefits with statin therapy for asymptomatic individuals: a modeling study.
by: Bart S Ferket, et al.
Published: (2012-01-01)

Person re-identification based on large vision-language model
by: Ding, Songyu
Published: (2024)

Aligning Large Language Models for Enhancing Psychiatric Interviews Through Symptom Delineation and Summarization: Pilot Study
by: Jae-hee So, et al.
Published: (2024-10-01)

DrugReAlign: a multisource prompt framework for drug repurposing based on large language models
by: Jinhang Wei, et al.
Published: (2024-10-01)

Benefits of spontaneous confidence alignment between dyad members
by: Pescetelli, N, et al.
Published: (2022)

Hedgerow benefits align with food production and sustainability goals
by: R Long, et al.
Published: (2017-09-01)

Hedgerow benefits align with food production and sustainability goals
by: Rachael F. Long, et al.
Published: (2017-09-01)

Alignment-free sequence comparison: benefits, applications, and tools
by: Andrzej Zielezinski, et al.
Published: (2017-10-01)

Cross-language Wikipedia editing of Okinawa, Japan
by: Hale, SA
Published: (2015)

Constructive alignment in a graduate-level project management course: an innovative framework using large language models
by: Estacio Pereira, et al.
Published: (2024-04-01)

Affine Matching with Bounded Sensor Error: A Study of Geometric Hashing and Alignment
by: Grimson W. Eric L., et al.
Published: (2004)

An upper bound on the convergence rate of a second functional in optimal sequence alignment
by: Hauser, R, et al.
Published: (2017)

Dyslexia application using axis align bounding boxes (AABB) [electronic resource]
by: Awatif Baharuddin, 1993-, et al.
Published: (2016)

Outward Bound as education for personal growth
by: Katz, Richard, et al.
Published: (2009)

The Existential Truth: On Personal beyond Bounds
by: K. S. Golikov
Published: (2020-11-01)

Aligning subtitles in sign language videos
by: Bull, H, et al.
Published: (2022)

Morphological Alignment in Khorramabad Lori Language
by: Fatemeh Akoondi
Published: (2023-03-01)

Aligning English Language Testing With Curriculum
by: Marcela Palacio, et al.
Published: (2016-07-01)

The Opportunities and Risks of Large Language Models in Mental Health
by: Hannah R Lawrence, et al.
Published: (2024-07-01)

The Dengue Vaccine Dilemma: Balancing the Individual and Population Risks and Benefits.
by: Jacqueline Deen
Published: (2016-11-01)