The benefits, risks and bounds of personalizing the alignment of large language models to individuals

Large language models (LLMs) undergo ‘alignment’ so that they better reflect human values or preferences, and are safer or more useful. However, alignment is intrinsically difficult because the hundreds of millions of people who now interact with LLMs have different preferences for language and conv...

Bibliographic details
Main authors:	Kirk, HR, Vidgen, B, Röttger, P, Hale, SA
Format:	Journal article
Language:	English
Published:	Springer Nature 2024

Similar items

Hatemoji: A test suite and adversarially-generated dataset for benchmarking and detecting emoji-based hate
by: Kirk, HR, et al.
Published: (2022)

Is more data better? re-thinking the importance of efficiency in abusive language detection with transformers-based active learning
by: Kirk, HR, et al.
Published: (2022)

Hatemoji: A test suite and adversarially-generated dataset for benchmarking and detecting emoji-based hate
by: Kirk, H, et al.
Published: (2021)

Exploring large language models for ontology alignment
by: He, Y, et al.
Published: (2023)

Survey on large language models alignment research
by: LIU Kunlin, et al.
Published: (2024-06-01)

Two contrasting data annotation paradigms for subjective NLP tasks
by: Röttger, P, et al.
Published: (2022)

Tackling racial bias in automated online hate detection: Towards fair and accurate detection of hateful users with geometric deep learning
by: Ahmed, Z, et al.
Published: (2022)

Auditing large language models: a three-layered approach
by: Mökander, J, et al.
Published: (2023)

Strong and weak alignment of large language models with human values
by: Mehdi Khamassi, et al.
Published: (2024-08-01)

Aligning, autoencoding and prompting large language models for novel disease reporting
by: Liu, F, et al.
Published: (2025)

HateCheck: functional tests for hate speech detection models
by: Röttger, P, et al.
Published: (2021)

Deciphering implicit hate: evaluating automated detection algorithms for multimodal hate
by: Botelho, A, et al.
Published: (2021)

Improvements in viral gene annotation using large language models and soft alignments
by: William L. Harrigan, et al.
Published: (2024-04-01)

Personality prediction based on large language models
by: Wee, Jewel Xin Yu
Published: (2024)

Dyslexia application using axis align bounding boxes (AABB) /
by: Awatif Baharuddin, 1993-, et al.
Published: (2016)

Individualizing the risks and benefits of postmenopausal hormone therapy.
by: van Staa, T, et al.
Published: (2008)

Evaluating the ability of large language models to emulate personality
by: Yilei Wang, et al.
Published: (2025-01-01)

Directions in abusive language training data, a systematic review: Garbage in, garbage out.
by: Bertie Vidgen, et al.
Published: (2020-01-01)

Large Language Model Augmentation and Feature Alignment Method for Few-Shot Continual Relation Extraction
by: LI Yifei, ZHANG Lingling, DONG Yuxuan, WANG Jiaxin, ZHONG Yujie, WEI Bifan
Published: (2024-09-01)

Personalized prediction of lifetime benefits with statin therapy for asymptomatic individuals: a modeling study.
by: Bart S Ferket, et al.
Published: (2012-01-01)

Person re-identification based on large vision-language model
by: Ding, Songyu
Published: (2024)

Aligning Large Language Models for Enhancing Psychiatric Interviews Through Symptom Delineation and Summarization: Pilot Study
by: Jae-hee So, et al.
Published: (2024-10-01)

DrugReAlign: a multisource prompt framework for drug repurposing based on large language models
by: Jinhang Wei, et al.
Published: (2024-10-01)

Benefits of spontaneous confidence alignment between dyad members
by: Pescetelli, N, et al.
Published: (2022)

Hedgerow benefits align with food production and sustainability goals
by: R Long, et al.
Published: (2017-09-01)

Hedgerow benefits align with food production and sustainability goals
by: Rachael F. Long, et al.
Published: (2017-09-01)

Alignment-free sequence comparison: benefits, applications, and tools
by: Andrzej Zielezinski, et al.
Published: (2017-10-01)

Cross-language Wikipedia editing of Okinawa, Japan
by: Hale, SA
Published: (2015)

Constructive alignment in a graduate-level project management course: an innovative framework using large language models
by: Estacio Pereira, et al.
Published: (2024-04-01)

Affine Matching with Bounded Sensor Error: A Study of Geometric Hashing and Alignment
by: Grimson W. Eric L., et al.
Published: (2004)

An upper bound on the convergence rate of a second functional in optimal sequence alignment
by: Hauser, R, et al.
Published: (2017)

Dyslexia application using axis align bounding boxes (AABB) [electronic resource] /
by: Awatif Baharuddin, 1993-, et al.
Published: (2016)

Outward Bound as education for personal growth
by: Katz, Richard, et al.
Published: (2009)

The Existential Truth: On Personal beyond Bounds
by: K. S. Golikov
Published: (2020-11-01)

The Dengue Vaccine Dilemma: Balancing the Individual and Population Risks and Benefits.
by: Jacqueline Deen
Published: (2016-11-01)

Aligning subtitles in sign language videos
by: Bull, H, et al.
Published: (2022)

Morphological Alignment in Khorramabad Lori Language
by: Fatemeh Akoondi
Published: (2023-03-01)

Aligning English Language Testing With Curriculum
by: Marcela Palacio, et al.
Published: (2016-07-01)

The Opportunities and Risks of Large Language Models in Mental Health
by: Hannah R Lawrence, et al.
Published: (2024-07-01)