The benefits, risks and bounds of personalizing the alignment of large language models to individuals

Large language models (LLMs) undergo ‘alignment’ so that they better reflect human values or preferences, and are safer or more useful. However, alignment is intrinsically difficult because the hundreds of millions of people who now interact with LLMs have different preferences for language and conv...

Bibliographic details
Main authors:	Kirk, HR, Vidgen, B, Röttger, P, Hale, SA
Format:	Journal article
Language:	English
Published:	Springer Nature 2024

Similar items

Hatemoji: A test suite and adversarially-generated dataset for benchmarking and detecting emoji-based hate
by: Kirk, HR, et al.
Published: (2022)

Is more data better? re-thinking the importance of efficiency in abusive language detection with transformers-based active learning
by: Kirk, HR, et al.
Published: (2022)

Hatemoji: A test suite and adversarially-generated dataset for benchmarking and detecting emoji-based hate
by: Kirk, H, et al.
Published: (2021)

Exploring large language models for ontology alignment
by: He, Y, et al.
Published: (2023)

Survey on large language models alignment research
by: LIU Kunlin, et al.
Published: (2024-06-01)

Two contrasting data annotation paradigms for subjective NLP tasks
by: Röttger, P, et al.
Published: (2022)

Tackling racial bias in automated online hate detection: Towards fair and accurate detection of hateful users with geometric deep learning
by: Ahmed, Z, et al.
Published: (2022)

Auditing large language models: a three-layered approach
by: Mökander, J, et al.
Published: (2023)

Strong and weak alignment of large language models with human values
by: Mehdi Khamassi, et al.
Published: (2024-08-01)

Aligning, autoencoding and prompting large language models for novel disease reporting
by: Liu, F, et al.
Published: (2025)

HateCheck: functional tests for hate speech detection models
by: Röttger, P, et al.
Published: (2021)

Deciphering implicit hate: evaluating automated detection algorithms for multimodal hate
by: Botelho, A, et al.
Published: (2021)

Improvements in viral gene annotation using large language models and soft alignments
by: William L. Harrigan, et al.
Published: (2024-04-01)

Personality prediction based on large language models
by: Wee, Jewel Xin Yu
Published: (2024)

Dyslexia application using axis align bounding boxes (AABB) /
by: Awatif Baharuddin, 1993-, et al.
Published: (2016)

Evaluating the ability of large language models to emulate personality
by: Yilei Wang, et al.
Published: (2025-01-01)

Individualizing the risks and benefits of postmenopausal hormone therapy.
by: van Staa, T, et al.
Published: (2008)

Directions in abusive language training data, a systematic review: Garbage in, garbage out.
by: Bertie Vidgen, et al.
Published: (2020-01-01)

Large Language Model Augmentation and Feature Alignment Method for Few-Shot Continual Relation Extraction
by: LI Yifei, ZHANG Lingling, DONG Yuxuan, WANG Jiaxin, ZHONG Yujie, WEI Bifan
Published: (2024-09-01)

Personalized prediction of lifetime benefits with statin therapy for asymptomatic individuals: a modeling study.
by: Bart S Ferket, et al.
Published: (2012-01-01)

Person re-identification based on large vision-language model
by: Ding, Songyu
Published: (2024)

Aligning Large Language Models for Enhancing Psychiatric Interviews Through Symptom Delineation and Summarization: Pilot Study
by: Jae-hee So, et al.
Published: (2024-10-01)

DrugReAlign: a multisource prompt framework for drug repurposing based on large language models
by: Jinhang Wei, et al.
Published: (2024-10-01)

Benefits of spontaneous confidence alignment between dyad members
by: Pescetelli, N, et al.
Published: (2022)

Hedgerow benefits align with food production and sustainability goals
by: R Long, et al.
Published: (2017-09-01)

Hedgerow benefits align with food production and sustainability goals
by: Rachael F. Long, et al.
Published: (2017-09-01)

Alignment-free sequence comparison: benefits, applications, and tools
by: Andrzej Zielezinski, et al.
Published: (2017-10-01)

Cross-language Wikipedia editing of Okinawa, Japan
by: Hale, SA
Published: (2015)

Constructive alignment in a graduate-level project management course: an innovative framework using large language models
by: Estacio Pereira, et al.
Published: (2024-04-01)

Affine Matching with Bounded Sensor Error: A Study of Geometric Hashing and Alignment
by: Grimson W. Eric L., et al.
Published: (2004)

An upper bound on the convergence rate of a second functional in optimal sequence alignment
by: Hauser, R, et al.
Published: (2017)

Dyslexia application using axis align bounding boxes (AABB) [electronic resource] /
by: Awatif Baharuddin, 1993-, et al.
Published: (2016)

Outward Bound as education for personal growth
by: Katz, Richard, et al.
Published: (2009)

The Existential Truth: On Personal beyond Bounds
by: K. S. Golikov
Published: (2020-11-01)

Aligning subtitles in sign language videos
by: Bull, H, et al.
Published: (2022)

Morphological Alignment in Khorramabad Lori Language
by: Fatemeh Akoondi
Published: (2023-03-01)

Aligning English Language Testing With Curriculum
by: Marcela Palacio, et al.
Published: (2016-07-01)

The Opportunities and Risks of Large Language Models in Mental Health
by: Hannah R Lawrence, et al.
Published: (2024-07-01)

Impact of platform design on cross-language information exchange
by: Hale, SA
Published: (2012)