Model merging and safety alignment: one bad model spoils the bunch

Merging Large Language Models (LLMs) is a cost-effective technique for combining multiple expert LLMs into a single versatile model, retaining the expertise of the original ones. However, current approaches often overlook the importance of safety alignment during merging, leading to highly misaligne...

Bibliographic Details
Main Authors:	Hammoud, HAAK, Michieli, U, Pizzati, F, Torr, P, Bibi, A, Ghanem, B, Ozay, M
Format:	Conference item
Language:	English
Published:	Association for Computational Linguistics 2024

Similar Items

Bi-factorial preference optimization: balancing safety-helpfulness in language models
by: Zhang, W, et al.
Published: (2025)

Merging of gold and semiconductor at nanoscale
by: Kamysbayev Vladislav
Published: (2014)

Turbulent mixing of merging plumes
by: Lee, Selina Chien Cheah.
Published: (2008)

Merge : the hybridization of architecture, infrastructure, and landscape
by: Fausto, Ariel
Published: (2007)

Atomic Habits : Tiny Changes, Remarkable Results : An Easy & Proven Way to Build Good Habits & Break Bad Ones
by: Clear, James
Published: (2018)

Atomic Habits : Tiny Changes, Remarkable Results : An Easy & Proven Way to Build Good Habits & Break Bad Ones
by: Clear, James
Published: ([201)

Employee turnover : bad attitude or poor management?
by: Khadijah Abdul Rahman, et al.
Published: (2008)

Magnetic Droplet Merging by Hybrid Magnetic Fields
by: Ray, Ayan, et al.
Published: (2016)

Can large language model agents simulate human trust behaviors?
by: Xie, C, et al.
Published: (2024)

Universal in-context approximation by prompting fully recurrent models
by: Petrov, A, et al.
Published: (2025)

BadSFL: backdoor attack in scaffold federated learning
by: Zhang, Xuanye
Published: (2024)

Conceptual process design and safety analysis of hydrogen production by using oil palm empty fruit bunch
by: Siti Suhaili Shahlan, et al.
Published: (2023)

Conceptual process design and safety analysis of hydrogen production by using oil palm empty fruit bunch
by: Siti Suhaili Shahlan
Published: (2023)

Modeling of snapping composite shells with magnetically aligned bio-inspired reinforcements
by: Riley, Katherine S., et al.
Published: (2020)

Aligning, autoencoding and prompting large language models for novel disease reporting
by: Liu, F, et al.
Published: (2025)

Object detection under bad lighting condition for autonomous vehicles for rain images
by: Cai, Ziqiang
Published: (2022)

Merging mass and interpersonal communication via interactive communication technology : a symposium
by: Valkenburg, Patti M., et al.
Published: (2019)

Development of kinetics model for torrefaction of empty fruit bunch from palm oil waste
by: Nur Hazirah Huda, Mohd Harun, et al.
Published: (2017)

Studies for single bunch and multi-bunch beam instabilities in the Diamond-II booster
by: Husain, R, et al.
Published: (2024)

Bad touch theme park and other stories : a collection of short stories and an exegesis
by: Nurulhuda Mohammed Arslan
Published: (2018)

Population of Merging Compact Binaries Inferred Using Gravitational Waves through GWTC-3
by: Sudhir, Vivishek
Published: (2024)

Animatezoo: zero-shot video generation of cross-species animation via subject alignment
by: Xu, Y, et al.
Published: (2024)

Safety assessment using computer experiments and surrogate modeling: Railway vehicle safety and track quality indices
by: Neves Costa, João, et al.
Published: (2024)

Investigation and solution of bus bunching problem
by: Tan, Chun Howe
Published: (2014)

Bad Mexican : portrayal of corruption and the drug trade in Man on Fire, Traffic, and La Ley de Herodes.
by: Goh, Dianne.
Published: (2011)

Aligning Human and Robot Representations
by: Bobu, Andreea, et al.
Published: (2024)

Beyond Preferences in AI Alignment
by: Zhi-Xuan, Tan, et al.
Published: (2024)

Dynamic Expansion and Merging of the Equatorial Ionization Anomaly During the 10–11 May 2024 Super Geomagnetic Storm
by: Aa, Ercha, et al.
Published: (2024)

Filamentation of a relativistic proton bunch in plasma
by: Verra, L, et al.
Published: (2024)

One-stop measurement model for fast and accurate tensor display characterization
by: Surman, Phil, et al.
Published: (2019)

Growth of horizontally aligned carbon nanotubes
by: Chee, Kin Onn.
Published: (2011)

Finite element analysis of dental aligners
by: Tan, Daniel Jing Quan
Published: (2019)

Exfoliation and alignment of h-BN for thermal application
by: Teo, Jing Xiang
Published: (2022)

Photovoltaic module temperature estimation model for the one-time-point daily estimation method.
by: Kinfatt, Wong, et al.
Published: (2024)

Validating the Safety Culture Model in Manufacturing Small and Medium Enterprises (SMEs) in Malaysia (S/O: 12365)
by: Hassan, Zuraida, et al.
Published: (2019)

Towards interpretable sequence continuation: analyzing shared circuits in large language models
by: Lan, M, et al.
Published: (2024)

Polymer composites with aligned carbon nanotubes for high performance
by: Lau, Priscilla Shi Wan.
Published: (2011)

Fabrication and investigation of multi-layered preferentially aligned structures
by: Chua, Chalmers Jun Yang
Published: (2023)

Non-alignment : cold war relic or enduring institution?
by: Liow, John Boon Peng.
Published: (2008)

PPVC alignment and load monitoring in connectors during installation
by: Ngo, Xing Yi
Published: (2022)