ML Tea: Activation-Informed Merging of LLMs
Speaker: Kaveh Alimohammadi
Title: Activation-Informed Merging of LLMs
Abstract: Model merging has emerged as an efficient strategy for combining multiple fine-tuned large language models (LLMs) while avoiding the computational overhead of retraining. However, existing methods often overlook the importance of activation-space information in guiding the merging process. In this talk, I will introduce Activation-Informed Merging (AIM), a novel technique that enhances the robustness and performance of merged models by incorporating activation-space insights. AIM is designed as a complementary framework that can be applied on top of any merging approach, preserving critical weights from the base model through principles drawn from continual learning and model compression. By utilizing a task-agnostic calibration set, AIM selectively prioritizes essential parameters, leading to significant gains across multiple benchmarks, with up to a 40% increase in performance.
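
To make the high-level idea in the abstract concrete, here is a minimal sketch of one way activation-informed adjustment could work for a single linear layer: importance scores are derived from base-model activations on a calibration set, and the merged weights are pulled back toward the base model in proportion to that importance. The statistic (mean absolute activation), the interpolation rule, the `omega` hyperparameter, and all function names are illustrative assumptions, not the exact method presented in the talk.

```python
import torch
import torch.nn as nn


@torch.no_grad()
def channel_importance(calib_acts: torch.Tensor) -> torch.Tensor:
    """Illustrative importance score: mean absolute activation per input
    channel, where `calib_acts` are the inputs this layer receives when the
    *base* model runs a small task-agnostic calibration set. The statistic
    (mean |activation|, max-normalized) is an assumption, not AIM's exact one."""
    s = calib_acts.abs().mean(dim=0)      # shape: (in_features,)
    return s / (s.max() + 1e-8)           # normalize to [0, 1]


@torch.no_grad()
def aim_adjust(base: nn.Linear, merged: nn.Linear,
               calib_acts: torch.Tensor, omega: float = 0.5) -> None:
    """Pull merged weights back toward the base model in proportion to
    activation-derived importance -- a sketch of the general idea, not the
    exact AIM update rule:
        W_final = W_base + (1 - omega * s) * (W_merged - W_base)
    Channels with high importance s retain more of the base model's weights."""
    s = channel_importance(calib_acts)               # (in_features,)
    delta = merged.weight - base.weight              # deviation introduced by merging
    merged.weight.copy_(base.weight + (1.0 - omega * s) * delta)


# Toy usage on a single linear layer with random stand-in activations.
base, merged = nn.Linear(16, 8), nn.Linear(16, 8)
calib_acts = torch.randn(64, 16)                     # would come from real calibration data
aim_adjust(base, merged, calib_acts, omega=0.4)
```

Because the adjustment only post-processes the weights of an already-merged model, a scheme like this composes with any underlying merging method, which is the "complementary framework" property the abstract describes.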