SAT: Sequential Agent Tuning for Coordinator Free Plug and Play Multi-LLM Training with Monotonic Improvement Guarantees
SAT enables stable training of multiple small LLMs without a coordinator
SAT addresses training instability in multi-LLM systems by updating agents sequentially, one at a time, without a central coordinator. The method guarantees monotonic performance improvement and avoids compounding distribution shifts during collaborative training.
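
The idea of sequential, coordinator-free updates with monotonic improvement can be illustrated with a toy sketch. This is not the SAT algorithm itself, whose details the summary above does not give. It is a minimal coordinate-wise update under stated assumptions: each "agent" is reduced to a single float parameter, `joint_reward` is a made-up shared objective, and `sequential_update`, `step`, and `rounds` are hypothetical names chosen for illustration. Only one agent changes at a time while the others stay frozen at their latest values, and a change is kept only if the joint reward strictly improves, so the recorded reward never decreases.

```python
from typing import Callable, List, Tuple


def joint_reward(params: List[float]) -> float:
    """Toy shared objective (an assumption): highest when the parameters sum to 3.0."""
    return -(sum(params) - 3.0) ** 2


def sequential_update(
    params: List[float],
    reward_fn: Callable[[List[float]], float],
    step: float = 0.25,
    rounds: int = 5,
) -> Tuple[List[float], List[float]]:
    """Update agents one at a time, keeping a change only if the joint reward improves."""
    params = list(params)  # copy so the caller's list is not mutated
    history = [reward_fn(params)]
    for _ in range(rounds):
        for i in range(len(params)):
            # All other agents are frozen at their most recent values.
            best_value, best_reward = params[i], reward_fn(params)
            for candidate in (params[i] - step, params[i] + step):
                trial = params[:i] + [candidate] + params[i + 1:]
                trial_reward = reward_fn(trial)
                if trial_reward > best_reward:
                    best_value, best_reward = candidate, trial_reward
            # Keeping the current value is always an option, so the reward cannot drop.
            params[i] = best_value
            history.append(best_reward)
    return params, history


if __name__ == "__main__":
    final_params, rewards = sequential_update([0.0, 0.0, 0.0], joint_reward)
    print(final_params)
    print(rewards)
```

No coordinator appears in this sketch: each agent only needs to evaluate the shared reward with its own candidate change. Real LLM agents would replace the scalar step with a policy update, which is where a formal improvement guarantee would have to be established.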