DynoSim: Simulating the Pareto Frontier
DynoSim provides a discrete-event simulation environment to model the NVIDIA Dynamo serving stack and optimize complex LLM deployment configurations.
Tuning an LLM serving stack means balancing interdependent variables such as tensor-parallel shapes, scheduler settings, and KV cache behavior. DynoSim lets engineers simulate these interactions without the cost of full-scale GPU cluster experiments, so bottlenecks can be identified before physical deployment.
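To make "discrete-event simulation" concrete, here is a minimal sketch of the general technique in Python. It is not DynoSim's API or implementation: `simulate_queue`, its parameters, and the single-server FIFO model are all hypothetical, chosen only to show how a simulator advances time from event to event instead of running real hardware.

```python
import heapq
from collections import deque


def simulate_queue(arrival_times, service_time):
    """Toy discrete-event simulation of one FIFO server (hypothetical model).

    Returns each request's latency: completion time minus arrival time.
    """
    # Event heap ordered by (time, sequence number); the unique sequence
    # number breaks ties so events at the same time pop deterministically.
    events = []
    for req_id, t in enumerate(arrival_times):
        heapq.heappush(events, (t, req_id, "arrive", req_id))
    seq = len(arrival_times)

    waiting = deque()  # requests queued for the server, in arrival order
    busy = False       # whether the server is currently processing a request
    latencies = {}

    while events:
        # Jump straight to the next event; no time passes in between.
        now, _, kind, req_id = heapq.heappop(events)
        if kind == "arrive":
            waiting.append(req_id)
        else:  # "done"
            latencies[req_id] = now - arrival_times[req_id]
            busy = False

        # Start the next request if the server is free.
        if not busy and waiting:
            next_id = waiting.popleft()
            busy = True
            heapq.heappush(events, (now + service_time, seq, "done", next_id))
            seq += 1

    return [latencies[i] for i in range(len(arrival_times))]


if __name__ == "__main__":
    print(simulate_queue([0.0, 1.0, 2.0], 2.0))
```

With requests arriving one time unit apart and a service time of two units, the queue builds up and the printed latencies are `[2.0, 3.0, 4.0]`. A simulator of a full serving stack would model many more components, but the event loop has the same basic shape.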