Orthrus-Qwen3: up to 7.8× tokens/forward on Qwen3, identical output distribution
Orthrus-Qwen3 generates up to 7.8× as many tokens per forward pass as Qwen3, with an identical output distribution
Orthrus-Qwen3 uses dual-view diffusion decoding to generate up to 7.8× as many tokens per forward pass as Qwen3 while keeping the output distribution identical. Because the distribution is unchanged, the reduction in forward passes does not come at the cost of accuracy, which makes the method a candidate for latency-sensitive applications.
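To make the tokens-per-forward figure concrete, here is a minimal arithmetic sketch. It is not Orthrus-Qwen3 code: the helper `forward_passes` and the 1000-token example are illustrative assumptions, and the 7.8 value is the "up to" figure quoted above, treated here as an average.

```python
import math


def forward_passes(num_tokens: int, tokens_per_forward: float) -> int:
    """Forward passes needed to emit num_tokens at a given average rate.

    Hypothetical helper for illustration only; it models nothing about
    how the tokens are proposed or verified.
    """
    if tokens_per_forward < 1.0:
        raise ValueError("tokens_per_forward must be at least 1")
    return math.ceil(num_tokens / tokens_per_forward)


# Plain autoregressive decoding emits one token per forward pass.
baseline = forward_passes(1000, 1.0)
# Best case from the headline figure (assumed to hold on average).
best_case = forward_passes(1000, 7.8)

print(baseline, best_case)  # 1000 129
```

Note that this counts forward passes only. If a single pass of the accelerated decoder costs more than a baseline pass, the wall-clock speedup would be smaller than the ratio of pass counts.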