Latent Cache Flow: Model-to-Model Communication Without Text
Latent Cache Flow proposes a method for model-to-model communication by exchanging KV caches directly, bypassing text-based decoding and encoding.
LLM agents currently communicate through text: the sending model decodes its internal state into tokens and the receiving model re-encodes them, which adds latency and loses information. Latent Cache Flow aims to avoid this round trip by transferring KV matrices between models through learned adapters, potentially reducing the overhead associated with autoregressive generation.
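To make the idea of a learned adapter more concrete, here is a minimal sketch in plain Python. It assumes the simplest possible adapter, one linear map for keys and one for values, applied to a single layer of the sender's cache. The source does not specify the adapter architecture, so the class name `LinearKVAdapter`, the helpers, the dimensions, and the random (untrained) weights are all hypothetical and only show how cached keys and values might be projected from the sender's hidden size to the receiver's.

```python
import random
from dataclasses import dataclass

Matrix = list[list[float]]  # row-major: one row per cached token


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product (a: n x d_in, b: d_in x d_out)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


@dataclass
class LinearKVAdapter:
    """Hypothetical adapter: one linear map for keys, one for values.

    This is an assumed form for illustration, not the method's actual design.
    """
    w_k: Matrix  # d_src x d_tgt projection for keys
    w_v: Matrix  # d_src x d_tgt projection for values

    def __call__(self, keys: Matrix, values: Matrix) -> tuple[Matrix, Matrix]:
        # Project each cached token from the sender's space to the receiver's.
        return matmul(keys, self.w_k), matmul(values, self.w_v)


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
seq_len, d_src, d_tgt = 4, 8, 6  # illustrative sizes only

# Stand-ins for one layer of the sender's KV cache.
src_keys = random_matrix(seq_len, d_src, rng)
src_values = random_matrix(seq_len, d_src, rng)

# Untrained weights; in practice these would be learned.
adapter = LinearKVAdapter(
    w_k=random_matrix(d_src, d_tgt, rng),
    w_v=random_matrix(d_src, d_tgt, rng),
)
tgt_keys, tgt_values = adapter(src_keys, src_values)

print(len(tgt_keys), len(tgt_keys[0]))  # 4 6
```

The sequence length is preserved and only the feature dimension changes, so the receiver could in principle attend over the adapted cache without the sender generating any tokens. A real adapter would also have to handle per-layer and per-head layouts and differences in positional encoding, which this sketch ignores.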