Epidemiology of Model Collapse: Modeling Synthetic Data Contamination via Bilayer SIR Dynamics
Researchers propose a bilayer SIRS model to analyze how synthetic data cross-contamination between AI models leads to systemic model collapse.
Most existing research treats model collapse as a single-chain problem, in which one lineage of models degrades as each generation trains on its predecessor's output. This paper introduces a bilayer SIRS framework that models data corpora and AI models as two interacting populations. By tracking how synthetic data flows between models and shared datasets, the authors offer an epidemiological account of how cross-contamination accelerates performance degradation, an effect the single-chain view leaves out.
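To make the bilayer idea concrete, here is a minimal sketch of two coupled SIRS (susceptible, infected, recovered, susceptible again) populations, one for AI models and one for data corpora. It is an illustration only, not the paper's equations, which this summary does not give. Everything in it is an assumption: the mapping of compartments (susceptible = clean, infected = contaminated or degraded, recovered = cleaned or retrained and temporarily protected), the mass-action coupling in which each layer is infected only by the other layer, the forward-Euler integration, and all names (`Layer`, `sirs_step`, `simulate`, `beta_cm`, `beta_mc`, `gamma_*`, `xi_*`).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    """Population fractions for one layer (hypothetical compartment mapping)."""
    s: float  # susceptible: clean models or corpora
    i: float  # infected: degraded models or contaminated corpora
    r: float  # recovered: retrained models or cleaned corpora


def sirs_step(layer, force, gamma, xi, dt):
    """One forward-Euler step of SIRS under an external force of infection."""
    infect = force * layer.s   # S -> I
    recover = gamma * layer.i  # I -> R
    wane = xi * layer.r        # R -> S (protection wears off)
    return Layer(
        s=layer.s + dt * (wane - infect),
        i=layer.i + dt * (infect - recover),
        r=layer.r + dt * (recover - wane),
    )


def simulate(models, corpora, beta_cm, beta_mc,
             gamma_m, gamma_c, xi_m, xi_c, dt=0.01, steps=5000):
    """Integrate both layers; returns a list of (models, corpora) states.

    beta_cm: assumed rate at which contaminated corpora degrade models.
    beta_mc: assumed rate at which degraded models contaminate corpora.
    """
    history = [(models, corpora)]
    for _ in range(steps):
        # Cross-layer coupling: each layer is infected by the other one.
        force_on_models = beta_cm * corpora.i
        force_on_corpora = beta_mc * models.i
        models, corpora = (
            sirs_step(models, force_on_models, gamma_m, xi_m, dt),
            sirs_step(corpora, force_on_corpora, gamma_c, xi_c, dt),
        )
        history.append((models, corpora))
    return history
```

Calling `simulate(Layer(1.0, 0.0, 0.0), Layer(0.99, 0.01, 0.0), beta_cm=0.8, beta_mc=0.6, gamma_m=0.1, gamma_c=0.1, xi_m=0.05, xi_c=0.05)` starts with all models clean and 1% of corpora contaminated; the rate values are arbitrary placeholders, not estimates from the paper. Setting either `beta_cm` or `beta_mc` to zero cuts the feedback loop between layers, which is one way to see what cross-contamination adds in this toy model.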