Representation as a Bottleneck for Mechanistic Interpretability: The Manifestation Unit Protocol
Researchers introduce Manifestation Units to standardize mechanistic interpretability analyses and make them easier to reuse.
Current interpretability outputs like circuit diagrams and feature lists are often siloed in individual notebooks. The Manifestation Unit protocol proposes a typed tuple structure to act as a standardized representation layer, making these analyses composable, queryable, and actionable for downstream interventions.
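To make the idea of a typed tuple concrete, here is a minimal Python sketch of what such a representation layer might look like. The summary above does not specify the protocol's actual fields, so every name here (`ManifestationUnit`, `kind`, `model`, `components`, `claim`, `intervention`, `query`, `compose`) is a hypothetical stand-in, and the example values are placeholders rather than real findings.

```python
from typing import List, NamedTuple, Optional, Tuple


class ManifestationUnit(NamedTuple):
    """Hypothetical typed tuple; the field names are illustrative assumptions."""
    kind: str                           # e.g. "circuit" or "feature"
    model: str                          # identifier of the model analysed
    components: Tuple[str, ...]         # model parts the analysis refers to
    claim: str                          # human-readable statement of the finding
    intervention: Optional[str] = None  # optional downstream action


def query(units: List[ManifestationUnit], **criteria) -> List[ManifestationUnit]:
    """Return the units whose fields equal every given criterion."""
    return [
        u for u in units
        if all(getattr(u, name) == value for name, value in criteria.items())
    ]


def compose(a: ManifestationUnit, b: ManifestationUnit) -> ManifestationUnit:
    """Merge two units about the same model into one combined circuit unit."""
    if a.model != b.model:
        raise ValueError("can only compose units that describe the same model")
    # dict.fromkeys removes duplicates while preserving first-seen order.
    merged = tuple(dict.fromkeys(a.components + b.components))
    return ManifestationUnit(
        kind="circuit",
        model=a.model,
        components=merged,
        claim=f"{a.claim}; {b.claim}",
        intervention=a.intervention or b.intervention,
    )


# Placeholder records, invented purely for illustration.
units = [
    ManifestationUnit("feature", "toy-model", ("L3.H2",), "example claim A"),
    ManifestationUnit(
        "circuit", "toy-model", ("L3.H2", "L5.H1"), "example claim B",
        intervention="ablate L5.H1",
    ),
]

circuits = query(units, kind="circuit")
combined = compose(units[0], units[1])
```

Because each record is an immutable tuple with named, typed fields, analyses can be filtered with a single call and merged without reaching into the notebook that produced them, which is the kind of composability and queryability the protocol aims for.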