Position: RL Researchers Need to Distinguish Between Solving Simulators and Using Simulators as a Proxy
The authors argue that reinforcement learning (RL) benchmarks are often treated as the goal itself rather than as a proxy for real-world deployment.
The paper draws a distinction between two uses of a simulator in RL research: solving the simulator itself, and using it as a proxy for learning general-purpose decision-making. When researchers optimize exclusively for simulator performance, they often adopt techniques that fail in real-world deployment. The authors suggest that simulator-specific solutions are valuable but should be clearly labeled as such, so that they are not conflated with generalizable RL.