New math benchmark reveals AI models confidently solve problems that have no solution
SOOHAK, a benchmark of 439 math tasks, 99 of them unsolvable, shows that AI models solve 30% of research-level problems but fail to detect the unsolvable ones. Larger models are more accurate but no better at recognizing invalid problems. The gap points to limits in both mathematical reasoning and self-awareness.