LMGame-Bench: How Good are LLMs at Playing Games?
Published in ICLR 2026, 2026
Introduces LMGame-Bench, a unified Gym-style benchmark that evaluates LLM agents on platformer, puzzle, and narrative games while addressing vision brittleness, prompt variance, and data contamination.
