GPT-5.2-high LMArena scores released, OpenAI falls from #6 to #13
Posted by reed1234 5 hours ago
Comments
Comment by _nub3 3 hours ago
The cookie banner conflicts with the Cloudflare anti-bot stuff.
Site is unusable.
Comment by reed1234 5 hours ago
While GPT-5.2 scores well on benchmarks, human preference is important for OpenAI’s consumer-focused products.
Comment by aeonfox 3 hours ago
The Arena Overview section is heavily biased towards languages. grok-4.1-thinking is worse than claude-opus-4-5-20251101-thinking-32k on every non-language metric by a large margin but somehow ranks higher overall, maybe because Opus is way worse at Spanish and Korean?