How does DeepSeek R1's performance on math-heavy benchmarks compare to GPT-4o's?

How do you say "line" in English?