For evaluation on AIME2025, GPQA, LSAT, and MMLU, use the scripts in scripts/eval:

    python scripts/eval/generate_aime.py
    python scripts/eval/generate_gpqa.py
    python ...
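
To run several of these evaluations in one go, a small driver along the following lines may be convenient. This is only a sketch: `build_commands` and `run_all` are hypothetical helper names that do not come from the repository, and the list covers only the two script names visible above, since the remaining ones are elided in the text. It also assumes the scripts run without arguments from the repository root.

```python
import subprocess
import sys
from pathlib import Path

# Only the two script names shown in the text; the others are elided there
# and are deliberately not guessed here.
EVAL_SCRIPTS = ["generate_aime.py", "generate_gpqa.py"]


def build_commands(eval_dir="scripts/eval", scripts=EVAL_SCRIPTS):
    """Return one argv list per evaluation script (hypothetical helper)."""
    return [[sys.executable, str(Path(eval_dir) / name)] for name in scripts]


def run_all(commands, runner=subprocess.run):
    """Run each command in order; check=True stops at the first failure."""
    for argv in commands:
        runner(argv, check=True)
```

Calling `run_all(build_commands())` from the repository root would then launch each script in turn with the current Python interpreter. The `runner` parameter exists only so the loop can be exercised without actually starting a process.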