Astrodynamics
Task 2 / 47
MannedLunarLanding
This benchmark targets soft-landing trajectory optimization for a crewed lunar lander subject to thrust limits, a finite propellant budget, and dynamical and path constraints. The goal is a feasible trajectory from orbit to the specified terminal conditions that lands safely and uses as little propellant as the constraints allow. Evaluation stresses nonlinear optimal control, constraint satisfaction, and terminal accuracy, all typical of real astrodynamics optimization problems.
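To make the ingredients above concrete (thrust limit, propellant use, terminal condition), here is a minimal sketch of a one-dimensional vertical descent. It is not the benchmark's model: the 1-D dynamics, the Euler integrator, the braking law, and every name and number (`simulate_descent`, `make_braking_law`, `is_feasible`, the masses, thrust, and specific impulse) are illustrative assumptions. The actual task starts from orbit and presumably uses richer dynamics.

```python
# Toy 1-D powered descent. All names and numbers are hypothetical, not the benchmark's.
G_MOON = 1.62    # m/s^2, approximate lunar surface gravity
G0 = 9.80665     # m/s^2, standard gravity used in the specific-impulse definition


def simulate_descent(throttle, h0, v0, m0, m_dry, max_thrust, isp,
                     dt=0.01, t_end=1000.0):
    """Integrate altitude h, vertical speed v (up is positive), and mass m
    with explicit Euler until touchdown (h <= 0) or t_end.

    throttle: callable (t, h, v, m) -> commanded throttle; it is clipped to
    [0, 1], which enforces the thrust limit.
    Returns (final_time, final_speed, propellant_used).
    """
    t, h, v, m = 0.0, h0, v0, m0
    while h > 0.0 and t < t_end:
        u = min(1.0, max(0.0, throttle(t, h, v, m)))
        thrust = u * max_thrust if m > m_dry else 0.0  # no propellant, no thrust
        accel = thrust / m - G_MOON
        mdot = thrust / (isp * G0)                     # propellant mass flow rate
        h += v * dt
        v += accel * dt
        m -= mdot * dt
        t += dt
    return t, v, m0 - m


def make_braking_law(max_thrust):
    """Bang-bang law: burn at full throttle once the current speed can only
    just be cancelled before reaching the surface."""
    def law(t, h, v, m):
        a_net = max_thrust / m - G_MOON                # net deceleration at full thrust
        return 1.0 if (v < 0.0 and v * v >= 2.0 * a_net * h) else 0.0
    return law


def is_feasible(v_touchdown, propellant_used, v_limit, propellant_budget):
    """Terminal-speed and propellant-budget checks for the toy problem."""
    return abs(v_touchdown) <= v_limit and propellant_used <= propellant_budget


if __name__ == "__main__":
    t, v, used = simulate_descent(
        make_braking_law(45000.0), h0=1000.0, v0=-20.0, m0=15000.0,
        m_dry=14000.0, max_thrust=45000.0, isp=311.0)
    print(f"touchdown at t={t:.1f} s, v={v:.2f} m/s, propellant={used:.1f} kg")
    print("feasible:", is_feasible(v, used, v_limit=2.0, propellant_budget=1000.0))
```

A real solver would search over the thrust profile to minimize propellant while satisfying the constraints, rather than fix a hand-written braking law. The sketch only shows how a candidate trajectory can be propagated and checked.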
Model leaderboard
| # | Participant | Score |
|---|---|---|
| 1 | GLM-5 | 100.0 |
| 2 | GPT-5.4 | 92.1 |
| 3 | DeepSeek V3.2 | 66.4 |
| 4 | Claude Opus 4.6 | 64.1 |
| 5 | SEED 2.0 Pro | 6.9 |
| 6 | Gemini 3.1 Pro Preview | 4.3 |
| 7 | Grok 4.20 | 0.0 |
| 8 | Qwen3 Coder Next | 0.0 |
Framework leaderboard
| # | Participant | Score |
|---|---|---|
| 1 | GPT-OSS + ShinkaiEvolve | 100.0 |
| 2 | Claude Opus 4.6 + ABMCTS | 61.0 |
| 3 | Claude Opus 4.6 + OpenEvolve | 53.8 |
| 4 | GPT-OSS + OpenEvolve | 37.6 |
| 5 | Claude Opus 4.6 + ShinkaiEvolve | 34.7 |
| 6 | GPT-OSS + ABMCTS | 0.0 |
Scores are normalized per task to a 0–100 scale; higher is better.