JobShop
Task 15 / 47
abz
Classical job-shop scheduling on the ABZ benchmark family (Adams, Balas, and Zawack, 1988). The task is to sequence operations on machines so as to minimize makespan or tardiness, a strongly NP-hard combinatorial optimization problem and a core challenge in manufacturing scheduling. Solutions are scored by their relative gap to published bounds or optima.
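To make the objective and the gap metric concrete, here is a minimal Python sketch. It is not the benchmark's evaluator: the function names, the toy 3-job instance, the greedy dispatching rule, and the reference value of 11 are all assumptions made for illustration, and the gap formula `(makespan - reference) / reference` is one common convention rather than something this page specifies.

```python
from typing import List, Tuple

# A job is an ordered list of (machine_id, duration) operations.
Job = List[Tuple[int, int]]


def greedy_makespan(jobs: List[Job]) -> int:
    """Build a feasible schedule with a simple dispatching rule and return its makespan.

    At each step, dispatch the pending operation that can start earliest
    (ties broken by shorter duration, then lower job index).
    This is a heuristic, not an optimal solver.
    """
    n_machines = 1 + max(m for job in jobs for m, _ in job)
    machine_free = [0] * n_machines   # time at which each machine becomes idle
    job_free = [0] * len(jobs)        # time at which each job's previous op ends
    next_op = [0] * len(jobs)         # index of each job's next unscheduled op
    remaining = sum(len(job) for job in jobs)

    while remaining:
        best = None
        for j, job in enumerate(jobs):
            if next_op[j] == len(job):
                continue  # job already fully scheduled
            machine, duration = job[next_op[j]]
            start = max(job_free[j], machine_free[machine])
            key = (start, duration, j)
            if best is None or key < best[0]:
                best = (key, j, machine, duration, start)
        _, j, machine, duration, start = best
        end = start + duration
        job_free[j] = end
        machine_free[machine] = end
        next_op[j] += 1
        remaining -= 1

    return max(job_free)


def relative_gap(makespan: float, reference: float) -> float:
    """Relative gap to a reference bound or optimum (assumed convention)."""
    return (makespan - reference) / reference


# Hypothetical toy instance: 3 jobs, 3 machines (not an ABZ instance).
toy_jobs: List[Job] = [
    [(0, 3), (1, 2), (2, 2)],
    [(0, 2), (2, 1), (1, 4)],
    [(1, 4), (2, 3)],
]
toy_reference = 11  # assumed reference makespan for this toy instance

toy_makespan = greedy_makespan(toy_jobs)
toy_gap = relative_gap(toy_makespan, toy_reference)
print(f"makespan={toy_makespan}, gap={toy_gap:.3f}")
```

A gap of 0 means the schedule matches the reference value. How gaps are aggregated across instances and mapped to the 0–100 score is not described here, so the sketch stops at the per-instance gap.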
Model leaderboard
| # | Participant | Score |
|---|---|---|
| 1 | Claude Opus 4.6 | 100.0 |
| 2 | GPT-5.4 | 53.6 |
| 3 | GLM-5 | 27.5 |
| 4 | DeepSeek V3.2 | 26.3 |
| 5 | Grok 4.20 | 19.7 |
| 6 | Gemini 3.1 Pro Preview | 10.9 |
| 7 | SEED 2.0 Pro | 10.2 |
| 8 | Qwen3 Coder Next | 0.0 |
Framework leaderboard
| # | Participant | Score |
|---|---|---|
| 1 | Claude Opus 4.6 + ABMCTS | 100.0 |
| 2 | Claude Opus 4.6 + ShinkaiEvolve | 90.0 |
| 3 | Claude Opus 4.6 + OpenEvolve | 83.2 |
| 4 | GPT-OSS + ShinkaiEvolve | 28.0 |
| 5 | GPT-OSS + ABMCTS | 14.9 |
| 6 | GPT-OSS + OpenEvolve | 0.0 |
Scores are normalized for this task to a 0–100 scale; higher is better.