JobShop Task 15 / 47

abz

Classical job-shop scheduling on the ABZ benchmark family (Adams, Balas, and Zawack, 1988): sequence the operations of each job on the machines to minimize makespan or tardiness. This is a strongly NP-hard combinatorial optimization problem and a core challenge in manufacturing scheduling. Solutions are scored by their relative gap to published bounds or known optima.
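To make the objective and the gap concrete, here is a minimal Python sketch. It is not the benchmark's own scorer. It assumes a hypothetical encoding: `jobs[j]` lists job j's operations as `(machine, duration)` pairs in their required order, each job visits a machine at most once, and a solution gives one job permutation per machine. `makespan` computes earliest start times as a longest path over the precedence graph (job order plus machine order) using Kahn's topological sort, and rejects machine orders that form a cycle. `relative_gap` assumes the gap is measured as (C_max - reference) / reference, where the reference is a published bound or best-known value.

```python
from collections import deque


def makespan(jobs, machine_orders):
    """Return C_max for a job-shop solution (hypothetical encoding).

    jobs[j] = [(machine, duration), ...] in the job's technological order.
    machine_orders[m] = job ids in the order they run on machine m;
    each order is assumed to list every job that uses machine m.
    Raises ValueError if the orders are cyclic (infeasible).
    """
    # (job, machine) -> position of that operation within the job
    op_index = {(j, m): k for j, ops in enumerate(jobs) for k, (m, _) in enumerate(ops)}

    # Nodes are (job, k); build successor lists and in-degrees.
    succ = {(j, k): [] for j, ops in enumerate(jobs) for k in range(len(ops))}
    indeg = {node: 0 for node in succ}

    def add_edge(a, b):
        succ[a].append(b)
        indeg[b] += 1

    # Job precedence: operation k must finish before operation k + 1 starts.
    for j, ops in enumerate(jobs):
        for k in range(len(ops) - 1):
            add_edge((j, k), (j, k + 1))
    # Machine precedence: consecutive jobs in each machine's order.
    for m, order in machine_orders.items():
        for a, b in zip(order, order[1:]):
            add_edge((a, op_index[(a, m)]), (b, op_index[(b, m)]))

    # Kahn's algorithm; a node's start time is final once it is dequeued.
    start = {node: 0 for node in succ}
    queue = deque(node for node, d in indeg.items() if d == 0)
    processed, cmax = 0, 0
    while queue:
        node = queue.popleft()
        processed += 1
        j, k = node
        end = start[node] + jobs[j][k][1]
        cmax = max(cmax, end)
        for nxt in succ[node]:
            start[nxt] = max(start[nxt], end)
            indeg[nxt] -= 1
            if indeg[nxt] == 0:
                queue.append(nxt)

    if processed != len(succ):
        raise ValueError("machine orders form a cycle: infeasible schedule")
    return cmax


def relative_gap(cmax, reference):
    """Assumed gap definition: fractional excess over a bound or best-known value."""
    return (cmax - reference) / reference


if __name__ == "__main__":
    # Toy 2-job, 2-machine instance (made up for illustration).
    jobs = [[(0, 3), (1, 2)], [(1, 2), (0, 4)]]
    print(makespan(jobs, {0: [0, 1], 1: [1, 0]}))  # 7
```

With this encoding a solver only has to output one permutation of jobs per machine. For the toy instance, the orders `{0: [1, 0], 1: [0, 1]}` make each job wait on the other and raise `ValueError`.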

Model leaderboard

#	Participant	Score
1	Claude Opus 4.6	100.0
2	GPT-5.4	53.6
3	GLM-5	27.5
4	DeepSeek V3.2	26.3
5	Grok 4.20	19.7
6	Gemini 3.1 Pro Preview	10.9
7	SEED 2.0 Pro	10.2
8	Qwen3 Coder Next	0.0

Framework leaderboard

#	Participant	Score
1	Claude Opus 4.6 + ABMCTS	100.0
2	Claude Opus 4.6 + ShinkaEvolve	90.0
3	Claude Opus 4.6 + OpenEvolve	83.2
4	GPT-OSS + ShinkaEvolve	28.0
5	GPT-OSS + ABMCTS	14.9
6	GPT-OSS + OpenEvolve	0.0

Scores in both tables are normalized for this task to a 0–100 scale (higher is better).
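The page does not state how raw results are normalized. In both tables the top entry is 100.0 and the bottom is 0.0, which is consistent with, but does not confirm, a per-leaderboard min-max rescaling of each participant's relative gap. The Python sketch below shows that hypothetical scheme. `normalize_scores` is an illustrative name, and a 0.0 could just as well reflect a failed or infeasible submission.

```python
def normalize_scores(gaps):
    """Hypothetical min-max rescaling: smallest gap -> 100, largest gap -> 0.

    gaps maps participant name -> relative gap (lower is better).
    This is an assumed scheme, not the leaderboard's documented formula.
    """
    best, worst = min(gaps.values()), max(gaps.values())
    if best == worst:  # all participants tied: avoid division by zero
        return {name: 100.0 for name in gaps}
    return {
        name: 100.0 * (worst - g) / (worst - best)
        for name, g in gaps.items()
    }


if __name__ == "__main__":
    example = {"A": 0.02, "B": 0.10, "C": 0.06}  # made-up gaps for illustration
    print(normalize_scores(example))
```

Under this assumption a score says how a participant ranks relative to the others on the same leaderboard, not how close it is to the optimum.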