EngDesign Task 9 / 47

EngDesign

A bundle of engineering cases sourced from EngDesign (CY_03, WJ_01, XY_05, AM_02, AM_03, YJ_02, YJ_03), evaluated with Docker-backed scripts and per-case rubrics that span structural, topology, and related design tasks. The v1 pool counts the bundle as a single benchmark family, even though the cases come from multiple distinct problem providers. Use `task=engdesign`, as described in the domain README.

Model leaderboard

#	Participant	Score
1	Grok 4.20	100.0
2	Gemini 3.1 Pro Preview	100.0
3	SEED 2.0 Pro	100.0
4	GLM-5	94.4
5	Qwen3 Coder Next	94.4
6	DeepSeek V3.2	79.4
7	GPT-5.4	0.0
8	Claude Opus 4.6	0.0

Framework leaderboard

#	Participant	Score
1	Claude Opus 4.6 + OpenEvolve	100.0
2	GPT-OSS + OpenEvolve	53.2
3	GPT-OSS + ShinkaiEvolve	7.1
4	Claude Opus 4.6 + ABMCTS	7.1
5	Claude Opus 4.6 + ShinkaiEvolve	0.4
6	GPT-OSS + ABMCTS	0.0

Score is this task's result normalized to a 0–100 scale; higher is better.