Navers lab
EngDesign Task 9 / 47

EngDesign

A bundle of seven engineering cases sourced from EngDesign (CY_03, WJ_01, XY_05, AM_02, AM_03, YJ_02, YJ_03), evaluated with Docker-backed scripts and per-case rubrics. The cases span structural, topology, and related design tasks. In the v1 pool the bundle counts as a single benchmark family, even though the cases come from multiple distinct problem providers; select it with `task=engdesign`, as described in the domain README.

Model leaderboard

| Rank | Participant | Score |
|---|---|---|
| 1 | Grok 4.20 | 100.0 |
| 2 | Gemini 3.1 Pro Preview | 100.0 |
| 3 | SEED 2.0 Pro | 100.0 |
| 4 | GLM-5 | 94.4 |
| 5 | Qwen3 Coder Next | 94.4 |
| 6 | DeepSeek V3.2 | 79.4 |
| 7 | GPT-5.4 | 0.0 |
| 8 | Claude Opus 4.6 | 0.0 |

Framework leaderboard

| Rank | Participant | Score |
|---|---|---|
| 1 | Claude Opus 4.6 + OpenEvolve | 100.0 |
| 2 | GPT-OSS + OpenEvolve | 53.2 |
| 3 | GPT-OSS + ShinkaiEvolve | 7.1 |
| 4 | Claude Opus 4.6 + ABMCTS | 7.1 |
| 5 | Claude Opus 4.6 + ShinkaiEvolve | 0.4 |
| 6 | GPT-OSS + ABMCTS | 0.0 |

Scores are normalized for this task on a 0–100 scale; higher is better.