Kernel Engineering
Task 20 / 47
TriMul
This benchmark asks for a high-performance TriMul-style GPU kernel that meets strict correctness requirements. Solutions must trade off tiling, memory layout, and occupancy, and are often VRAM-bound on consumer GPUs. Evaluation runs representative workloads and scores both accuracy and speed against the benchmark's reference, highlighting specialized GEMM-like kernel engineering.
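To make the "GEMM-like" framing concrete, here is a minimal pure-Python sketch of the core contraction in an "outgoing" triangle multiplication, together with a tolerance check of the kind a correctness gate might apply. It is an illustration only. The benchmark's actual operator, tensor shapes, and tolerances are not stated here, so the `[N][N][C]` layout, the function names `trimul_outgoing` and `allclose`, and the `rtol`/`atol` defaults are all assumptions. The sketch also omits the projections, gating, and normalization that a full TriMul module would typically include.

```python
import math


def trimul_outgoing(a, b):
    """Naive core contraction (hypothetical helper, not the benchmark's reference).

    a, b: nested lists with assumed shape [N][N][C].
    Returns out with out[i][j][c] = sum over k of a[i][k][c] * b[j][k][c].
    For a fixed channel c this is the matrix product A_c @ B_c^T,
    which is why the kernel is described as GEMM-like.
    """
    n = len(a)
    channels = len(a[0][0])
    out = [[[0.0] * channels for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                for c in range(channels):
                    out[i][j][c] += a[i][k][c] * b[j][k][c]
    return out


def flatten(x):
    """Yield the scalar leaves of an arbitrarily nested list."""
    if isinstance(x, list):
        for item in x:
            yield from flatten(item)
    else:
        yield x


def allclose(x, y, rtol=1e-5, atol=1e-8):
    """Elementwise tolerance check; rtol and atol are assumed values."""
    xs, ys = list(flatten(x)), list(flatten(y))
    return len(xs) == len(ys) and all(
        math.isclose(p, q, rel_tol=rtol, abs_tol=atol) for p, q in zip(xs, ys)
    )


if __name__ == "__main__":
    a = [[[1.0], [2.0]], [[3.0], [4.0]]]          # N = 2, C = 1
    identity = [[[1.0], [0.0]], [[0.0], [1.0]]]   # B_c is the identity matrix
    # A @ I^T leaves A unchanged, so the result should match a.
    print(allclose(trimul_outgoing(a, identity), a))
```

An optimized GPU kernel would compute the same values with tiled, batched matrix multiplies over the channel dimension. A quadruple loop like this one is useful only as a slow oracle to compare a candidate's output against.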
Model leaderboard
| # | Participant | Score |
|---|---|---|
| 1 | Claude Opus 4.6 | 100.0 |
| 2 | Grok 4.20 | 37.9 |
| 3 | GLM-5 | 20.4 |
| 4 | DeepSeek V3.2 | 12.2 |
| 5 | SEED 2.0 Pro | 12.0 |
| 6 | Gemini 3.1 Pro Preview | 2.2 |
| 7 | Qwen3 Coder Next | 0.4 |
| 8 | GPT-5.4 | 0.0 |
Framework leaderboard
| # | Participant | Score |
|---|---|---|
| 1 | Claude Opus 4.6 + ShinkaiEvolve | 100.0 |
| 2 | Claude Opus 4.6 + ABMCTS | 10.0 |
| 3 | GPT-OSS + ABMCTS | 7.1 |
| 4 | GPT-OSS + OpenEvolve | 4.3 |
| 5 | Claude Opus 4.6 + OpenEvolve | 2.3 |
| 6 | GPT-OSS + ShinkaiEvolve | 0.0 |
Score is normalized for this task (0–100; higher is better).