
smoke__qwen3.6-27b__20260702-190304

Dense 27B sanity check through the proxy — 4/4

What was run

The dense sibling model qwen3.6-27b (not the MoE) on the 4-task smoke set, 1 attempt each, through the reasoning-cap proxy (port 8021, cap ~4000), thinking ON.

Results

4/4.

Interpretation

The harness and proxy mechanism work unchanged across the two Qwen variants, and the dense 27B passes the whole smoke set. The dense model decodes more slowly than the MoE (~3B active parameters), though, so the MoE remains the primary benchmark target for full-suite runs on this single 3090.

Next steps

Back to the MoE for a bigger-K statistical run of the two signal tasks.

Run details

model: llama-local/qwen3.6-27b
agent: harnesses.minimal_pi:MinimalPi
thinking: on
reasoning budget: proxy · cap 4000 (see journal for the authoritative mechanism)
maxTokens / contextWindow: 32768 / 131072
agent timeout: ×2.0
trials: 4 of 4 (4 pass · 0 fail)
mean reward: 1.00
tokens (job total): 399,991 in / 22,645 out
started / finished: 2026-07-02T19:03 / 2026-07-02T19:16
wall clock: 13m06s

Tasks

fix-git — 1/1 passed

#  result  total  agent  in/out tok
1  PASS    1m23s  35s    31489/1589

nginx-request-logging — 1/1 passed

#  result  total  agent  in/out tok
1  PASS    1m38s  47s    44147/2390

openssl-selfsigned-cert — 1/1 passed

#  result  total  agent  in/out tok
1  PASS    1m31s  41s    30601/2241

regex-log — 1/1 passed

#  result  total  agent  in/out tok
1  PASS    8m33s  7m31s  293754/16425
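
As a cross-check, the per-task token counts can be summed and compared with the job totals under Run details. The sketch below is not part of the harness: the row layout and the summarize helper are hypothetical, and it assumes mean reward is simply the fraction of passing trials (1 for PASS, 0 for FAIL), which the report does not state explicitly.

```python
# Per-task rows copied from the task tables in this report:
# (task, passed, input tokens, output tokens).
ROWS = [
    ("fix-git", True, 31489, 1589),
    ("nginx-request-logging", True, 44147, 2390),
    ("openssl-selfsigned-cert", True, 30601, 2241),
    ("regex-log", True, 293754, 16425),
]


def summarize(rows):
    """Aggregate per-task rows into job-level figures (hypothetical helper)."""
    passes = sum(1 for _, passed, _, _ in rows if passed)
    return {
        "trials": len(rows),
        "passes": passes,
        # Assumption: reward is 1 for a pass and 0 for a fail.
        "mean_reward": passes / len(rows),
        "tokens_in": sum(tok_in for _, _, tok_in, _ in rows),
        "tokens_out": sum(tok_out for _, _, _, tok_out in rows),
    }


summary = summarize(ROWS)

# Share of the job's input tokens consumed by each task.
input_share = {
    task: tok_in / summary["tokens_in"] for task, _, tok_in, _ in ROWS
}

print(summary)
for task, share in input_share.items():
    print(f"{task}: {share:.1%} of input tokens")
```

The sums reproduce the reported job totals (399,991 in / 22,645 out), and the per-task shares show that regex-log accounts for most of the input tokens in this run.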