
smoke__qwen3.6-35b-a3b__20260702-181612

Proxy cap, K=5 stats: regex-log 5/5, openssl-selfsigned-cert 3/5 (mean reward 0.80); openssl's misses are not budget-related

What was run

The two signal tasks × 5 attempts each (10 trials) through the reasoning-cap proxy (port 8021, cap ~4000), qwen3.6-35b-a3b, thinking ON.

Results

Mean 0.80: regex-log 5/5, openssl-selfsigned-cert 3/5.
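The arithmetic behind the mean can be sketched as follows. This assumes the reward is 1 for a passing trial and 0 for a failing one, which matches the reported numbers but is not stated explicitly in this report; the `passes` dictionary and `mean_reward` helper are illustrative names, not part of the harness.

```python
# Pass counts per task as (passed, attempted), taken from the Results line.
passes = {
    "regex-log": (5, 5),
    "openssl-selfsigned-cert": (3, 5),
}


def mean_reward(counts):
    """Mean reward over all trials, assuming reward 1 per pass and 0 per fail."""
    passed = sum(p for p, _ in counts.values())
    total = sum(n for _, n in counts.values())
    return passed / total


print(f"{mean_reward(passes):.2f}")  # prints 0.80
```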

Interpretation

Next steps

More samples for tighter confidence (K=15), and try the dense 27B model through the same proxy.
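To illustrate why K=5 leaves the pass rates loosely pinned down, here is a minimal sketch of a Wilson score interval for a pass rate. The choice of the Wilson interval and the 95% level (z = 1.96) are assumptions made for illustration; the report does not say which interval, if any, the author intends to use. The 9/15 case is hypothetical and only shows how the interval would narrow if the same 60% rate held at K=15.

```python
import math


def wilson_interval(passes, trials, z=1.96):
    """Wilson score interval for a binomial pass rate (z=1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = passes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    # Clamp to [0, 1] to absorb floating-point overshoot at the boundaries.
    return max(0.0, centre - half), min(1.0, centre + half)


# Observed counts from this run, plus one hypothetical K=15 case.
for label, passed, trials in [
    ("openssl-selfsigned-cert, K=5", 3, 5),
    ("regex-log, K=5", 5, 5),
    ("hypothetical 9/15", 9, 15),
]:
    lo, hi = wilson_interval(passed, trials)
    print(f"{label}: {lo:.2f} to {hi:.2f}")
```

With only five trials the interval for 3/5 spans most of the unit range, so the run cannot distinguish a mostly failing task from a mostly passing one.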

Run details

model: llama-local/qwen3.6-35b-a3b
agent: harnesses.minimal_pi:MinimalPi
thinking: on
reasoning budget: proxy, cap 4000 (see journal for the authoritative mechanism)
maxTokens / contextWindow: 32768 / 131072
agent timeout: ×2.0
trials: 10 of 10 (8 pass, 2 fail)
mean reward: 0.80
tokens (job total): 4,088,115 in / 164,705 out
started / finished: 2026-07-02T18:16 / 2026-07-02T18:52
wall clock: 36m12s

Tasks

openssl-selfsigned-cert — 3/5 passed

#  result  total  agent  in/out tok
1  FAIL    1m10s  21s    39851/2843
2  PASS    1m05s  17s    50316/2808
3  PASS    1m00s  12s    25763/1827
4  FAIL    1m11s  21s    52497/2821
5  PASS    59s    12s    21653/1687

regex-log — 5/5 passed

#  result  total  agent  in/out tok
1  PASS    7m57s  6m58s  889355/32115
2  PASS    7m33s  6m36s  739562/49511
3  PASS    4m36s  3m36s  491670/21487
4  PASS    6m20s  5m24s  1589957/37832
5  PASS    4m17s  3m21s  187491/11774