
smoke__qwen3.6-35b-a3b__20260702-191746

Proxy cap at K=15, the evidence run: regex-log 93%, openssl-selfsigned-cert at model baseline (60%), mean reward 0.77

What was run

The statistical run: 15 attempts on each of the two signal tasks (30 trials), routed through the reasoning-cap proxy (port 8021, cap ~4000) with qwen3.6-35b-a3b, thinking ON.

Results

Mean reward 0.77 (23/30 passed): regex-log 14/15 (93%), openssl-selfsigned-cert 9/15 (60%).
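The headline numbers follow directly from the raw counts. Below is a minimal sketch that assumes the reward is 1 for a pass and 0 for a fail, which is what 23/30 ≈ 0.77 implies. It recomputes the per-task rates and the mean, then adds a 95% Wilson score interval per task to show how much uncertainty remains at 15 attempts. The Wilson interval and the helper name `wilson_interval` are illustrative choices, not something the run itself reports.

```python
from math import sqrt


def wilson_interval(passes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a binomial pass rate (illustrative helper)."""
    p = passes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Pass counts from the Results line (15 attempts per task).
RESULTS = {"regex-log": (14, 15), "openssl-selfsigned-cert": (9, 15)}

for task, (passes, n) in RESULTS.items():
    lo, hi = wilson_interval(passes, n)
    print(f"{task}: {passes}/{n} = {passes / n:.0%}, 95% CI [{lo:.2f}, {hi:.2f}]")

total_pass = sum(p for p, _ in RESULTS.values())
total_n = sum(n for _, n in RESULTS.values())
# Assumption: reward is 1 per pass and 0 per fail, so mean reward = pass fraction.
mean_reward = total_pass / total_n
print(f"mean reward: {total_pass}/{total_n} = {mean_reward:.2f}")
```

The interval comes out at roughly 0.70 to 0.99 for regex-log and roughly 0.36 to 0.80 for openssl-selfsigned-cert, so each rate is still only loosely pinned at K=15.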

Interpretation

Next steps

The proxy is an extra moving part. llama.cpp has a native --reasoning-budget N server flag that does the same job. Switch the llama-swap config to it, verify parity with this run, then retire the proxy.
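The notes do not say how parity would be checked. One option assumes the native --reasoning-budget run repeats the same two tasks at 15 attempts each. Under that assumption, run a per-task two-sided Fisher exact test on the pass/fail counts. The helpers `fisher_two_sided` and `parity_report` and the choice of test are hypothetical illustrations, not the journal's method. The native-run counts are not known here and must come from that follow-up run.

```python
from math import comb


def fisher_two_sided(pass_a: int, n_a: int, pass_b: int, n_b: int) -> float:
    """Two-sided Fisher exact test on a 2x2 pass/fail table (hypothetical helper)."""
    k = pass_a + pass_b              # total passes, held fixed by conditioning
    total = comb(n_a + n_b, k)

    def prob(x: int) -> float:
        # Hypergeometric probability that arm A receives x of the k passes.
        return comb(n_a, x) * comb(n_b, k - x) / total

    p_obs = prob(pass_a)
    support = range(max(0, k - n_b), min(n_a, k) + 1)
    # Sum the probabilities of every table at least as unlikely as the observed one
    # (small relative tolerance so exact ties are not lost to float rounding).
    p = sum(prob(x) for x in support if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def parity_report(proxy: dict, native: dict) -> dict:
    """Per-task p-values; both dicts map task -> (passes, attempts)."""
    return {task: fisher_two_sided(*proxy[task], *native[task]) for task in proxy}


# Proxy-run counts from this page. A dict of the same shape holding the
# native --reasoning-budget run's counts would be passed as `native`.
PROXY = {"regex-log": (14, 15), "openssl-selfsigned-cert": (9, 15)}
```

As a sanity check, comparing a run with itself gives p = 1.0. A 15/15 result against 0/15 gives a p-value far below 0.001. With only 15 attempts per arm, only large shifts in pass rate will register. A high p-value therefore means no evidence of a difference, not proof of parity.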

Run details

model: llama-local/qwen3.6-35b-a3b
agent: harnesses.minimal_pi:MinimalPi
thinking: on
reasoning budget: proxy · cap 4000 (see journal for the authoritative mechanism)
maxTokens / contextWindow: 32768 / 131072
agent timeout: ×2.0
trials: 30 of 30 (23 pass · 7 fail)
mean reward: 0.77
tokens (job total): 8,439,802 in / 337,955 out
started / finished: 2026-07-02T19:17 / 2026-07-02T20:40
wall clock: 82m35s

Tasks

openssl-selfsigned-cert — 9/15 passed

#   result  total  agent  tokens in / out
1   FAIL    1m26s  24s    103666 / 3493
2   FAIL    1m04s  15s    41642 / 2081
3   PASS    1m12s  21s    61843 / 3478
4   FAIL    1m16s  25s    59036 / 3405
5   PASS    1m09s  20s    45721 / 3215
6   PASS    1m06s  16s    31541 / 2627
7   PASS    1m05s  15s    41577 / 2432
8   PASS    1m04s  13s    26449 / 2151
9   FAIL    1m10s  18s    51563 / 2692
10  PASS    1m03s  14s    34558 / 2367
11  PASS    1m00s  10s    18628 / 1725
12  FAIL    1m10s  19s    52696 / 2799
13  PASS    1m10s  20s    46844 / 3205
14  PASS    1m13s  23s    77012 / 3676
15  FAIL    1m03s  12s    25687 / 1727

regex-log — 14/15 passed

#   result  total  agent  tokens in / out
1   PASS    3m44s  2m39s  245551 / 13559
2   PASS    3m29s  2m31s  434569 / 18856
3   PASS    6m29s  5m32s  1128803 / 36436
4   PASS    3m57s  2m58s  282053 / 16089
5   FAIL    5m52s  4m55s  926857 / 33742
6   PASS    3m06s  2m08s  128080 / 9398
7   PASS    2m53s  1m53s  185126 / 11999
8   PASS    5m05s  4m08s  456024 / 18137
9   PASS    2m56s  1m57s  168521 / 5712
10  PASS    3m50s  2m53s  271701 / 16024
11  PASS    3m39s  2m38s  527233 / 19523
12  PASS    4m26s  3m26s  397215 / 20459
13  PASS    3m31s  2m31s  246277 / 18877
14  PASS    3m55s  2m53s  286243 / 17394
15  PASS    8m19s  7m21s  2037086 / 40677
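The per-trial tables can be cross-checked against the Run details block. The sketch below uses the 30 rows transcribed by hand. They sum to the reported job totals exactly: 23 passes, 8,439,802 tokens in and 337,955 tokens out. The per-trial total times add up to 82m22s, against the 82m35s wall clock. The names `seconds`, `OPENSSL` and `REGEX_LOG` are illustrative and not part of the harness.

```python
import re

# (result, total time, tokens in, tokens out) per trial, transcribed from the tables.
OPENSSL = [
    ("FAIL", "1m26s", 103666, 3493), ("FAIL", "1m04s", 41642, 2081),
    ("PASS", "1m12s", 61843, 3478), ("FAIL", "1m16s", 59036, 3405),
    ("PASS", "1m09s", 45721, 3215), ("PASS", "1m06s", 31541, 2627),
    ("PASS", "1m05s", 41577, 2432), ("PASS", "1m04s", 26449, 2151),
    ("FAIL", "1m10s", 51563, 2692), ("PASS", "1m03s", 34558, 2367),
    ("PASS", "1m00s", 18628, 1725), ("FAIL", "1m10s", 52696, 2799),
    ("PASS", "1m10s", 46844, 3205), ("PASS", "1m13s", 77012, 3676),
    ("FAIL", "1m03s", 25687, 1727),
]
REGEX_LOG = [
    ("PASS", "3m44s", 245551, 13559), ("PASS", "3m29s", 434569, 18856),
    ("PASS", "6m29s", 1128803, 36436), ("PASS", "3m57s", 282053, 16089),
    ("FAIL", "5m52s", 926857, 33742), ("PASS", "3m06s", 128080, 9398),
    ("PASS", "2m53s", 185126, 11999), ("PASS", "5m05s", 456024, 18137),
    ("PASS", "2m56s", 168521, 5712), ("PASS", "3m50s", 271701, 16024),
    ("PASS", "3m39s", 527233, 19523), ("PASS", "4m26s", 397215, 20459),
    ("PASS", "3m31s", 246277, 18877), ("PASS", "3m55s", 286243, 17394),
    ("PASS", "8m19s", 2037086, 40677),
]


def seconds(duration: str) -> int:
    """Parse durations like '1m26s' or '24s' into whole seconds."""
    m = re.fullmatch(r"(?:(\d+)m)?(\d+)s", duration)
    if m is None:
        raise ValueError(f"unrecognised duration: {duration!r}")
    return int(m.group(1) or 0) * 60 + int(m.group(2))


rows = OPENSSL + REGEX_LOG
passes = sum(r[0] == "PASS" for r in rows)
tokens_in = sum(r[2] for r in rows)
tokens_out = sum(r[3] for r in rows)
busy = sum(seconds(r[1]) for r in rows)  # sum of per-trial total times

print(f"trials {len(rows)}, pass {passes}, fail {len(rows) - passes}")
print(f"tokens {tokens_in:,} in / {tokens_out:,} out")
print(f"sum of per-trial totals {busy // 60}m{busy % 60:02d}s")
```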