Ran the two signal tasks, 5 attempts each (10 trials), through the reasoning-cap proxy (port 8021, cap ~4000) with qwen3.6-35b-a3b, thinking ON.
Mean reward 0.80 (8 of 10 trials passed): regex-log 5/5, openssl-selfsigned-cert 3/5.
Next: collect more samples for a tighter confidence interval (K=15), and try the dense 27B model through the same proxy.
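
To get a feel for how loose 5 attempts per task is, here is a minimal sketch that puts an interval around each pass rate. The Wilson score interval and the `wilson_interval` helper are assumptions chosen for illustration; the run does not say which interval it has in mind. The 9/15 case is hypothetical, included only to compare widths at K=15.

```python
import math

def wilson_interval(passes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a pass rate (z=1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = passes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    # Clamp to [0, 1] to absorb floating-point overshoot at the boundaries.
    return max(0.0, center - half), min(1.0, center + half)

# Observed pass counts from this run.
observed = [
    ("regex-log", 5, 5),
    ("openssl-selfsigned-cert", 3, 5),
    ("combined", 8, 10),
]
for name, passes, trials in observed:
    lo, hi = wilson_interval(passes, trials)
    print(f"{name}: {passes}/{trials} -> [{lo:.2f}, {hi:.2f}]")

# Hypothetical: the same 60% pass rate observed at K=15 (9/15).
lo5, hi5 = wilson_interval(3, 5)
lo15, hi15 = wilson_interval(9, 15)
print(f"interval width at K=5: {hi5 - lo5:.2f}, at K=15: {hi15 - lo15:.2f}")
```

The interval narrows with K but stays wide at these sample sizes, so per-task differences should still be read cautiously after the larger run.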

| field | value |
|---|---|
| model | llama-local/qwen3.6-35b-a3b |
| agent | harnesses.minimal_pi:MinimalPi |
| thinking | on |
| reasoning budget | proxy · cap 4000 (see journal for the authoritative mechanism) |
| maxTokens / contextWindow | 32768 / 131072 |
| agent timeout | ×2.0 |
| trials | 10 of 10 (8 pass · 2 fail) |
| mean reward | 0.80 |
| tokens (job total) | 4,088,115 in / 164,705 out |
| started / finished | 2026-07-02T18:16 / 2026-07-02T18:52 |
| wall clock | 36m12s |

openssl-selfsigned-cert (3/5 pass):

| # | result | total | agent | in/out tok |
|---|---|---|---|---|
| 1 | FAIL | 1m10s | 21s | 39851/2843 |
| 2 | PASS | 1m05s | 17s | 50316/2808 |
| 3 | PASS | 1m00s | 12s | 25763/1827 |
| 4 | FAIL | 1m11s | 21s | 52497/2821 |
| 5 | PASS | 59s | 12s | 21653/1687 |

regex-log (5/5 pass):

| # | result | total | agent | in/out tok |
|---|---|---|---|---|
| 1 | PASS | 7m57s | 6m58s | 889355/32115 |
| 2 | PASS | 7m33s | 6m36s | 739562/49511 |
| 3 | PASS | 4m36s | 3m36s | 491670/21487 |
| 4 | PASS | 6m20s | 5m24s | 1589957/37832 |
| 5 | PASS | 4m17s | 3m21s | 187491/11774 |
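
As a sanity check, the per-trial token counts can be summed and compared with the reported job total. This is a small sketch, assuming the first table is openssl-selfsigned-cert (it contains the two failures) and the second is regex-log; the variable names are illustrative only.

```python
# (input tokens, output tokens) per trial, copied from the two tables.
openssl_selfsigned_cert = [
    (39851, 2843), (50316, 2808), (25763, 1827), (52497, 2821), (21653, 1687),
]
regex_log = [
    (889355, 32115), (739562, 49511), (491670, 21487), (1589957, 37832), (187491, 11774),
]

all_trials = openssl_selfsigned_cert + regex_log
total_in = sum(tok_in for tok_in, _ in all_trials)
total_out = sum(tok_out for _, tok_out in all_trials)

# Reported job totals from the run summary.
assert total_in == 4_088_115
assert total_out == 164_705

# Share of input tokens consumed by the regex-log trials.
regex_in = sum(tok_in for tok_in, _ in regex_log)
print(f"total: {total_in:,} in / {total_out:,} out")
print(f"regex-log share of input tokens: {regex_in / total_in:.1%}")
```

The totals reconcile with the summary, and nearly all input tokens come from the regex-log trials, which also account for most of the wall-clock time.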