First-ever smoke run of the custom minimal-pi harness against the local
llama-swap server: qwen3.6-35b-a3b (MoE, 3B active), 4 quick tasks × 1
attempt. Thinking was uncontrolled: this run predates any reasoning-budget
mechanism, and even the thinking kwarg (the config shows thinking=(none)).
4/4 passed. The harness plumbing (in-container pi → Docker gateway → llama-swap
on port 8020, custom provider extension with raised maxTokens) worked end to end.
A clean sheet, but a misleading one: regex-log happened to terminate its
reasoning this time. The very next run showed that, with uncontrolled thinking,
the model can loop inside <think> and burn the whole output budget. This pass
was partly luck, not a solved problem.
Next step: re-run the same smoke set to check whether the result is stable.
| setting | value |
|---|---|
| model | llama-local/qwen3.6-35b-a3b |
| agent | harnesses.minimal_pi:MinimalPi |
| thinking | (none) |
| reasoning budget | direct (server/none)\* |
| maxTokens / contextWindow | 32768 / 131072 |
| agent timeout | ×2.0 |
| trials | 4 of 4 (4 pass · 0 fail) |
| mean reward | 1.00 |
| tokens (job total) | 516,270 in / 46,042 out |
| started / finished | 2026-07-02T16:36 / 2026-07-02T16:44 |
| wall clock | 8m49s |

\* See the journal for the authoritative mechanism.

| # | result | total time | agent time | tokens in / out |
|---|---|---|---|---|
| 1 | PASS | 1m27s | 23s | 26,570 / 1,103 |

| # | result | total time | agent time | tokens in / out |
|---|---|---|---|---|
| 1 | PASS | 1m09s | 14s | 40,905 / 2,044 |

| # | result | total time | agent time | tokens in / out |
|---|---|---|---|---|
| 1 | PASS | 1m09s | 19s | 36,334 / 3,139 |

| # | result | total time | agent time | tokens in / out |
|---|---|---|---|---|
| 1 | PASS | 5m02s | 4m02s | 412,461 / 39,756 |
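As a quick consistency check, the per-trial token counts in the four result tables sum exactly to the job totals in the summary. The sketch below is a minimal illustration, assuming the counts are transcribed by hand from the tables in the order they appear. `TRIALS`, `job_totals` and `largest_share` are hypothetical helpers and do not read any of the harness's own output files.

```python
# Per-trial (input, output) token counts, transcribed by hand from the
# four result tables above, in the order they appear (hypothetical data layout).
TRIALS = [
    (26_570, 1_103),
    (40_905, 2_044),
    (36_334, 3_139),
    (412_461, 39_756),
]


def job_totals(trials):
    """Sum input and output tokens across all trials."""
    total_in = sum(t_in for t_in, _ in trials)
    total_out = sum(t_out for _, t_out in trials)
    return total_in, total_out


def largest_share(trials):
    """Return (index, fraction) for the trial with the most output tokens."""
    _, total_out = job_totals(trials)
    idx = max(range(len(trials)), key=lambda i: trials[i][1])
    return idx, trials[idx][1] / total_out


if __name__ == "__main__":
    tin, tout = job_totals(TRIALS)
    print(f"{tin:,} in / {tout:,} out")  # 516,270 in / 46,042 out
    idx, share = largest_share(TRIALS)
    print(f"table {idx + 1}: {share:.0%} of output tokens")  # table 4: 86% of output tokens
```

The fourth table alone accounts for about 86% of the job's output tokens and about 80% of its input tokens.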