Half of a thinking on/off comparison: same 4 smoke tasks, qwen3.6-35b-a3b, with
--thinking off (the harness now sends chat_template_kwargs.enable_thinking=false
via pi's qwen-chat-template compat — this run validated that control path,
which had an extension bug earlier).
3/4. regex-log — the task that loops with thinking on — passes with
thinking off. But openssl-selfsigned-cert fails.
Together with the cmpthink-on sibling run this shows a clean trade-off:
thinking OFF prevents the runaway loop but costs quality on tasks that benefit
from deliberation; thinking ON helps those tasks but risks the loop. Neither
binary setting is right — we want thinking ON but length-bounded.
Run the cmpthink-on sibling to complete the A/B, then build a reasoning cap.
llama-local/qwen3.6-35b-a3bagentharnesses.minimal_pi:MinimalPithinkingoffreasoning budgetdirect (server/none)* — see journal for the authoritative mechanismmaxTokens / contextWindow32768 / 131072agent timeout ×2.0trials4 of 4 — 3 pass · 1 failmean reward0.75tokens (job total)271,546 in / 14,916 outstarted / finished2026-07-02T17:17 / 2026-07-02T17:22wall clock5m06s| # | result | total | agent | in/out tok |
|---|---|---|---|---|
| 1 | PASS | 59s | 7s | 33741/807 |
| # | result | total | agent | in/out tok |
|---|---|---|---|---|
| 1 | PASS | 1m01s | 10s | 46907/1254 |
| # | result | total | agent | in/out tok |
|---|---|---|---|---|
| 1 | FAIL | 1m00s | 10s | 29155/1239 |
| # | result | total | agent | in/out tok |
|---|---|---|---|---|
| 1 | PASS | 2m05s | 1m09s | 161743/11616 |