Hardening Ask Q: making a live Vernal webchat integration survive all-day use

Ask Q was already working. That is the part people forget when they read a hardening post and imagine a launch story.

The widget loaded. The bridge talked to Broca. Answers came back. In a fifteen-minute demo, everything looked fine. Then an operator did what operators do: they opened the Ask Q panel in Sanctum Tasks admin, left it pinned beside a long task thread, and went back to work. By mid-afternoon the panel had gone quiet — not with an error banner, not with a polite “try again,” just the particular silence of a poll loop that has run out of budget.

This post is about that silence. It is about the math behind it, the layers we had to peel apart to fix it, and the defaults we changed so the next operator does not have to be a rate-limit detective. This is an integration hardening story, not a product reveal. The Vernal stack on moya was fine. The PHP bridge on multihost was mostly fine. What was not fine was the assumption that a three-second poll interval and a cap of three hundred responses per hour could coexist in the same afternoon.

The integration map: widget, bridge, Broca, agent

Before the numbers, the topology — because “the chat broke” is never one box.

Sanctum Tasks admin embeds the Ask Q widget: a poll-based webchat surface that asks the PHP Q bridge (public/q-bridge/) for conversation state. The bridge is the policy layer — rate limits, session keys, attachment rules — sitting on multihost beside the rest of Tasks.

The bridge forwards operator messages to Broca on moya (inbox/outbox HTTP), which serializes traffic with Telegram and the rest of the Sanctum agent fleet. Broca hands work to Letta on the agent side. When everything is aligned, the operator sees streaming-ish updates as polls return fresh outbox rows.

Short sessions hide coupling. Long sessions expose it. The widget does not know about Broca’s inbox semantics; the bridge does not know the operator left the tab focused for six hours; Broca does not know the widget polls every three seconds unless you tell it in the limit tables. Hardening Ask Q meant aligning all three budgets — widget poll cadence, PHP rate buckets, Broca poll allowances — so “all day” is a design input, not an accident.

The fifteen-minute math

The failure mode that triggered this sprint was almost embarrassingly arithmetic.

The widget polls the responses endpoint on a three-second interval to refresh assistant output. That is not aggressive for a chat UI; it is conventional. But three seconds per poll is twenty polls per minute, twelve hundred polls per hour.

The default cap on /api/responses before hardening was three hundred per hour.

You do not need a spreadsheet to see the cliff. At three hundred per hour, a steady three-second poll burns the budget in fifteen minutes. After that, the API correctly returns HTTP 429. The widget, before fixes, did not always surface 429 as a recoverable state — so the UI felt “stuck.” The agent might still have been thinking on the far side of Broca; the operator only saw a panel that stopped updating.
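
The arithmetic is easy to check in a few lines. The sketch below is illustrative Python, not code from the widget or the bridge; the function names are invented for this post, and it assumes the cap is counted over a fixed one-hour window.

```python
def polls_per_hour(interval_seconds: float) -> float:
    """Requests per hour generated by one steady poll loop."""
    return 3600 / interval_seconds


def minutes_until_exhausted(interval_seconds: float, hourly_cap: int) -> float:
    """Minutes of steady polling before an hourly cap is spent.

    Returns float('inf') when the loop never reaches the cap within the hour.
    """
    demand = polls_per_hour(interval_seconds)
    if demand <= hourly_cap:
        return float("inf")
    return hourly_cap / demand * 60


print(polls_per_hour(3))                # 1200.0
print(minutes_until_exhausted(3, 300))  # 15.0
```

Any interval and cap pair that returns a finite number here is a panel that will go quiet within the hour.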

That mismatch, 1200/h of demand against 300/h of supply, is documented in the Phase 3 composer PRD as the throughput root cause, and it is the kind of bug that looks like a flaky network until you read the access logs. We did not “discover Ask Q”; we discovered that demo-length integrations ship with demo-length limits.

Layer one: stop punishing Broca inbox/outbox polls

While we were chasing 429s on responses, a second class of failures showed up on send: messages that never left the widget, or replies that never arrived, with symptoms that pointed at “Broca is down” when Broca was healthy.

The bridge had been rate-limiting Broca inbox/outbox poll paths the same way it rate-limited user-facing chat traffic. That was incorrect policy. Inbox/outbox polling is how the bridge learns that Athena (or whichever agent backs Q Vernal) has answered; throttling it produces send stalls and false negatives unrelated to the operator’s chat budget.

Commit e50628e (2026-06-15) removed that coupling: Broca poll routes are no longer counted against the limits that govern user-facing chat traffic. Separating “operator chat surface” from “agent transport housekeeping” is boring infrastructure work, and it is the difference between “Q is ignoring me” and “the bridge stopped checking the outbox.”
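
One way to picture the separation is as a function that decides which bucket, and which identity, a request is counted against. This is a minimal sketch in Python rather than the bridge's PHP. The Broca path prefixes, the bucket names other than those used in this post, and the choice to key chat traffic by user are all assumptions made for illustration.

```python
# Hypothetical path prefixes for transport housekeeping; the real bridge
# routes may look different.
TRANSPORT_PREFIXES = ("/broca/inbox", "/broca/outbox")


def limit_key(path: str, user_id: str, server_ip: str) -> tuple:
    """Pick the bucket and the identity a request is counted against."""
    if path.startswith(TRANSPORT_PREFIXES):
        # Transport polls never touch an operator's chat budget.
        return ("broca_inbox", server_ip)
    if path.startswith("/api/responses"):
        return ("responses", user_id)
    # Everything else falls through to the aggregate per-user bucket.
    return ("user_overall", user_id)


print(limit_key("/broca/outbox/poll", "op-1", "10.0.0.5"))
print(limit_key("/api/responses", "op-1", "10.0.0.5"))
```

The point of the shape is that an exhausted chat bucket cannot stop the bridge from checking the outbox, because the two requests never share a counter.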

Layer two: user_session limits and navigation

A parallel failure mode was user_session throttling. An operator who navigates across Tasks admin — opening settings, returning to a task, reloading Ask Q — can legitimately create more session traffic than a single static poll loop.

Before 26b1b43 (2026-06-16), aggressive session limits could block sends even when response polls still had headroom, which is a particularly cruel failure: the panel looks alive, but your message will not ship. Fixing that required treating session churn as expected admin behavior, not abuse.

Layer three: widget behavior when responses hit 429

Raising caps is necessary but not sufficient. Operators will still brush against limits during incidents, misconfigured loops, or future features.

Commit b45ec35 (2026-06-18) addressed widget stalling when the response poll receives 429. The desired behavior is explicit backoff and recovery: the UI should tell the truth (“rate limited, retrying”) instead of mimicking a dead agent. Silent failure is the enemy of all-day tools; a visible retry is a trustworthy one.
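
The backoff-and-recover behavior can be sketched as a small state transition. This is illustrative Python, not the widget's actual code: the PollState type, the 60-second ceiling, and the use of a Retry-After value are assumptions, and the bridge may or may not send that header.

```python
from dataclasses import dataclass
from typing import Optional

BASE_INTERVAL = 3.0   # seconds; the normal poll cadence described in this post
MAX_INTERVAL = 60.0   # assumed ceiling, not a documented value


@dataclass
class PollState:
    interval: float = BASE_INTERVAL  # wait before the next poll
    status: str = "ok"               # text the UI would show the operator


def next_state(state: PollState, http_status: int,
               retry_after: Optional[float] = None) -> PollState:
    """Compute the next poll delay and visible status from a poll result."""
    if http_status == 429:
        # Use the server's hint when there is one, otherwise double the wait.
        wait = retry_after if retry_after is not None else state.interval * 2
        wait = min(max(wait, BASE_INTERVAL), MAX_INTERVAL)
        return PollState(wait, "rate limited, retrying")
    if 200 <= http_status < 300:
        # Success: return to the normal cadence immediately.
        return PollState(BASE_INTERVAL, "ok")
    # Other failures are out of scope here; keep the current cadence.
    return PollState(state.interval, f"error {http_status}")


state = next_state(PollState(), 429)
print(state.interval, state.status)  # 6.0 rate limited, retrying
```

The important property is that the loop keeps running and the status string changes, so the operator can tell a throttled panel from a silent agent.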

New defaults and admin-tunable caps

Defaults matter because most installs never touch the runbook. On 2026-06-18 (7364f8f) we raised the baseline buckets in public/q-bridge/includes/rate_limit_config.php to match all-day poll math:

  • Responses: 7200/hour (6× the 1200/h of a continuous 3s poll), leaving room for bursty UI refreshes and parallel tabs.
  • user_session: 3000/hour — navigation-heavy admin sessions without tripping send blocks.
  • User overall: 20000/hour — aggregate safety valve set generously for real operators, not lab bots.
  • Broca inbox (per server IP): 10000/hour — transport polling decoupled from chat caps but still bounded.

These numbers mirror the recommendations in docs/Q-BRIDGE-OPS-RUNBOOK.md §5. They are not magic; they are budgets you can defend in prose. If product changes the poll interval tomorrow, you re-multiply. The historical Tasks #678 thread is worth remembering here: we once approved a 300/h responses cap when the integration was younger and poll cadence had not been reconciled against production admin habits. Hardening is the act of revisiting those approvals when real usage proves the math wrong — not a one-time launch checklist.
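
Re-multiplying is a one-line calculation. The helper below is a hedged Python sketch with an invented name; it assumes every pinned panel runs its own steady loop and that all of them are counted against the same responses bucket.

```python
def headroom(cap_per_hour: int, interval_seconds: float, loops: int = 1) -> float:
    """How many times over the cap covers `loops` steady poll loops."""
    demand = loops * 3600 / interval_seconds
    return cap_per_hour / demand


print(headroom(7200, 3))           # 6.0   one panel at the current cadence
print(headroom(7200, 3, loops=3))  # 2.0   three pinned panels
print(headroom(7200, 1))           # 2.0   one panel if the interval dropped to 1s
print(headroom(300, 3))            # 0.25  the old default
```

A result below 1.0 means the cap cannot sustain the loop for a full hour, which is exactly where the old 300/h default sat.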

We also exposed live tuning in Settings → Ask Q (public/admin/_settings/ask_q.php). Admins can adjust limits without a deploy — important when a single operator runs three pinned panels during a launch weekend, or when you need to temporarily tighten during an incident. The panel exposes the same buckets the bridge enforces: responses, session, aggregate user traffic, and Broca inbox allowances. Hardening is not only code; it is operability you can see in the UI instead of only in PHP files.

Adjacent reliability work (short)

Not every June commit was about rate limits, but three adjacent changes matter to the same audience:

  • Document body cap raised from 100k to 1M (b7710b0, 2026-06-12) so large shared docs do not truncate mid-bridge — reliability for the content Ask Q reasons over.
  • Guest-visible shared documents regained working inline images (186bb3d, 2026-06-14) — a polish fix that prevents “the doc looks broken” from masquerading as “Q is broken.”
  • MP4 task attachments (1f855c4, 2026-06-19) extend Tasks media handling. This one is context only: the admin surface Ask Q lives beside continues to mature.

Phase 3 composer: what is next (not shipped)

Large paste is the next frontier. Operators do not only ask short questions; they dump stack traces, log excerpts, and multi-page briefs. The current composer UX and throughput model strain when paste size spikes. The Phase 3 PRD (docs/ASK-Q-PHASE-3-COMPOSER-UPGRADE-PRD.md, filed in c3d3871) describes attachment chips, composer layout, and throughput goals.

That work is pending. Prototype tooling lives under tools/ask-q-composer-prototype/; it is not production Ask Q yet. The hardening described here is what makes today’s widget trustworthy; Phase 3 is what makes tomorrow’s heavy paste feel native rather than heroic.

Lessons for integrators building poll-based widgets

If you take one checklist away, make it this:

  1. Multiply out the poll interval before you ship. Polls per hour = 3600 ÷ seconds per poll, and that figure must fit under your default API cap with headroom for tabs, retries, and admin navigation.
  2. Separate transport polls from user polls. Inbox/outbox housekeeping is not the same product surface as chat; rate-limiting them together creates ghosts.
  3. Never swallow 429. Backoff, surface state, recover — silent stalls train operators to blame the agent.
  4. Expose tunables in admin, not only in PHP defaults — incidents happen on weekends.
  5. Call “hardening” what it is. Integrations do not fail at launch; they fail at hour three, when the demo math runs out.

Ask Q did not need a new agent. It needed honest budgets across the widget, the bridge, and Broca — plus the humility to treat “open all day” as the normal case. That is the bar we are holding going into Phase 3.