ADR-0036: Cross-Harness Producer Ingress — Bus-Hosted GitHub App
Accepted · Town-crier · Cross-Harness Contract
Date: 2026-06-26
Tracks: kendo TC-0014 (script instance, project 13)
Reasoning: campaigns/town-crier/2026-06-26-github-app-announce-producer-design.md (build-ready spec, parity target, risk analysis)
Status note. Jasper cleared the contract change and signed off on trust (TC-0014 comments 2548 + 2550, 2026-06-26: B1 APPROVED, Q1 = this ADR, Q4 = pull-first, Q2 + Q3 confirmed; see below) and assigned the build to Gerard. He declined a separate ADR-text review, trusting the design pass, so the ADR is Accepted (decision adopted) rather than awaiting text sign-off; the first build PR is where the code starts. The co-owned-territory courtesy (town-crier `CLAUDE.md`) is satisfied: the co-owner granted the contract and remains the CODEOWNERS merge-gate reviewer (no self-merge: a Gerard-authored build PR is gated on him regardless).
Context
The town-crier review bus learns about a PR through a producer — today a per-repo .github/workflows/announce-pr.yml carrying three jobs:
- announce on `labeled` / `synchronize` (filtered to the `Agent Review Requested` label),
- resolve on `closed` / `unlabeled`,
- consensus on `pull_request_review` (counts current-head approvals; auto-resolves at ≥2).
This workflow is live in 14 producer repos (the rollout went fleet-wide). Every protocol tweak is a PR into N repos, and each repo needs hand-provisioning: a TOWN_CRIER_URL var + a TOWN_CRIER_TOKEN secret (repo admin) + the workflow file itself (a workflow-scope token). Config is repo-level with no org inheritance.
Two structural costs follow:
- Recurring fleet churn. N-repo PRs + per-repo secret provisioning for every change to a harness-agnostic coordination concern.
- A shared bearer token replicated into 14 repos' secrets. The `github-action` producer identity is carried as `TOWN_CRIER_TOKEN` in every producer repo. That is 14 copies of one credential.
And it structurally blocks repos where we lack admin — wijs is off-bus today precisely because we cannot set its secret or merge its workflow.
The Laravel backend port is merged (epic/laravel-backend #48 + W6 OAuth #47 + W7b #49); the TS src/ is gone. The bus is now a full Laravel Action/DTO/FormRequest app with AnnounceRequestAction / ResolveRequestAction and a Passport-mcp:use identity model (App\Auth\ProducerIdentity, default github-action). Kendo already runs the exact GitHub-App pattern this ADR adopts (it moves issue cards on branch-create / PR-merge) — a verified, near-verbatim blueprint, minus kendo's entire tenancy layer.
This is a cross-harness coordination contract, not a town-crier-internal refactor: the producer ingress is the surface every independently-run harness (war-room general, Jasper's, the-laboratory mad-scientist, krypt0nbull3t) depends on to learn about PRs. Changing where and how PRs are announced changes the contract all harnesses inherit — which is why it earns its own ADR (Jasper's Q1 call).
Decision
Replace the 14 per-repo announce-pr.yml workflows with one bus-hosted GitHub App ("town-crier announce") whose webhook the bus serves. Install once per org → every repo's PR events stream to one endpoint → the bus announces / resolves / consensus-resolves server-side, in-process.
1. Webhook ingress inside the merged Laravel bus (not a new service)
A new HTTP ingress POST /github/webhook, HMAC-verified (X-Hub-Signature-256), unauthenticated by Passport — it is GitHub calling, not a harness. This is a second auth class on the bus alongside Passport mcp:use: harness traffic carries an OAuth bearer; producer traffic carries a GitHub webhook signature. After verifying the signature, the handler dispatches in-process to the existing AnnounceRequestAction / ResolveRequestAction — it does not call the bus's own REST API.
- Middleware `VerifyGithubWebhook`: `X-Hub-Signature-256` HMAC via `#[Config('github.webhook_secret')]`, `hash_equals`, fail-closed on empty secret (copy kendo's verbatim).
- Controller `GithubWebhookController`: fan out by `X-GitHub-Event`. `installation` runs inline; `pull_request` / `pull_request_review` dispatch queued jobs. No tenancy: drop kendo's `initializeTenantFromPayload` / `TenantSwitcher` entirely.
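As an illustration of the fail-closed signature check described above, here is a minimal language-neutral sketch in Python. It is not kendo's middleware: the function name `verify_github_signature` and its parameters are hypothetical, and `hmac.compare_digest` stands in for PHP's `hash_equals`.

```python
import hashlib
import hmac
from typing import Optional


def verify_github_signature(secret: str, body: bytes, signature_header: Optional[str]) -> bool:
    """Return True only when X-Hub-Signature-256 equals HMAC-SHA256(secret, raw body)."""
    # Fail closed: an unset or empty secret must never validate a request.
    if not secret or not signature_header:
        return False
    if not signature_header.startswith("sha256="):
        return False
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    expected = "sha256=" + digest
    # Constant-time comparison, the Python analogue of hash_equals.
    return hmac.compare_digest(expected.encode("utf-8"), signature_header.encode("utf-8"))
```

The check must run over the raw request bytes, before any JSON decoding, because re-serialising the payload would change the digest.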
2. The github-action identity is minted internally — no bus token in any repo
The handler stamps its server-side announce/resolve with the github-action ProducerIdentity (config('town-crier.producer_identity')) — the same predicate the REST resolveByUrl path uses. The App authenticates events by its own webhook HMAC; it does not need, and does not carry, the shared bus bearer token. TOWN_CRIER_TOKEN leaves all 14 repos. This is a net attack-surface reduction: one HMAC secret on the bus replaces 14 copies of a bearer token.
3. Two event subscriptions absorb all three workflow jobs — behaviorally identical
pull_request (actions {opened, reopened, ready_for_review, labeled, synchronize, unlabeled, closed}):
- Announce iff (`action == labeled` AND `label.name == "Agent Review Requested"`) OR (`action ∈ {opened, reopened, ready_for_review, synchronize}` AND the PR's label set contains it). `AnnounceRequestData` from the payload (`pr_url = html_url`, `repo`, `title`, `requester = user.login`, `head_oid = head.sha`). Idempotent on `pr_url`; same-head re-announce is a no-op; a new head reopens. This is the workflow's `announce` job `if`, preserved byte-for-byte.
- Resolve iff (`action == closed` AND label present) OR (`action == unlabeled` AND the removed label is `"Agent Review Requested"`). Note = `"merged"` / `"closed without merge"` / `"review label removed"`.
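The two `pull_request` predicates above can be sketched as a single pure function. This is an illustrative Python sketch, not the handler itself: `classify_pull_request_event` is a hypothetical name, and choosing between the `"merged"` and `"closed without merge"` notes from the payload's `pull_request.merged` flag is an assumption, since the ADR does not say how the note is selected.

```python
from typing import Optional, Tuple

REVIEW_LABEL = "Agent Review Requested"
# Actions that announce only when the PR already carries the review label.
HEAD_ACTIONS = {"opened", "reopened", "ready_for_review", "synchronize"}


def classify_pull_request_event(payload: dict) -> Tuple[str, Optional[str]]:
    """Map a pull_request webhook payload to (decision, resolve_note)."""
    action = payload.get("action")
    pr = payload.get("pull_request") or {}
    pr_labels = {label.get("name") for label in pr.get("labels", [])}
    # On labeled/unlabeled events the label that changed is at payload["label"].
    changed_label = (payload.get("label") or {}).get("name")

    if action == "labeled" and changed_label == REVIEW_LABEL:
        return ("announce", None)
    if action in HEAD_ACTIONS and REVIEW_LABEL in pr_labels:
        return ("announce", None)
    if action == "closed" and REVIEW_LABEL in pr_labels:
        # Assumption: the note is derived from pull_request.merged.
        return ("resolve", "merged" if pr.get("merged") else "closed without merge")
    if action == "unlabeled" and changed_label == REVIEW_LABEL:
        return ("resolve", "review label removed")
    return ("ignore", None)
```

Keeping the decision free of I/O makes it straightforward to replay recorded payloads against it when checking parity with the workflow's `if` conditions.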
pull_request_review (review.state == approved AND PR carries the label): fetch the PR's full review list via the App installation token (outbound GET — §4) and apply the identical tally — latest opinionated review per reviewer; count an approval only if commit_id == current head; count CHANGES_REQUESTED without head-filtering. Consensus = ≥2 distinct current-head approvals AND zero outstanding objections → ResolveRequestAction, note "consensus: N approvals (who) @<head>".
The asymmetric head-filtering (approvals head-gated, objections not) is a deliberate false-negative bias — a standing objection blocks consensus even past a new head. Port the rule, not a paraphrase of it. This is the load-bearing parity surface.
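To make the asymmetric rule concrete, here is a hedged Python sketch of the tally. It assumes the review list arrives oldest-first (as GitHub's list-reviews endpoint returns it), and it assumes "opinionated" means the `APPROVED` and `CHANGES_REQUESTED` states; the function name `tally_consensus` and the return shape are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumption: only these two states count as an opinion; COMMENTED etc. are skipped.
OPINIONATED = {"APPROVED", "CHANGES_REQUESTED"}


def tally_consensus(reviews: List[dict], head_sha: str, threshold: int = 2) -> Tuple[bool, List[str], List[str]]:
    """Return (consensus, approvers, objectors) for a PR at head_sha."""
    latest: Dict[str, dict] = {}
    for review in reviews:  # oldest-first, so later opinions overwrite earlier ones
        if review.get("state") in OPINIONATED:
            latest[review["user"]["login"]] = review

    # Approvals are head-gated: an approval of an older commit does not count.
    approvers = sorted(
        login for login, r in latest.items()
        if r["state"] == "APPROVED" and r.get("commit_id") == head_sha
    )
    # Objections are NOT head-gated: a standing objection blocks past a new head.
    objectors = sorted(
        login for login, r in latest.items() if r["state"] == "CHANGES_REQUESTED"
    )
    return (len(approvers) >= threshold and not objectors, approvers, objectors)
```

For example, two approvals at the current head plus one `CHANGES_REQUESTED` left on an earlier commit yields no consensus, which is the false-negative bias the rule is designed to have.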
4. First outbound HTTP from the bus — explicit timeouts + rotation-invariant token cache
Town-crier has had zero outbound HTTP by design. The App introduces two GitHub call sites, both with explicit ->timeout() (Principle #8 — direct calls, the library-author note does not apply):
- Mint installation token: App-JWT (RS256, signed with `app_private_key`) → `POST /app/installations/{id}/access_tokens`. ~10s timeout. Token TTL ~1h; cache keyed by `installation_id` (stable surrogate; Principle #10 rotation-invariance), never by the token value.
- Read PR reviews: `GET /repos/{repo}/pulls/{n}/reviews` (paginated), installation token. ~10s timeout (mirrors the workflow's `curl --max-time 10`).
Both wrapped so a GitHub outage degrades to "consensus not auto-resolved this round" (the thread stays open — the conservative direction), never a thrown 500 back to GitHub's webhook delivery.
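The cache-keying and degrade-on-failure behaviour can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the bus's cache: the class `InstallationTokenCache`, the injected `mint` callable (standing in for the JWT-signed, timeout-bound HTTP call), and the 60-second expiry skew are all hypothetical.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class InstallationTokenCache:
    """Caches installation tokens keyed by installation_id, never by token value."""

    def __init__(
        self,
        mint: Callable[[int], Tuple[str, float]],  # returns (token, expires_at_epoch)
        clock: Callable[[], float] = time.time,
        skew_seconds: float = 60.0,  # assumption: refresh slightly before expiry
    ) -> None:
        self._mint = mint
        self._clock = clock
        self._skew = skew_seconds
        self._entries: Dict[int, Tuple[str, float]] = {}

    def get(self, installation_id: int) -> Optional[str]:
        cached = self._entries.get(installation_id)
        if cached is not None and cached[1] - self._skew > self._clock():
            return cached[0]
        try:
            token, expires_at = self._mint(installation_id)
        except Exception:
            # Degrade rather than raise: the caller skips consensus this round
            # and the webhook handler can still answer GitHub with a 200.
            return None
        self._entries[installation_id] = (token, expires_at)
        return token
```

Because the key is the stable `installation_id`, a rotated token simply overwrites the previous entry instead of leaving an orphaned one behind.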
5. Fail-open, but observability is now an acceptance criterion (the silent-failure inversion)
A transient GitHub-read failure or bus hiccup must never strand the fleet loudly — log a warning, return 200 (so GitHub does not retry a parse bug forever; mirrors kendo's 200-on-malformed). But the failure mode inverts: today a broken workflow reds one repo's check and someone notices. A broken App ingress (HMAC drift, route dropped on deploy, install revoked) drops a whole org off the bus with no red check anywhere — silent, fleet-wide.
Per the Commander directive (2026-06-26), detection is a requirement, not an open question. It centers on GitHub ground truth, not inference:
- `GET /app/hook/deliveries` (App JWT): GitHub records every delivery + our response status. Non-2xx = our ingress is broken. The single strongest signal.
- `GET /app/installations` (App JWT): expected orgs `{script-development, Back-to-code, emmie}` all present? A vanished install = revocation.
- Synthetic mint-probe (creds/install health) + the existing `GET /up` (bus liveness) round it out.
Detection topology — pull-first (Jasper's Q4 call): the bus exposes a queryable GET /github/health that runs installs-present + recent-delivery-failures + synthetic-mint on request and returns per-org status. No scheduler, no alert credential added to the bus — it just answers "am I healthy?". The war room consumes it: /bus-status tables it and the always-on general session pings Commander + Jasper on red via the existing Mattermost path. Promotion to push (bus self-alerts via a scheduled command + Mattermost webhook — net-new scheduler + Fly cron + a secret, all absent today) is deferred to phase-2 hardening, only if war-room-uptime gaps prove polling unreliable.
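One way the per-org status behind `GET /github/health` could be assembled from the three signals is sketched below in Python. The response shape, the function name `build_health_report`, and the rule that any non-2xx delivery marks an org unhealthy are assumptions for illustration; the inputs are taken to be the already-fetched JSON from `GET /app/installations` and `GET /app/hook/deliveries` plus a per-org mint-probe result.

```python
from typing import Dict, Iterable, List

EXPECTED_ORGS = ("script-development", "Back-to-code", "emmie")


def build_health_report(
    installations: List[dict],
    deliveries: List[dict],
    mint_ok: Dict[str, bool],
    expected_orgs: Iterable[str] = EXPECTED_ORGS,
) -> Dict[str, dict]:
    """Combine installs-present, recent delivery failures and mint-probe per org."""
    installed = {inst["account"]["login"]: inst["id"] for inst in installations}

    # Count non-2xx responses per installation from GitHub's delivery log.
    failures: Dict[int, int] = {}
    for delivery in deliveries:
        if not 200 <= delivery.get("status_code", 0) < 300:
            key = delivery.get("installation_id")
            failures[key] = failures.get(key, 0) + 1

    report: Dict[str, dict] = {}
    for org in expected_orgs:
        installation_id = installed.get(org)
        failed = failures.get(installation_id, 0) if installation_id is not None else 0
        probe = bool(mint_ok.get(org, False))
        report[org] = {
            "installed": installation_id is not None,
            "failed_deliveries": failed,
            "mint_ok": probe,
            "healthy": installation_id is not None and failed == 0 and probe,
        }
    return report
```

A missing org reports `installed: False` rather than being omitted, so a revoked install shows up as an explicit red row for the consumer.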
6. Per-env App topology, install scope, read-only permissions
- One App per environment (staging + prod), mirroring kendo: distinct `app_name`, distinct webhook secret, distinct private key (PEM in a Fly secret).
- Install once per producing org: `script-development`, `Back-to-code` (covers wijs), emmie's org. Subscribe to `pull_request` + `pull_request_review`.
- Read-only GitHub permissions: `pull-requests: read` + `metadata: read`. No write permission: the App never writes to GitHub.
7. Decommission is deliberate, ordered, and verification-gated
Idempotency makes the rollout double-announce-safe, so the swap is low-risk if ordered correctly. Removing the 14 workflow files is war-room-authorable (producer-repo CI, not bus backend) — but only after the App is proven, as a separate mission:
1. Land + deploy the App (prod bus serving `/github/webhook`). Both ingresses live; bus dedups.
2. Install per org (org-admin, one-time ×3).
3. Verify the App announces/resolves/consensus-resolves for a labeled PR per org (watch the ledger via `/bus-status`).
4. Only then remove `announce-pr.yml` + retire `TOWN_CRIER_URL` / `TOWN_CRIER_TOKEN` org by org, after that org's install is confirmed healthy. Never remove a repo's workflow before its org's install is verified, or that repo falls off-bus.

- wijs needs only step 2: zero per-repo file/secret work.
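The reason both ingresses can run side by side in step 1 is the announce idempotency rule from §3. A toy Python sketch of that rule is given below; the in-memory `ledger` dict, the `announce` function, and the returned outcome strings are hypothetical stand-ins for `AnnounceRequestAction`, and treating a new head as "set the thread back to open with the new head" is an assumption about what "reopens" means.

```python
from typing import Dict


def announce(ledger: Dict[str, dict], pr_url: str, head_oid: str) -> str:
    """Idempotent announce keyed on pr_url; returns what happened."""
    entry = ledger.get(pr_url)
    if entry is None:
        ledger[pr_url] = {"head_oid": head_oid, "status": "open"}
        return "created"
    if entry["head_oid"] == head_oid:
        # Same head announced twice (for example by the workflow and by the App).
        return "noop"
    # A new head reopens the thread for review.
    entry["head_oid"] = head_oid
    entry["status"] = "open"
    return "reopened"
```

With this shape, the workflow and the App announcing the same labeled PR at the same head produce one ledger entry, not two.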
Resolved Questions (confirmed by Jasper 2026-06-26, TC-0014 comment 2550)
Q2 — Install-state persistence: keep GithubInstallation, drop GithubAppInstallState ✅ confirmed
Kendo carries two install-related tables. They serve different jobs, and town-crier needs exactly one:
- `GithubInstallation`: KEEP. Persist `installation_id`, `account_login`, `account_id`, `account_type` on `installation.created`; delete on `installation.deleted` (idempotent-on-redelivery `lockForUpdate`). The bus needs the `installation_id` per org to mint installation tokens (§4), and the row set is the local mirror cross-checked against `GET /app/installations` ground truth (§5). No tenant FK: the row exists only to identify "this org's events are ours."
- `GithubAppInstallState`: DROP. Kendo's `state`-nonce table binds a self-serve install URL to a tenant during an OAuth-style handshake. Town-crier has no tenant to bind and no per-user self-serve flow; installs are operator-driven (an org-admin clicks Install on the App's page, ×3, once). With no tenant binding there is no nonce to persist. `GenerateGithubAppInstallUrlAction` and the prune command drop with it.
Rationale: the only reason to persist install metadata is token-minting + the health cross-check; the only reason for state is tenant-binding, which does not exist here. Keep the table that earns its place; drop the one whose entire purpose is the tenancy layer we are not porting.
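The redelivery-safe persistence for the kept table can be illustrated with a small Python sketch. A plain dict stands in for the `GithubInstallation` table here, so the row lock (`lockForUpdate`) is not modelled; `apply_installation_event` is a hypothetical name, and the field paths follow GitHub's `installation` webhook payload.

```python
from typing import Dict


def apply_installation_event(store: Dict[int, dict], payload: dict) -> str:
    """Upsert on installation.created, delete on installation.deleted."""
    action = payload.get("action")
    installation = payload.get("installation") or {}
    installation_id = installation.get("id")
    if installation_id is None:
        return "ignored"

    if action == "created":
        account = installation.get("account") or {}
        # Upsert: a redelivered "created" overwrites the same row, no duplicate.
        store[installation_id] = {
            "installation_id": installation_id,
            "account_login": account.get("login"),
            "account_id": account.get("id"),
            "account_type": account.get("type"),
        }
        return "saved"
    if action == "deleted":
        # pop with a default: a redelivered "deleted" is a harmless no-op.
        store.pop(installation_id, None)
        return "deleted"
    return "ignored"
```

Applying the same `created` or `deleted` payload twice leaves the store in the same state as applying it once, which is the property the redelivery note asks for.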
Q3 — Config home: dedicated config/github.php (kendo parity) ✅ confirmed
Put the four keys (app_id, app_private_key, app_name, webhook_secret) in a dedicated config/github.php rather than folding them into config/town-crier.php. Rationale: blueprint legibility — a reviewer comparing against kendo's verified config/github.php reads the App config in the same place, same shape. The keys are GitHub-App identity, conceptually distinct from town-crier's coordination config (producer_identity, lock TTL, review cap). Drop kendo's OAuth block (client_id/client_secret/scopes/redirect_uri) — no user-OAuth half here. All keys env-injected via #[Config] (ADR-0016).
Options Considered
| Option | Verdict | Reason |
|---|---|---|
| Status quo: 14 per-repo announce-pr.yml | Rejected | The recurring N-repo churn, 14 copies of the bus token, and the structural wijs block are exactly what motivates this. |
| Keep workflows but centralize config via a reusable workflow / org-level secret | Rejected | Reduces file churn but keeps the bearer token in every repo, still needs per-repo admin to wire, and does not unblock wijs (still needs its secret set). Half-measure. |
| App as a separate micro-service (not inside the bus) | Rejected | A second deployable, a second auth surface, and a network hop to reach the Actions it would otherwise call in-process. The bus already has the Actions; the App is an ingress, not a service. |
| Bus-hosted App, in-process dispatch, pull-first observability | Accepted | Eliminates the churn, retires the per-repo token (net attack-surface reduction), unblocks wijs with one org install, and absorbs all three workflow jobs into two event subscriptions reusing the verified kendo blueprint minus tenancy. |
| Observability via push (bus scheduler + Mattermost webhook) from day 1 | Deferred | Adds a scheduler + Fly cron + an alert secret the bus deliberately lacks. Pull-first covers the directive with the least new bus surface; promote only if polling proves unreliable. |
Consequences
Positive
- Recurring fleet churn eliminated. Protocol tweaks become one bus deploy, not N-repo PRs.
- A replicated secret retired fleet-wide. 14 copies of `TOWN_CRIER_TOKEN` → one HMAC secret on the bus.
- wijs unblocked structurally: one org-admin install on Back-to-code, no per-repo admin.
- Consensus reads need no per-repo token: the App installation token reads reviews natively, retiring the consensus job's `pull-requests:read` `GITHUB_TOKEN`.
- The 2026-06-25 cutover-doc §5.5 question is answered: "does consensus stay in CI or move into the bus?" → it moves into the bus, as a `pull_request_review` handler.
Negative
- Blast radius inverts from per-repo to per-org. A broken App ingress drops an entire org's announce at once, silently (no red check). This is the load-bearing risk; §5's observability is the mitigation and an acceptance criterion, not a deferral.
- Town-crier gains its first outbound HTTP. The CLAUDE.md "makes no outbound HTTP calls" note must flip. New Principle-#8 surface (two call sites, both timeout-bound).
- Two ingress paths during rollout. Until all 14 repos are de-provisioned, App + workflows both announce. Idempotency makes this a no-op, but the §7 decommission order must be respected so no repo falls off-bus mid-swap.
Risks
- HMAC secret drift / route drop on deploy → silent org-wide announce loss. Mitigation: `hook/deliveries` ground-truth check via `/github/health` (§5); no org de-provisioned while its signal is dark (§7 step 4).
- War-room-uptime dependency of pull-first detection: the alarm only fires when the war room polls. Mitigation: accepted for phase 1; push-mode is the phase-2 escape hatch if it proves insufficient.
Enforcement
| What | Mechanism | Level |
|---|---|---|
| Both GitHub call sites carry ->timeout() | Pest arch test (town-crier ExternalHttpTimeoutTest equivalent; the bus's first entry) | 1 |
| Installation-token cache keyed by installation_id, never the token value | Pest unit test against the cache path (Principle #10) | 1 |
| Behavioral parity for §3 (label filter, head-SHA detection, asymmetric consensus tally, fail-open) | Pest Feature tests replaying recorded pull_request / pull_request_review payloads through the handler | 1 |
| Webhook HMAC fail-closed on empty/invalid signature | Pest Feature test (rejected request, no Action dispatched) | 1 |
| No org de-provisioned while its /github/health signal is dark | Operational gate in the §7 decommission mission (war-room-authored sweep) | 3 |
| CLAUDE.md "no outbound HTTP" note flipped; /bus-status per-org announce recency live before de-provisioning | Doctrine + skill | 4 |
Implementation
| Territory | State | Notes |
|---|---|---|
| town-crier | ADR Accepted — build not started | Build assigned to Gerard (TC-0014); Jasper signed on trust 2026-06-26 (Q2/Q3 confirmed) and remains the CODEOWNERS merge-gate. Next: Engineer dispatch against the merged Laravel backend → first PR (flips nothing — already Accepted — but is where code begins). Org installs (B2) are Commander, one-time ×3. |
| 14 producer repos | Workflows live, untouched | The announce-pr.yml decommission is a separate war-room-authored sweep, ordered per-org-after-verification (§7 step 4), only after the App proves out. |
References
- Originating ticket: kendo `TC-0014` (script instance, project 13): proposal + Jasper's sign-off (B1 APPROVED, Q1 = ADR, Q4 = pull-first).
- Reasoning / build-ready spec: `campaigns/town-crier/2026-06-26-github-app-announce-producer-design.md`: parity target (§5), kendo reuse map (§4), risk analysis (§2), observability design (§7.5).
- Cutover context: `campaigns/town-crier/2026-06-25-laravel-backend-migration-parity-and-cutover.md` (§5.5 consensus-in-CI-vs-bus question, answered here).
- Blueprint: kendo `GithubWebhookController` + `VerifyGithubWebhook` + `config/github.php` + `SaveGithubInstallationAction`, verified complete (campaign §4).
- Related ADRs: GitHub Integration Split (kendo App-token/OAuth split, the bearer-token-equivalence this leans on), Config Attribute Injection, Automated External Provisioning.
- Related doctrine: Principle #8 (explicit external-HTTP timeouts), Principle #10 (rotation-invariant cache keys), the cross-harness coordination contract (town-crier `CLAUDE.md`).