
ADR-0036: Cross-Harness Producer Ingress — Bus-Hosted GitHub App

Accepted Town-crier Cross-Harness Contract

Date: 2026-06-26
Tracks: kendo TC-0014 (script instance, project 13)
Reasoning: campaigns/town-crier/2026-06-26-github-app-announce-producer-design.md (build-ready spec, parity target, risk analysis)

Status note. Jasper cleared the contract change and signed off on trust (TC-0014 comments 2548 + 2550, 2026-06-26: B1 APPROVED, Q1 = this ADR, Q4 = pull-first, Q2 + Q3 confirmed — see below) and assigned the build to Gerard. He declined a separate ADR-text review, trusting the design pass, so the ADR is Accepted (decision adopted) rather than awaiting text sign-off; the first build PR is where the code starts. The co-owned-territory courtesy (town-crier CLAUDE.md) is satisfied: the co-owner granted the contract and remains the CODEOWNERS merge-gate reviewer (no self-merge — a Gerard-authored build PR is gated on him regardless).

Context

The town-crier review bus learns about a PR through a producer — today a per-repo .github/workflows/announce-pr.yml carrying three jobs:

  • announce on labeled / synchronize (filtered to the Agent Review Requested label),
  • resolve on closed / unlabeled,
  • consensus on pull_request_review (counts current-head approvals; auto-resolves at ≥2).

This workflow is live in 14 producer repos (the rollout went fleet-wide). Every protocol tweak is a PR into N repos, and each repo needs hand-provisioning: a TOWN_CRIER_URL var + a TOWN_CRIER_TOKEN secret (repo admin) + the workflow file itself (a workflow-scope token). Config is repo-level with no org inheritance.

Two structural costs follow:

  1. Recurring fleet churn. N-repo PRs + per-repo secret provisioning for every change to a harness-agnostic coordination concern.
  2. A shared bearer token replicated into 14 repos' secrets. The github-action producer identity is carried as TOWN_CRIER_TOKEN in every producer repo. That is 14 copies of one credential.

And it structurally blocks repos where we lack admin: wijs is off-bus today precisely because we cannot set its secret or merge its workflow.

The Laravel backend port is merged (epic/laravel-backend #48 + W6 OAuth #47 + W7b #49); the TS src/ is gone. The bus is now a full Laravel Action/DTO/FormRequest app with AnnounceRequestAction / ResolveRequestAction and a Passport-mcp:use identity model (App\Auth\ProducerIdentity, default github-action). Kendo already runs the exact GitHub-App pattern this ADR adopts (it moves issue cards on branch-create / PR-merge) — a verified, near-verbatim blueprint, minus kendo's entire tenancy layer.

This is a cross-harness coordination contract, not a town-crier-internal refactor: the producer ingress is the surface every independently-run harness (war-room general, Jasper's, the-laboratory mad-scientist, krypt0nbull3t) depends on to learn about PRs. Changing where and how PRs are announced changes the contract all harnesses inherit — which is why it earns its own ADR (Jasper's Q1 call).

Decision

Replace the 14 per-repo announce-pr.yml workflows with one bus-hosted GitHub App ("town-crier announce") whose webhook the bus serves. Install once per org → every repo's PR events stream to one endpoint → the bus announces / resolves / consensus-resolves server-side, in-process.

1. Webhook ingress inside the merged Laravel bus (not a new service)

A new HTTP ingress POST /github/webhook, HMAC-verified (X-Hub-Signature-256), unauthenticated by Passport — it is GitHub calling, not a harness. This is a second auth class on the bus alongside Passport mcp:use: harness traffic carries an OAuth bearer; producer traffic carries a GitHub webhook signature. After verifying the signature, the handler dispatches in-process to the existing AnnounceRequestAction / ResolveRequestAction — it does not call the bus's own REST API.

  • Middleware VerifyGithubWebhook: X-Hub-Signature-256 HMAC via #[Config('github.webhook_secret')], hash_equals, fail-closed on empty secret (copy kendo's verbatim).
  • Controller GithubWebhookController: fan out by X-GitHub-Event. installation runs inline; pull_request / pull_request_review dispatch queued jobs. No tenancy — drop kendo's initializeTenantFromPayload / TenantSwitcher entirely.
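As an illustration of the fail-closed signature check described above, here is a minimal sketch in Python (the real middleware is Laravel/PHP and copies kendo's). The function name verify_github_signature is hypothetical; the "sha256=" prefix and HMAC-SHA256 over the raw request body follow GitHub's documented X-Hub-Signature-256 scheme.

```python
import hashlib
import hmac
from typing import Optional


def verify_github_signature(secret: str, body: bytes, signature_header: Optional[str]) -> bool:
    """Return True only when X-Hub-Signature-256 matches the HMAC of the raw body."""
    # Fail closed: an unset secret must reject every request, never accept every request.
    if not secret:
        return False
    if not signature_header or not signature_header.startswith("sha256="):
        return False
    expected = "sha256=" + hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    # Constant-time comparison, the Python counterpart of PHP's hash_equals.
    return hmac.compare_digest(expected.encode(), signature_header.encode())
```

The check must run against the raw bytes as received; re-serializing the parsed JSON would change the digest.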

2. The github-action identity is minted internally — no bus token in any repo

The handler stamps its server-side announce/resolve with the github-action ProducerIdentity (config('town-crier.producer_identity')) — the same predicate the REST resolveByUrl path uses. The App authenticates events by its own webhook HMAC; it does not need, and does not carry, the shared bus bearer token. TOWN_CRIER_TOKEN leaves all 14 repos. This is a net attack-surface reduction: one HMAC secret on the bus replaces 14 copies of a bearer token.

3. Two event subscriptions absorb all three workflow jobs — behaviorally identical

pull_request (actions {opened, reopened, ready_for_review, labeled, synchronize, unlabeled, closed}):

  • Announce iff (action == labeled AND label.name == "Agent Review Requested") OR (action ∈ {opened, reopened, ready_for_review, synchronize} AND the PR's label set contains it). AnnounceRequestData from the payload (pr_url = html_url, repo, title, requester = user.login, head_oid = head.sha). Idempotent on pr_url; same-head re-announce is a no-op; a new head reopens. This is the if: condition of the workflow's announce job, preserved byte-for-byte.
  • Resolve iff (action == closed AND label present) OR (action == unlabeled AND the removed label is "Agent Review Requested"). Note = "merged" / "closed without merge" / "review label removed".
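The two predicates above can be sketched as a single pure function. This is an illustrative Python sketch, not the bus's PHP handler: classify_pull_request and its return shape are hypothetical, while the payload fields read here (action, label.name, pull_request.labels, pull_request.merged) are standard GitHub pull_request webhook fields.

```python
from typing import Optional, Tuple

REVIEW_LABEL = "Agent Review Requested"
ANNOUNCE_ACTIONS = {"opened", "reopened", "ready_for_review", "synchronize"}


def classify_pull_request(payload: dict) -> Optional[Tuple[str, Optional[str]]]:
    """Map a pull_request payload to ("announce", None), ("resolve", note), or None."""
    action = payload.get("action")
    pr = payload.get("pull_request", {})
    labels = {label.get("name") for label in pr.get("labels", [])}
    # The top-level "label" key is only present on labeled / unlabeled events.
    changed_label = (payload.get("label") or {}).get("name")

    if action == "labeled" and changed_label == REVIEW_LABEL:
        return ("announce", None)
    if action in ANNOUNCE_ACTIONS and REVIEW_LABEL in labels:
        return ("announce", None)
    if action == "closed" and REVIEW_LABEL in labels:
        return ("resolve", "merged" if pr.get("merged") else "closed without merge")
    if action == "unlabeled" and changed_label == REVIEW_LABEL:
        return ("resolve", "review label removed")
    return None  # every other event is ignored
```

Keeping the decision in a pure function like this is one way to make the parity tests simple: recorded payloads go in, a decision comes out, and no Action needs to be dispatched to check the filter.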

pull_request_review (review.state == approved AND PR carries the label): fetch the PR's full review list via the App installation token (outbound GET, see §4) and apply the identical tally: latest opinionated review per reviewer; count an approval only if commit_id == current head; count CHANGES_REQUESTED without head-filtering. Consensus = ≥2 distinct current-head approvals AND zero outstanding objections → ResolveRequestAction, note "consensus: N approvals (who) @<head>".

The asymmetric head-filtering (approvals head-gated, objections not) is a deliberate false-negative bias — a standing objection blocks consensus even past a new head. Port the rule, not a paraphrase of it. This is the load-bearing parity surface.
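To make the asymmetry concrete, here is a minimal Python sketch of the tally (the real port is PHP and must follow the workflow's rule exactly). The name tally_consensus is hypothetical. Two assumptions are not stated in this ADR: "opinionated" is taken to mean APPROVED or CHANGES_REQUESTED only (COMMENTED, DISMISSED and PENDING reviews are skipped), and the review list is taken to be oldest-first.

```python
OPINIONATED = {"APPROVED", "CHANGES_REQUESTED"}  # assumption: other states carry no opinion


def tally_consensus(reviews: list, head_sha: str, threshold: int = 2) -> dict:
    """Apply the head-gated approval / ungated objection tally to a PR's reviews."""
    latest = {}
    for review in reviews:  # assumed oldest-first, so later reviews overwrite earlier ones
        if review.get("state") in OPINIONATED:
            latest[review["user"]["login"]] = review

    # Approvals count only when made against the current head commit.
    approvers = sorted(
        login for login, r in latest.items()
        if r["state"] == "APPROVED" and r.get("commit_id") == head_sha
    )
    # Objections count regardless of which commit they were made against.
    objectors = sorted(
        login for login, r in latest.items() if r["state"] == "CHANGES_REQUESTED"
    )
    return {
        "approvers": approvers,
        "objectors": objectors,
        "consensus": len(approvers) >= threshold and not objectors,
    }
```

With this shape, a reviewer who requested changes on an old head keeps blocking until their latest opinionated review becomes an approval, which is the false-negative bias described above.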

4. First outbound HTTP from the bus — explicit timeouts + rotation-invariant token cache

Town-crier has had zero outbound HTTP by design. The App introduces two GitHub call sites, both with explicit ->timeout() (Principle #8 — direct calls, the library-author note does not apply):

  1. Mint installation token — App-JWT (RS256, signed with app_private_key) → POST /app/installations/{id}/access_tokens. ~10s timeout. Token TTL ~1h; cache keyed by installation_id (stable surrogate — Principle #10 rotation-invariance), never by the token value.
  2. Read PR reviews: GET /repos/{repo}/pulls/{n}/reviews (paginated), installation token. ~10s timeout (mirrors the workflow's curl --max-time 10).

Both wrapped so a GitHub outage degrades to "consensus not auto-resolved this round" (the thread stays open — the conservative direction), never a thrown 500 back to GitHub's webhook delivery.
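A minimal Python sketch of the two properties above: a cache keyed by installation_id, and a wrapper that turns any outbound failure into "skip consensus this round". All names here are hypothetical. The actual minting (App JWT, then POST /app/installations/{id}/access_tokens with an explicit timeout) is deliberately hidden behind the injected mint callable, and the 60-second expiry margin is an assumption, not a value from this ADR.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class InstallationTokenCache:
    """Caches installation tokens keyed by installation_id, never by token value."""

    def __init__(
        self,
        mint: Callable[[int], Tuple[str, float]],  # hypothetical: id -> (token, expires_at)
        skew_seconds: float = 60.0,                # assumed safety margin before expiry
        clock: Callable[[], float] = time.time,
    ):
        self._mint = mint
        self._skew = skew_seconds
        self._clock = clock
        self._entries: Dict[int, Tuple[str, float]] = {}

    def get(self, installation_id: int) -> str:
        cached = self._entries.get(installation_id)
        if cached and cached[1] - self._skew > self._clock():
            return cached[0]
        # Missing or near expiry: mint again and overwrite the same key.
        token, expires_at = self._mint(installation_id)
        self._entries[installation_id] = (token, expires_at)
        return token


def read_reviews_or_none(fetch: Callable[[], list]) -> Optional[list]:
    """Run an outbound read; on any failure return None so the thread just stays open."""
    try:
        return fetch()
    except Exception:
        # A production handler would log a warning here before degrading.
        return None
```

Because the key is the installation_id, a rotated token simply replaces the old entry; nothing is ever looked up by the token string itself.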

5. Fail-open, but observability is now an acceptance criterion (the silent-failure inversion)

A transient GitHub-read failure or bus hiccup must never strand the fleet loudly — log a warning, return 200 (so GitHub does not retry a parse bug forever; mirrors kendo's 200-on-malformed). But the failure mode inverts: today a broken workflow reds one repo's check and someone notices. A broken App ingress (HMAC drift, route dropped on deploy, install revoked) drops a whole org off the bus with no red check anywhere — silent, fleet-wide.

Per the Commander directive (2026-06-26), detection is a requirement, not an open question. It centers on GitHub ground truth, not inference:

  • GET /app/hook/deliveries (App JWT) — GitHub records every delivery + our response status. Non-2xx = our ingress is broken. The single strongest signal.
  • GET /app/installations (App JWT) — expected orgs {script-development, Back-to-code, emmie} all present? A vanished install = revocation.
  • Synthetic mint-probe (creds/install health) + the existing GET /up (bus liveness) round it out.

Detection topology — pull-first (Jasper's Q4 call): the bus exposes a queryable GET /github/health that runs installs-present + recent-delivery-failures + synthetic-mint on request and returns per-org status. No scheduler, no alert credential added to the bus — it just answers "am I healthy?". The war room consumes it: /bus-status tables it and the always-on general session pings Commander + Jasper on red via the existing Mattermost path. Promotion to push (bus self-alerts via a scheduled command + Mattermost webhook — net-new scheduler + Fly cron + a secret, all absent today) is deferred to phase-2 hardening, only if war-room-uptime gaps prove polling unreliable.
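One way the pull-first health answer could be assembled is sketched below in Python; the bus itself would do this in PHP. summarize_health and the report shape are hypothetical. The input shapes assume the fields GitHub returns from GET /app/installations (id, account.login) and GET /app/hook/deliveries (installation_id, status_code); mint_ok stands in for the result of the synthetic mint-probe per org.

```python
EXPECTED_ORGS = ("script-development", "Back-to-code", "emmie")


def summarize_health(installations, deliveries, mint_ok, expected_orgs=EXPECTED_ORGS):
    """Build a per-org status report from GitHub ground truth."""
    org_by_installation = {i["id"]: i["account"]["login"] for i in installations}

    # Count deliveries where our ingress answered with a non-2xx status.
    failed = {}
    for delivery in deliveries:
        if not 200 <= delivery.get("status_code", 0) < 300:
            org = org_by_installation.get(delivery.get("installation_id"))
            if org is not None:
                failed[org] = failed.get(org, 0) + 1

    installed = set(org_by_installation.values())
    report = {}
    for org in expected_orgs:
        present = org in installed          # a vanished install means revocation
        minted = bool(mint_ok.get(org, False))
        failures = failed.get(org, 0)
        report[org] = {
            "installed": present,
            "failed_deliveries": failures,
            "mint_ok": minted,
            "healthy": present and minted and failures == 0,
        }
    return report
```

A consumer such as /bus-status would then only need to table the report and flag any org whose healthy field is false.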

6. Per-env App topology, install scope, read-only permissions

  • One App per environment (staging + prod), mirroring kendo — distinct app_name, distinct webhook secret, distinct private key (PEM in a Fly secret).
  • Install once per producing org: script-development, Back-to-code (covers wijs), emmie's org. Subscribe to pull_request + pull_request_review.
  • Read-only GitHub permissions: pull-requests: read + metadata: read. No write permission — the App never writes to GitHub.

7. Decommission is deliberate, ordered, and verification-gated

Idempotency makes the rollout double-announce-safe, so the swap is low-risk if ordered correctly. Removing the 14 workflow files is war-room-authorable (producer-repo CI, not bus backend) — but only after the App is proven, as a separate mission:

  1. Land + deploy the App (prod bus serving /github/webhook). Both ingresses live; bus dedups.
  2. Install per org (org-admin, one-time ×3).
  3. Verify the App announces/resolves/consensus-resolves for a labeled PR per org (watch the ledger via /bus-status).
  4. Only then remove announce-pr.yml + retire TOWN_CRIER_URL/TOWN_CRIER_TOKEN org by org, after that org's install is confirmed healthy. Never remove a repo's workflow before its org's install is verified, or that repo falls off-bus.
  5. wijs needs only step 2 — zero per-repo file/secret work.
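The ordering rule in step 4 reduces to a simple gate, sketched here in Python. The function name and the example repo names in the usage note are hypothetical; the only rule encoded is the one above: a repo's workflow may go only once its org's install is verified.

```python
def repos_safe_to_decommission(repo_orgs: dict, verified_orgs: set) -> list:
    """Return the repos whose org install is verified; all others keep their workflow."""
    return sorted(repo for repo, org in repo_orgs.items() if org in verified_orgs)
```

For example, with {"example-a": "script-development", "example-b": "emmie"} and only script-development verified, the sweep would touch example-a and leave example-b on its workflow.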

Resolved Questions (confirmed by Jasper 2026-06-26, TC-0014 comment 2550)

Q2 — Install-state persistence: keep GithubInstallation, drop GithubAppInstallState ✅ confirmed

Kendo carries two install-related tables. They serve different jobs, and town-crier needs exactly one:

  • GithubInstallation — KEEP. Persist installation_id, account_login, account_id, account_type on installation.created; delete on installation.deleted (idempotent-on-redelivery lockForUpdate). The bus needs the installation_id per org to mint installation tokens (§4), and the row set is the local mirror cross-checked against GET /app/installations ground truth (§5). No tenant FK — the row exists only to identify "this org's events are ours."
  • GithubAppInstallState — DROP. Kendo's state-nonce table binds a self-serve install URL to a tenant during an OAuth-style handshake. Town-crier has no tenant to bind and no per-user self-serve flow — installs are operator-driven (an org-admin clicks Install on the App's page, ×3, once). With no tenant binding there is no nonce to persist. GenerateGithubAppInstallUrlAction and the prune command drop with it.

Rationale: the only reason to persist install metadata is token-minting + the health cross-check; the only reason for state is tenant-binding, which does not exist here. Keep the table that earns its place; drop the one whose entire purpose is the tenancy layer we are not porting.
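The redelivery-safe behavior of the kept table can be illustrated with an in-memory Python stand-in. This is a sketch only: the real persistence is a database row guarded by lockForUpdate, and the class and method names here are hypothetical. The payload fields (installation.id, installation.account.login / id / type) are those GitHub sends on installation events.

```python
from typing import Optional


class InstallationStore:
    """In-memory stand-in for the GithubInstallation table."""

    def __init__(self):
        self._rows = {}

    def handle(self, payload: dict) -> None:
        action = payload.get("action")
        installation = payload["installation"]
        if action == "created":
            account = installation["account"]
            # Keyed by installation_id, so a redelivered event overwrites, never duplicates.
            self._rows[installation["id"]] = {
                "installation_id": installation["id"],
                "account_login": account["login"],
                "account_id": account["id"],
                "account_type": account["type"],
            }
        elif action == "deleted":
            # Deleting an already-absent row is a no-op, so redelivery is harmless.
            self._rows.pop(installation["id"], None)

    def installation_id_for(self, account_login: str) -> Optional[int]:
        for row in self._rows.values():
            if row["account_login"] == account_login:
                return row["installation_id"]
        return None
```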

Q3 — Config home: dedicated config/github.php (kendo parity) ✅ confirmed

Put the four keys (app_id, app_private_key, app_name, webhook_secret) in a dedicated config/github.php rather than folding them into config/town-crier.php. Rationale: blueprint legibility — a reviewer comparing against kendo's verified config/github.php reads the App config in the same place, same shape. The keys are GitHub-App identity, conceptually distinct from town-crier's coordination config (producer_identity, lock TTL, review cap). Drop kendo's OAuth block (client_id/client_secret/scopes/redirect_uri) — no user-OAuth half here. All keys env-injected via #[Config] (ADR-0016).

Options Considered

Option | Verdict | Reason
Status quo: 14 per-repo announce-pr.yml | Rejected | The recurring N-repo churn, 14 copies of the bus token, and the structural wijs block are exactly what motivates this.
Keep workflows but centralize config via a reusable workflow / org-level secret | Rejected | Reduces file churn but keeps the bearer token in every repo, still needs per-repo admin to wire, and does not unblock wijs (still needs its secret set). Half-measure.
App as a separate micro-service (not inside the bus) | Rejected | A second deployable, a second auth surface, and a network hop to reach the Actions it would otherwise call in-process. The bus already has the Actions; the App is an ingress, not a service.
Bus-hosted App, in-process dispatch, pull-first observability | Accepted | Eliminates the churn, retires the per-repo token (net attack-surface reduction), unblocks wijs with one org install, and absorbs all three workflow jobs into two event subscriptions reusing the verified kendo blueprint minus tenancy.
Observability via push (bus scheduler + Mattermost webhook) from day 1 | Deferred | Adds a scheduler + Fly cron + an alert secret the bus deliberately lacks. Pull-first covers the directive with the least new bus surface; promote only if polling proves unreliable.

Consequences

Positive

  • Recurring fleet churn eliminated. Protocol tweaks become one bus deploy, not N-repo PRs.
  • A replicated secret retired fleet-wide. 14 copies of TOWN_CRIER_TOKEN → one HMAC secret on the bus.
  • wijs unblocked structurally — one org-admin install on Back-to-code, no per-repo admin.
  • Consensus reads need no per-repo token — the App installation token reads reviews natively, retiring the consensus job's pull-requests:read GITHUB_TOKEN.
  • The 2026-06-25 cutover-doc §5.5 question is answered — "does consensus stay in CI or move into the bus?" → it moves into the bus, as a pull_request_review handler.

Negative

  • Blast radius inverts from per-repo to per-org. A broken App ingress drops an entire org's announce at once, silently (no red check). This is the load-bearing risk; §5's observability is the mitigation and an acceptance criterion, not a deferral.
  • Town-crier gains its first outbound HTTP. The CLAUDE.md "makes no outbound HTTP calls" note must flip. New Principle-#8 surface (two call sites, both timeout-bound).
  • Two ingress paths during rollout. Until all 14 repos are de-provisioned, App + workflows both announce. Idempotency makes this a no-op, but the §7 decommission order must be respected so no repo falls off-bus mid-swap.

Risks

  • HMAC secret drift / route drop on deploy → silent org-wide announce loss. Mitigation: hook/deliveries ground-truth check via /github/health (§5); no org de-provisioned while its signal is dark (§7 step 4).
  • War-room-uptime dependency of pull-first detection — the alarm only fires when the war room polls. Mitigation: accepted for phase 1; push-mode is the phase-2 escape hatch if it proves insufficient.

Enforcement

What | Mechanism | Level
Both GitHub call sites carry ->timeout() | Pest arch test (town-crier ExternalHttpTimeoutTest equivalent; the bus's first entry) | 1
Installation-token cache keyed by installation_id, never the token value | Pest unit test against the cache path (Principle #10) | 1
Behavioral parity for §3 (label filter, head-SHA detection, asymmetric consensus tally, fail-open) | Pest Feature tests replaying recorded pull_request / pull_request_review payloads through the handler | 1
Webhook HMAC fail-closed on empty/invalid signature | Pest Feature test (rejected request, no Action dispatched) | 1
No org de-provisioned while its /github/health signal is dark | Operational gate in the §7 decommission mission (war-room-authored sweep) | 3
CLAUDE.md "no outbound HTTP" note flipped; /bus-status per-org announce recency live before de-provisioning | Doctrine + skill | 4

Implementation

Territory | State | Notes
town-crier | ADR Accepted; build not started | Build assigned to Gerard (TC-0014); Jasper signed on trust 2026-06-26 (Q2/Q3 confirmed) and remains the CODEOWNERS merge-gate. Next: Engineer dispatch against the merged Laravel backend → first PR (flips nothing, since the ADR is already Accepted, but is where code begins). Org installs (B2) are Commander, one-time ×3.
14 producer repos | Workflows live, untouched | The announce-pr.yml decommission is a separate war-room-authored sweep, ordered per-org-after-verification (§7 step 4), only after the App proves out.

References

  • Originating ticket: kendo TC-0014 (script instance, project 13) — proposal + Jasper's sign-off (B1 APPROVED, Q1 = ADR, Q4 = pull-first).
  • Reasoning / build-ready spec: campaigns/town-crier/2026-06-26-github-app-announce-producer-design.md — parity target (§5), kendo reuse map (§4), risk analysis (§2), observability design (§7.5).
  • Cutover context: campaigns/town-crier/2026-06-25-laravel-backend-migration-parity-and-cutover.md (§5.5 consensus-in-CI-vs-bus question, answered here).
  • Blueprint: kendo GithubWebhookController + VerifyGithubWebhook + config/github.php + SaveGithubInstallationAction — verified complete (campaign §4).
  • Related ADRs: GitHub Integration Split (kendo App-token/OAuth split, the bearer-token-equivalence this leans on), Config Attribute Injection, Automated External Provisioning.
  • Related doctrine: Principle #8 (explicit external-HTTP timeouts), Principle #10 (rotation-invariant cache keys), the cross-harness coordination contract (town-crier CLAUDE.md).
