ADR-0033: AVG Erasure by Anonymization-in-Place
Accepted · Cross-Project Universal · Date: 2026-06-17 · Compliance: AVG/GDPR (Art. 17 right to erasure) · NEN 7510 (codebook ingests minors' data into the emmie care surface) · ISO 27001 (A.5.33: the audit trail retains identity PII as a lawful record)
Context
codebook serves a participant population that includes minors and feeds care-adjacent data downstream into emmie (NEN 7510). It must honour AVG Art. 17 erasure requests. The pre-existing surface did not:
- `DeleteUserAction` soft-deletes (scrambles the password, sets `deleted_at`). Child PII is fully retained, so it is not erasure.
- A later `HardDeleteUserAction` (`forceDelete()`, wired to `DELETE /hard-delete/{user}`) erases local rows via implicit DB cascade. However, it violates Principle #4 (no `ON DELETE CASCADE`; deletion must be explicit in the Action). It also reaches none of the external sinks (S3, emmie, OpenAI), and it destroys pseudonymisable progress/gamification data that has legitimate aggregate value.
The Commander chose anonymize-in-place over hard-delete (2026-06-17). That inverts the completeness burden: anonymize does not fire cascade FKs (rows survive), so correctness is "scrub every PII column on every surviving row + reach every external sink", not "delete every child." A single missed PII column is silent residue — an answered-but-false erasure on minors' data.
A Surveyor deep-dive (reports/codebook/field/2026-06-17-surveyor-pii-scrub-set-map.md) produced the definitive scrub-set via a two-source (migrations × models) cross-check plus a clean observer/boot pass (proving the write-path is fully Action-mediated, so the set is enumerable from schema + models). A fresh-invocation debrief confirmed the core map and added two framework-table gaps (password_resets.email, failed_jobs/jobs.payload) and closed one open item (personal_access_tokens — no PATs minted). This ADR ratifies the decision and the map.
Decision
AVG erasure on codebook is anonymization-in-place via a single AnonymizeUserAction. It severs identity while preserving pseudonymised structure, reaches every enumerated PII column and external sink, and is the only sanctioned erasure path (the implicit-cascade HardDeleteUserAction is retired).
Scrub set (the completeness contract)
The action must process every entry in the Surveyor map. Summary (full table + citations in the field report):
- `users` (9 PII columns):
  - `first_name`/`last_name` → placeholders.
  - `email` → `deleted-{id}@anonymized.invalid`. The column is NOT NULL + UNIQUE, so the placeholder is keyed on the id.
  - `password` → `Hash::make(Str::random(40))`.
  - `remember_token`/`invite_token`/`reset_password_token` → null.
  - `image` → null, plus delete the S3 object.
  - `emmie_token` → null, plus a best-effort emmie revoke.
  - Retain `is_admin`/`has_slack`/`wants_updates`. Forcing them to `false` is optional.
- Free-text child PII (scrub in place, preserve the row): `messages.message`, `user_stories.title`/`content`, `strategies.name`, `strategy_steps.text`/`category`, `feedback.comment`, `files.original_filename`.
  - `reviews.review` has two FK paths, and both are in scope:
    - As subject (`reviews.submission_id` → a file on the erased user's own submission): this is mentor prose about them.
    - As author (`reviews.user_id`): this is prose they wrote, typically as a mentor reviewing other participants.
  - Blank the content on both paths and keep the rows. This preserves the authorship/structure trail.
  - The author path was at first deferred as a dual-ownership question: is a mentor's prose about a third party the mentor's erasable PII? It was resolved on 2026-06-19 (Commander disposition A) in favour of scrubbing, for consistency with every other authored free-text surface.
  - The accepted collateral is that erasing a mentor blanks review prose that a still-active participant received. The row and its linkage survive; only the free text is erased.
- S3 objects (delete them; do not merely null the key):
  - Submission files (`files.path`, key scheme `submissions/{user_id}/{exercise_id}/…`).
  - The profile image (`users.image`, `images/…`).
- Progress / gamification (retain, pseudonymised): `exercise_user`, `achievement_user`, pivots.
  - Exception: `activities` login timestamps are behavioural PII, so clear or scrub the timestamps. Retain the row if a non-PII completion signal is needed.
- Framework tables:
  - `password_resets`: delete the user's row(s) by email in the same transaction. The table is email-keyed, so a `user_id` scrub cannot reach it, and it holds the cleartext email.
  - `failed_jobs`/`jobs.payload`: accept as dormant residue. The queue is `sync`, and most mailables use `SerializesModels` and re-hydrate clean. This residue is documented, not scrubbed.
  - `personal_access_tokens`: no-op (codebook mints no PATs).
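The `users` portion of the contract can be read as a column → strategy table. The sketch below models that table in Python. It is not the Laravel action, and several details are hypothetical:

- the strategy names;
- the `Anonymized` name placeholder;
- the effect tuples.

The sketch also simplifies two things. It replaces the password with a 40-character random string and skips the `Hash::make` step. It only records the S3 and emmie calls as post-commit effects; it does not execute them.

```python
import secrets

# Hypothetical model of the users-row scrub contract (not the Laravel code).
USERS_SCRUB = {
    "first_name": "placeholder",
    "last_name": "placeholder",
    "email": "id_keyed_placeholder",
    "password": "random_secret",
    "remember_token": "null",
    "invite_token": "null",
    "reset_password_token": "null",
    "image": "null_and_delete_s3",
    "emmie_token": "null_and_revoke",
}
# Non-PII flags the ADR retains (the optional force-false is not modelled).
USERS_RETAINED = {"is_admin", "has_slack", "wants_updates"}
NAME_PLACEHOLDER = "Anonymized"  # assumption: the ADR does not fix this text


def anonymize_user_row(row):
    """Return (scrubbed_row, post_commit_effects) for one users row dict."""
    scrubbed = dict(row)
    effects = []  # external calls to issue only after the DB commit
    for column, strategy in USERS_SCRUB.items():
        if strategy == "placeholder":
            scrubbed[column] = NAME_PLACEHOLDER
        elif strategy == "id_keyed_placeholder":
            # NOT NULL + UNIQUE: derive the value from the unique primary key.
            scrubbed[column] = f"deleted-{row['id']}@anonymized.invalid"
        elif strategy == "random_secret":
            # 30 random bytes -> 40 URL-safe characters; the real action also hashes.
            scrubbed[column] = secrets.token_urlsafe(30)
        elif strategy == "null":
            scrubbed[column] = None
        elif strategy == "null_and_delete_s3":
            if row.get(column):
                effects.append(("s3_delete", row[column]))
            scrubbed[column] = None
        elif strategy == "null_and_revoke":
            if row.get(column):
                effects.append(("emmie_revoke", row[column]))
            scrubbed[column] = None
        else:
            raise ValueError(f"unknown strategy {strategy!r} for {column}")
    return scrubbed, effects


def residue(original, scrubbed):
    """Contract columns whose original non-null value survived the scrub."""
    return [
        column
        for column in USERS_SCRUB
        if original.get(column) is not None and scrubbed.get(column) == original[column]
    ]
```

The contract is kept as data rather than as scattered assignments. That lets a completeness test iterate over it: every listed column must come back changed. An unknown strategy raises an error instead of being silently skipped.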
External-sink residuals (accepted, documented)
- emmie: accept the gap.
  - codebook holds only `emmie_token`.
  - `EmmieService::logout()` is revoke-only: it nulls the token, and emmie care-data survives. There is no bulk-erase-by-token endpoint.
  - Anonymize nulls the local token and issues a best-effort revoke.
  - True downstream Art. 17 completeness requires an emmie-side endpoint. This is filed as a cross-territory ADR candidate (emmie bulk-erase-by-token).
- OpenAI: close the gap going forward.
  - Persist the OpenAI `thread_id` (funded 2026-06-17) at the `SendGptQuestionAction` write site. `AnonymizeUserAction` can then issue OpenAI-side thread deletes.
  - Threads created before the persistence slice ships remain a documented historical residual.
Audit-trail residue (accepted)
user_audit_logs is append-only + hash-chained (LogRule). Its old_values retain first_name/last_name/email/image, and the anonymize event itself snapshots the real old-values into the chain. This identity PII is accepted as a lawful record under ISO 27001 A.8.15 / A.5.33 and AVG Art. 17(3)(b) (retention for a legal obligation / records of processing). No chain redaction — that would break the tamper-evidence invariant. The decision is explicit, not accidental.
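A generic hash chain shows why redaction and tamper-evidence conflict. The scheme below is an assumption: SHA-256 over the previous hash plus a canonical JSON payload. It is not necessarily how `LogRule` builds `user_audit_logs`, and the payload shape is illustrative.

```python
import hashlib
import json

GENESIS = "0" * 64


def entry_hash(prev_hash, payload):
    # Canonical JSON so the same payload always hashes identically.
    body = json.dumps(payload, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(chain, payload):
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"payload": payload, "prev": prev, "hash": entry_hash(prev, payload)})


def first_broken_link(chain):
    """Index of the first entry that fails verification, or None if intact."""
    prev = GENESIS
    for index, entry in enumerate(chain):
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["payload"]):
            return index
        prev = entry["hash"]
    return None
```

Blanking the email in an early entry makes that entry fail verification. Recomputing its hash does not help: the failure just moves to the next entry's `prev` link. A redaction can therefore only be hidden by rewriting the entire tail, which is exactly the manipulation the chain exists to expose.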
Lifecycle semantics
AnonymizeUserAction also soft-deletes (anonymize = superset of "deactivate"): the account leaves active lists and cannot authenticate (password scramble + all tokens nulled + session/PAT revocation — a no-op here). Restore of an anonymized user is forbidden (it would return an identity-less husk). DB writes inside one transaction (ADR-0011/0029); external-sink calls (S3 delete, emmie revoke, OpenAI delete) run post-commit (ADR-0029 ordering — the same fix DeleteFileAction needs). Emit a logUpdated audit row.
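The ordering rule can be sketched with Python's `sqlite3`. Its connection context manager commits on success and rolls back on an exception. The following are hypothetical:

- the trimmed `users`/`password_resets` schema;
- the `Anonymized`/`User` placeholders;
- the `sinks` list of callables, which stand in for the S3 delete, the emmie revoke and the OpenAI thread delete.

The audit row and the token nulling are omitted for brevity.

```python
import logging
import sqlite3

log = logging.getLogger("anonymize")


def anonymize_user(conn, user_id, sinks):
    """Scrub DB rows in one transaction, then call external sinks best-effort.

    `sinks` is a list of (name, callable) pairs; each callable receives the
    pre-scrub context it needs (here: user_id and the old image key).
    Returns the names of the sinks that failed.
    """
    with conn:  # COMMIT on success, ROLLBACK if anything below raises
        row = conn.execute(
            "SELECT email, image FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        if row is None:
            raise LookupError(f"no user {user_id}")
        email, image = row
        # password_resets is email-keyed, so it is cleared by the old email.
        conn.execute("DELETE FROM password_resets WHERE email = ?", (email,))
        conn.execute(
            "UPDATE users SET first_name = ?, last_name = ?, email = ?, image = NULL,"
            " deleted_at = CURRENT_TIMESTAMP WHERE id = ?",
            ("Anonymized", "User", f"deleted-{user_id}@anonymized.invalid", user_id),
        )

    # Past this point the erasure is committed; a sink failure cannot undo it.
    context = {"user_id": user_id, "image": image}
    failed = []
    for name, call in sinks:
        try:
            call(context)
        except Exception:
            log.exception("post-commit sink %s failed for user %s", name, user_id)
            failed.append(name)
    return failed
```

Each sink failure is logged and reported instead of raised. The committed DB anonymization is the load-bearing erasure, so one unreachable sink does not block the others.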
Options Considered
| Option | Verdict | Reason |
|---|---|---|
| Soft-delete only (status quo `DeleteUserAction`) | Rejected | Retains all child PII, so it is not erasure; a no-op for Art. 17. |
| Hard-delete (`forceDelete` + cascade) | Rejected | Implicit DB cascade violates Principle #4; reaches no external sink; destroys pseudonymisable aggregate/educational data; standing liability on a minors territory. |
| Anonymize-in-place | Accepted | Severs identity while preserving pseudonymised structure + stats; explicit per-column scrub (auditable completeness); reaches external sinks; ADR-0002/Principle-#4 compliant (no row deletion → no cascade). |
Consequences
Positive
- Honest, auditable erasure — completeness is a per-column contract a test can assert.
- Preserves referential integrity + pseudonymised progress/aggregate data (educational value retained).
- Eliminates the Principle-#4-violating implicit-cascade path (`HardDeleteUserAction` retired).
- The methodology (FK-graph + framework-table enumeration, external-sink reach, prove-no-residue test) ports to every other AVG territory's erasure.
Negative
- More code than a `forceDelete`: every PII column needs an explicit scrub plus a placeholder that satisfies NOT NULL/UNIQUE.
- New PII columns must be added to the scrub set or they silently escape. This requires standing enforcement (below).
- Two documented residuals remain (emmie care-data; pre-persistence OpenAI threads).
Risks
- Silent residue from a missed column. Mitigation: a completeness Feature test (seed every PII column, anonymize, assert each is scrubbed) plus an arch test that checks `users` string columns and the framework-PII-table checklist against the action's scrub set.
- Placeholder collision on the UNIQUE `email`. Mitigation: an id-keyed placeholder plus a collision test.
- External-sink call failure after the DB commit. Mitigation: post-commit ordering plus best-effort semantics with logging. The DB anonymization is the load-bearing erasure; the sinks are belt-and-braces.
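The collision risk is easy to demonstrate against any UNIQUE column. This `sqlite3` sketch uses a hypothetical two-user table:

- An id-keyed placeholder anonymizes both users.
- A constant placeholder, the tempting shortcut, fails on the second row with `IntegrityError`, and the transaction rolls back.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE)")
    conn.executemany(
        "INSERT INTO users (id, email) VALUES (?, ?)",
        [(1, "a@example.org"), (2, "b@example.org")],
    )
    conn.commit()
    return conn


def scrub_emails(conn, placeholder_for):
    """Anonymize every email in one transaction using placeholder_for(user_id)."""
    with conn:
        # Materialise the ids first so we never update while iterating a cursor.
        ids = [user_id for (user_id,) in conn.execute("SELECT id FROM users ORDER BY id")]
        for user_id in ids:
            conn.execute(
                "UPDATE users SET email = ? WHERE id = ?", (placeholder_for(user_id), user_id)
            )


def id_keyed(user_id):
    # The primary key is unique, so the placeholder inherits its uniqueness.
    return f"deleted-{user_id}@anonymized.invalid"


def constant(_user_id):
    return "deleted@anonymized.invalid"
```

`.invalid` is a TLD reserved by RFC 2606, so the placeholder can never route mail. The collision test mainly guards against a later change that "simplifies" the id-keyed form into a constant.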
Enforcement
| What | Mechanism | Scope |
|---|---|---|
| Every mapped PII column is scrubbed | tests/Feature completeness test — seed all PII cols + child rows, anonymize, assert placeholder/null/deleted per table:column | AnonymizeUserAction |
| A new `users` PII column can't silently escape | Arch test enumerating `users` string columns + a framework-PII-table checklist (`password_resets`/`failed_jobs`/`jobs`/`sessions`/`personal_access_tokens`) against the action's scrub set | tests/Arch |
| UNIQUE-placeholder safety | Feature test — two anonymized users do not collide on email | AnonymizeUserAction |
| S3 object actually deleted; emmie/OpenAI calls issued | Feature test with fake disk + mocked clients | AnonymizeUserAction |
| DB-in-tx, sinks post-commit | ADR-0029 transaction-scope rule (already enforced) | AnonymizeUserAction, DeleteFileAction |
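The column half of the arch test can be approximated by a diff. It compares a table's string-typed columns against the columns the action accounts for, whether scrubbed or explicitly retained. This `sqlite3` sketch reads declared types via `PRAGMA table_info`, and its assumptions are:

- String-type detection is a substring match on `CHAR`/`TEXT`/`CLOB`, mirroring SQLite's affinity rules.
- The real check would run in `tests/Arch` against the actual schema.
- The framework-table checklist is not modelled here.

```python
import sqlite3

# SQLite gives TEXT affinity to declared types containing any of these.
STRING_TYPE_MARKERS = ("CHAR", "TEXT", "CLOB")


def string_columns(conn, table):
    """Names of columns whose declared type is string-like, in table order."""
    # Each PRAGMA row: (cid, name, type, notnull, dflt_value, pk).
    # `table` is a fixed identifier from test code, never user input.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return [
        name
        for _cid, name, decl_type, *_rest in rows
        if any(marker in (decl_type or "").upper() for marker in STRING_TYPE_MARKERS)
    ]


def unclassified_columns(conn, table, scrubbed, retained):
    """String columns that are neither scrubbed nor explicitly retained."""
    return sorted(set(string_columns(conn, table)) - set(scrubbed) - set(retained))
```

Any column in the diff fails the test. It keeps failing until the same change decides whether that column is scrubbed or retained, so a column that would have "silently escaped" produces a red build instead.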
Resolved Questions
Free-text child PII: scrub-in-place vs row-delete?
Resolved 2026-06-17. Scrub in place (preserve row + non-PII structure/stats). For reviews.review, blank the content on both FK paths (subject — submission_id; author — user_id) and keep the rows. The author path was first deferred as a dual-ownership question, then resolved 2026-06-19 (Commander disposition A) in favour of scrubbing it — consistent with all other authored free-text, at the accepted collateral that a still-active participant loses the prose of a review authored by an erased mentor (row + linkage survive).
Retain progress/gamification data under anonymize?
Resolved 2026-06-17. Retain (pseudonymised statistics). Exception: activities login timestamps are behavioural PII → scrub the timestamps.
Is the identity PII retained in the append-only audit trail acceptable?
Resolved 2026-06-17. Yes — lawful record under A.8.15 / A.5.33 + AVG Art. 17(3)(b). No chain redaction (preserves tamper-evidence).
emmie and OpenAI residual posture?
Resolved 2026-06-17. emmie: accept the gap (revoke-only) + file a cross-territory ADR candidate for an emmie bulk-erase-by-token endpoint. OpenAI: persist thread_id and delete on erasure; accept pre-persistence threads as a historical residual.
HardDeleteUserAction fate?
Resolved 2026-06-17. Retire — redundant under anonymize, and its implicit cascade violates Principle #4 on a minors territory.
Does anonymize also soft-delete?
Resolved 2026-06-17. Yes — anonymize is a superset of deactivate; restore of an anonymized user is forbidden.
password_resets / failed_jobs (framework-table residue)?
Resolved 2026-06-17. password_resets: delete the user's rows by email in the same transaction. failed_jobs/jobs.payload: accept as dormant residue (queue sync + SerializesModels), documented not scrubbed.
Implementation
| Territory | State | Notes |
|---|---|---|
| codebook | In Progress | ADR ratified 2026-06-17; scrub-set map verified (field report + debrief). Slices: (1) OpenAI thread_id persistence [soldier-ready]; (2) DeleteFileAction ADR-0029 ordering fix [soldier-ready]; (3) AnonymizeUserAction + completeness/arch tests [after ADR]; (4) retire HardDeleteUserAction. |
| emmie | Not Assessed | AVG territory; will face the same erasure question. The emmie bulk-erase-by-token endpoint (this ADR's cross-territory candidate) is an emmie-side prerequisite for codebook's downstream completeness. |
| ublgenie / wijs / daymate | Not Assessed | AVG territories — the anonymize methodology applies if/when erasure is scoped. |