Skip to content

ADR-0033: AVG Erasure by Anonymization-in-Place

Accepted Cross-Project Universal

Date: 2026-06-17 Compliance: AVG/GDPR (Art. 17 right to erasure) · NEN 7510 (codebook ingests minors' data into the emmie care surface) · ISO 27001 (A.5.33 — the audit trail retains identity PII as a lawful record)

Context

codebook serves a participant population that includes minors and feeds care-adjacent data downstream into emmie (NEN 7510). It must honour AVG Art. 17 erasure requests. The pre-existing surface did not:

  • DeleteUserAction soft-deletes (scrambles the password, sets deleted_at) — child PII is fully retained, so it is not erasure.
  • A later HardDeleteUserAction (forceDelete(), wired DELETE /hard-delete/{user}) erases local rows via implicit DB cascade — but that violates Principle #4 (no ON DELETE CASCADE; deletion must be explicit in the Action), it reaches none of the external sinks (S3, emmie, OpenAI), and it destroys pseudonymisable progress/gamification data that has legitimate aggregate value.

The Commander chose anonymize-in-place over hard-delete (2026-06-17). That inverts the completeness burden: anonymize does not fire cascade FKs (rows survive), so correctness is "scrub every PII column on every surviving row + reach every external sink", not "delete every child." A single missed PII column is silent residue — an answered-but-false erasure on minors' data.

A Surveyor deep-dive (reports/codebook/field/2026-06-17-surveyor-pii-scrub-set-map.md) produced the definitive scrub-set via a two-source (migrations × models) cross-check plus a clean observer/boot pass (proving the write-path is fully Action-mediated, so the set is enumerable from schema + models). A fresh-invocation debrief confirmed the core map and added two framework-table gaps (password_resets.email, failed_jobs/jobs.payload) and closed one open item (personal_access_tokens — no PATs minted). This ADR ratifies the decision and the map.

Decision

AVG erasure on codebook is anonymization-in-place via a single AnonymizeUserAction. It severs identity while preserving pseudonymised structure, reaches every enumerated PII column and external sink, and is the only sanctioned erasure path (the implicit-cascade HardDeleteUserAction is retired).

Scrub set (the completeness contract)

The action must process every entry in the Surveyor map. Summary (full table + citations in the field report):

  • users (9 PII columns)first_name/last_name → placeholders; emaildeleted-{id}@anonymized.invalid (NOT NULL + UNIQUE → id-keyed placeholder); passwordHash::make(Str::random(40)); remember_token/invite_token/reset_password_token → null; image → null + delete the S3 object; emmie_token → null + best-effort emmie revoke. Retain is_admin/has_slack/wants_updates (force-false optional).
  • Free-text child PII (scrub in place, preserve the row)messages.message, user_stories.title/content, strategies.name, strategy_steps.text/category, feedback.comment, files.original_filename. reviews.review (two FK paths, both in scope): the erased user appears as subject (reviews.submission_id → a file on their own submission — mentor prose about them) and as author (reviews.user_id — prose they wrote, typically a mentor reviewing other participants). Blank the content on both paths, keep the rows (preserves the authorship/structure trail). The author path was deferred at first as a dual-ownership question (whether a mentor's prose about a third party is the mentor's erasable PII); resolved 2026-06-19 (Commander disposition A) to scrub it, for consistency with every other authored free-text surface — at the accepted collateral that erasing a mentor blanks review prose a still-active participant received (row + linkage survive, only the free text is erased).
  • S3 objects (delete, do not merely null the key) — submission files (files.path, key scheme submissions/{user_id}/{exercise_id}/…) and the profile image (users.image, images/…).
  • Progress / gamification (retain, pseudonymised)exercise_user, achievement_user, pivots. Exception — activities: login timestamps are behavioural PII → clear/scrub the timestamps (retain the row if a non-PII completion signal is needed).
  • Framework tablespassword_resets: delete the user's row(s) by email in the same transaction (email-keyed, unreachable by a user_id scrub, holds cleartext email). failed_jobs/jobs.payload: accept as dormant residue (queue is sync; most mailables use SerializesModels and re-hydrate clean) — documented, not scrubbed. personal_access_tokens: no-op (codebook mints no PATs).

External-sink residuals (accepted, documented)

  • emmie — accept the gap. codebook holds only emmie_token; EmmieService::logout() is revoke-only (nulls the token; emmie care-data survives — there is no bulk-erase-by-token endpoint). Anonymize nulls the local token + best-effort revoke. True downstream Art. 17 completeness requires an emmie-side endpoint → filed as a cross-territory ADR candidate (emmie bulk-erase-by-token).
  • OpenAI — close going forward. Persist the OpenAI thread_id (funded 2026-06-17) at the SendGptQuestionAction write site so AnonymizeUserAction can issue OpenAI-side thread deletes. Threads created before the persistence slice ships remain a documented historical residual.

Audit-trail residue (accepted)

user_audit_logs is append-only + hash-chained (LogRule). Its old_values retain first_name/last_name/email/image, and the anonymize event itself snapshots the real old-values into the chain. This identity PII is accepted as a lawful record under ISO 27001 A.8.15 / A.5.33 and AVG Art. 17(3)(b) (retention for a legal obligation / records of processing). No chain redaction — that would break the tamper-evidence invariant. The decision is explicit, not accidental.

Lifecycle semantics

AnonymizeUserAction also soft-deletes (anonymize = superset of "deactivate"): the account leaves active lists and cannot authenticate (password scramble + all tokens nulled + session/PAT revocation — a no-op here). Restore of an anonymized user is forbidden (it would return an identity-less husk). DB writes inside one transaction (ADR-0011/0029); external-sink calls (S3 delete, emmie revoke, OpenAI delete) run post-commit (ADR-0029 ordering — the same fix DeleteFileAction needs). Emit a logUpdated audit row.

Options Considered

OptionVerdictReason
Soft-delete only (status quo DeleteUserAction)RejectedRetains all child PII — not erasure; a no-op for Art. 17.
Hard-delete (forceDelete + cascade)RejectedImplicit DB cascade violates Principle #4; reaches no external sink; destroys pseudonymisable aggregate/educational data; standing liability on a minors territory.
Anonymize-in-placeAcceptedSevers identity while preserving pseudonymised structure + stats; explicit per-column scrub (auditable completeness); reaches external sinks; ADR-0002/Principle-#4 compliant (no row deletion → no cascade).

Consequences

Positive

  • Honest, auditable erasure — completeness is a per-column contract a test can assert.
  • Preserves referential integrity + pseudonymised progress/aggregate data (educational value retained).
  • Eliminates the Principle-#4-violating implicit-cascade path (HardDeleteUserAction retired).
  • The methodology (FK-graph + framework-table enumeration, external-sink reach, prove-no-residue test) ports to every other AVG territory's erasure.

Negative

  • More code than a forceDelete — every PII column needs an explicit scrub + a placeholder that satisfies NOT NULL/UNIQUE.
  • New PII columns must be added to the scrub set or they silently escape — requires standing enforcement (below).
  • Two documented residuals remain (emmie care-data; pre-persistence OpenAI threads).

Risks

  • Silent residue from a missed column. Mitigation: a completeness Feature test (seed every PII column, anonymize, assert each scrubbed) + an arch test enumerating users string columns and the framework-PII-table checklist against the action's scrub set.
  • Placeholder collision on the UNIQUE email. Mitigation: id-keyed placeholder + a collision test.
  • External-sink call failure after the DB commit. Mitigation: post-commit ordering + best-effort semantics with logging; the DB anonymization is the load-bearing erasure, sinks are belt-and-braces.

Enforcement

WhatMechanismScope
Every mapped PII column is scrubbedtests/Feature completeness test — seed all PII cols + child rows, anonymize, assert placeholder/null/deleted per table:columnAnonymizeUserAction
A new users PII column can't silently escapeArch test enumerating users string columns + a framework-PII-table checklist (password_resets/failed_jobs/jobs/sessions/personal_access_tokens) against the action's scrub settests/Arch
UNIQUE-placeholder safetyFeature test — two anonymized users do not collide on emailAnonymizeUserAction
S3 object actually deleted; emmie/OpenAI calls issuedFeature test with fake disk + mocked clientsAnonymizeUserAction
DB-in-tx, sinks post-commitADR-0029 transaction-scope rule (already enforced)AnonymizeUserAction, DeleteFileAction

Resolved Questions

Free-text child PII: scrub-in-place vs row-delete?

Resolved 2026-06-17. Scrub in place (preserve row + non-PII structure/stats). For reviews.review, blank the content on both FK paths (subject — submission_id; author — user_id) and keep the rows. The author path was first deferred as a dual-ownership question, then resolved 2026-06-19 (Commander disposition A) in favour of scrubbing it — consistent with all other authored free-text, at the accepted collateral that a still-active participant loses the prose of a review authored by an erased mentor (row + linkage survive).

Retain progress/gamification data under anonymize?

Resolved 2026-06-17. Retain (pseudonymised statistics). Exception: activities login timestamps are behavioural PII → scrub the timestamps.

Is the identity PII retained in the append-only audit trail acceptable?

Resolved 2026-06-17. Yes — lawful record under A.8.15 / A.5.33 + AVG Art. 17(3)(b). No chain redaction (preserves tamper-evidence).

emmie and OpenAI residual posture?

Resolved 2026-06-17. emmie: accept the gap (revoke-only) + file a cross-territory ADR candidate for an emmie bulk-erase-by-token endpoint. OpenAI: persist thread_id and delete on erasure; accept pre-persistence threads as a historical residual.

HardDeleteUserAction fate?

Resolved 2026-06-17. Retire — redundant under anonymize, and its implicit cascade violates Principle #4 on a minors territory.

Does anonymize also soft-delete?

Resolved 2026-06-17. Yes — anonymize is a superset of deactivate; restore of an anonymized user is forbidden.

password_resets / failed_jobs (framework-table residue)?

Resolved 2026-06-17. password_resets: delete the user's rows by email in the same transaction. failed_jobs/jobs.payload: accept as dormant residue (queue sync + SerializesModels), documented not scrubbed.

Implementation

TerritoryStateNotes
codebookIn ProgressADR ratified 2026-06-17; scrub-set map verified (field report + debrief). Slices: (1) OpenAI thread_id persistence [soldier-ready]; (2) DeleteFileAction ADR-0029 ordering fix [soldier-ready]; (3) AnonymizeUserAction + completeness/arch tests [after ADR]; (4) retire HardDeleteUserAction.
emmieNot AssessedAVG territory; will face the same erasure question. The emmie bulk-erase-by-token endpoint (this ADR's cross-territory candidate) is an emmie-side prerequisite for codebook's downstream completeness.
ublgenie / wijs / daymateNot AssessedAVG territories — the anonymize methodology applies if/when erasure is scoped.

Architecture documentation for contributors and collaborators.