The Accumulation Problem
Databases grow. PII accumulates. Nobody deletes old sessions, expired tokens, or abandoned user profiles. Over the years, you end up with millions of rows of personal data you no longer need, and each one is a liability.
Running a Minimization Report
SELECT * FROM pgcomply.minimization_report();
Output:
table    | column      | pii_type    | null_pct | has_masking | has_retention | recommendation
---------+-------------+-------------+----------+-------------+---------------+-------------------------
profiles | fax_number  | phone       | 97.3%    | false       | false         | HIGH: consider removing
users    | middle_name | person_name | 84.2%    | false       | false         | MEDIUM: review necessity
sessions | ip_addr     | ip_address  | 0.0%     | false       | false         | Add retention policy
orders   | ship_phone  | phone       | 45.1%    | false       | false         | Add masking rule
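The report does not spell out how null_pct is derived. Assuming it is simply the share of rows in which the column is NULL, the figure can be reproduced by hand. The sketch below uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so that it runs anywhere; the helpers null_pct and quote_ident are hypothetical, not part of pgcomply, and the CASE-based aggregate is plain SQL that also works in PostgreSQL.

```python
import sqlite3


def quote_ident(name: str) -> str:
    """Quote an SQL identifier (table/column names cannot be bound as parameters)."""
    return '"' + name.replace('"', '""') + '"'


def null_pct(conn: sqlite3.Connection, table: str, column: str) -> float:
    """Percentage of rows in `table` where `column` IS NULL, rounded to 0.1 (0.0 if empty)."""
    total, nulls = conn.execute(
        f"SELECT COUNT(*), SUM(CASE WHEN {quote_ident(column)} IS NULL THEN 1 ELSE 0 END) "
        f"FROM {quote_ident(table)}"
    ).fetchone()
    if total == 0:
        return 0.0  # SUM() is NULL on an empty table, so bail out before dividing
    return round(100.0 * nulls / total, 1)


# Usage: three of four profiles have no fax number.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE profiles (id INTEGER PRIMARY KEY, fax_number TEXT)")
conn.executemany(
    "INSERT INTO profiles (fax_number) VALUES (?)",
    [(None,), (None,), ("+1 555 0100",), (None,)],
)
print(null_pct(conn, "profiles", "fax_number"))  # 75.0
```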
Setting Retention Policies
-- Keep sessions for 30 days, measured from created_at
SELECT pgcomply.retain('sessions', 'created_at', '30 days');
-- Keep temp tokens for 24 hours, measured from created_at
SELECT pgcomply.retain('temp_tokens', 'created_at', '24 hours');
-- Apply the registered policies now instead of waiting for a scheduled run
SELECT pgcomply.enforce_retention();
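How pgcomply enforces a policy internally is not described here. Assuming enforcement means deleting rows whose timestamp column is older than the current time minus the configured interval, one enforcement pass can be sketched as follows. sqlite3 again stands in for PostgreSQL, timestamps are stored as Unix epoch seconds for simplicity, and enforce_retention below is a hypothetical Python function, not pgcomply's implementation.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def enforce_retention(conn, policies, now):
    """Delete rows older than each policy's window; return rows deleted per table.

    `policies` is a list of (table, timestamp_column, keep_for) tuples, a
    hypothetical mirror of the arguments passed to pgcomply.retain().
    """
    deleted = {}
    for table, column, keep_for in policies:
        cutoff = (now - keep_for).timestamp()
        # Identifiers come from trusted configuration, not user input.
        cur = conn.execute(f'DELETE FROM "{table}" WHERE "{column}" < ?', (cutoff,))
        deleted[table] = cur.rowcount
    conn.commit()
    return deleted


# Usage: one stale and one fresh row in each table.
now = datetime(2026, 3, 1, 12, 0, tzinfo=timezone.utc)
conn = sqlite3.connect(":memory:")
ages = {
    "sessions": [timedelta(days=45), timedelta(days=10)],
    "temp_tokens": [timedelta(hours=30), timedelta(hours=1)],
}
for table, offsets in ages.items():
    conn.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, created_at REAL)")
    conn.executemany(
        f"INSERT INTO {table} (created_at) VALUES (?)",
        [((now - age).timestamp(),) for age in offsets],
    )

policies = [
    ("sessions", "created_at", timedelta(days=30)),
    ("temp_tokens", "created_at", timedelta(hours=24)),
]
result = enforce_retention(conn, policies, now)
print(result)  # {'sessions': 1, 'temp_tokens': 1}
```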
Schedule daily enforcement:
SELECT pgcomply.schedule_jobs(install := true);
-- Creates: daily 03:00 retention enforcement
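Running enforcement once a day means a row can outlive its retention window by up to roughly a day. Assuming the job runs daily at 03:00 in the database server's time zone and removes only rows strictly older than the window, the hypothetical helper removed_at below computes the first run that deletes a given row. Time zones and daylight-saving transitions are ignored.

```python
from datetime import datetime, time, timedelta


def removed_at(created_at: datetime, keep_for: timedelta, run_at: time = time(3, 0)) -> datetime:
    """First daily run at which a row created at `created_at` is strictly older than `keep_for`."""
    expires = created_at + keep_for
    # Candidate run on the expiry date (same tzinfo as the input, or naive).
    run = datetime.combine(expires.date(), run_at, tzinfo=expires.tzinfo)
    # The row is deleted only when created_at < run - keep_for, i.e. run > expires.
    if run <= expires:
        run += timedelta(days=1)
    return run


# A session created at 10:00 on 1 January with a 30-day window:
print(removed_at(datetime(2026, 1, 1, 10, 0), timedelta(days=30)))  # 2026-02-01 03:00:00
```

Under these assumptions the 24-hour temp_tokens policy can effectively keep a token for close to 48 hours (a token created at 03:01 is removed two days later at 03:00), so short windows may warrant calling pgcomply.enforce_retention() more often than daily.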
Taking Action
For each finding in the minimization report:
- Mostly-NULL columns (such as fax_number at 97.3%): Drop the column or make it truly optional in your API
- No retention policy: Set one based on business purpose
- No masking: Add masking for tables accessed by non-admin roles
- No RLS: Enable Row-Level Security on PII tables
Building a Minimization Review Process
Quarterly Review Template
Run this every quarter and act on findings:
-- 1. Full minimization report, with one suggested action per column
-- (CASE returns the first matching branch, so earlier checks take precedence)
SELECT table_name, column_name, pii_type, null_pct,
       has_masking, has_retention,
       CASE
           WHEN null_pct > 90 THEN 'DROP COLUMN candidate'
           WHEN null_pct > 70 THEN 'Make optional in API'
           WHEN has_retention = false THEN 'ADD retention policy'
           WHEN has_masking = false THEN 'ADD masking rule'
           ELSE 'OK'
       END AS action
FROM pgcomply.minimization_report()
ORDER BY null_pct DESC;
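Because a SQL CASE returns the first branch that matches, this template reports only one action per column. The Python sketch below mirrors the CASE branch for branch, assuming null_pct is numeric as the comparisons require (the function name review_action is hypothetical). Using `is False` reproduces SQL's NULL handling, where `has_retention = false` is not true for a NULL value. Applied to the sample rows from the report, orders.ship_phone comes out as "ADD retention policy" even though it also lacks masking.

```python
from typing import Optional


def review_action(null_pct: float, has_masking: Optional[bool], has_retention: Optional[bool]) -> str:
    """Mirror of the quarterly CASE expression: branches are tried in order, first match wins."""
    if null_pct > 90:
        return "DROP COLUMN candidate"
    if null_pct > 70:
        return "Make optional in API"
    # `is False` skips None, just as `has_retention = false` is not true for NULL in SQL.
    if has_retention is False:
        return "ADD retention policy"
    if has_masking is False:
        return "ADD masking rule"
    return "OK"


# Sample rows from the report: (table.column, null_pct, has_masking, has_retention)
rows = [
    ("profiles.fax_number", 97.3, False, False),
    ("users.middle_name", 84.2, False, False),
    ("sessions.ip_addr", 0.0, False, False),
    ("orders.ship_phone", 45.1, False, False),
]
for name, pct, masking, retention in rows:
    print(name, "->", review_action(pct, masking, retention))
```

Once a column's first gap is fixed, the next quarterly run surfaces the next one.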
Common Over-Collection Patterns
Pattern 1: The "just in case" column. A developer adds middle_name, fax_number, or secondary_email because "we might need it." If the column is still 90%+ NULL after six months, you almost certainly do not need it.
Pattern 2: The migrated legacy column. Data copied from an old system that nobody queries. Check with:
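-- Find registered PII columns that no recorded statement mentions by name
-- (requires pg_stat_statements, loaded via shared_preload_libraries and created
-- with CREATE EXTENSION; it only covers statements recorded since the last reset,
-- and SELECT * queries never name their columns, so treat results as candidates)
SELECT r.table_name, r.column_name, r.pii_type
FROM pgcomply.pii_registry r
WHERE NOT EXISTS (
    SELECT 1 FROM pg_stat_statements s
    -- \m and \M are word boundaries, so "name" does not match "middle_name"
    WHERE s.query ~* ('\m' || r.column_name || '\M')
);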
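Matching a column name as a plain substring, for example with ILIKE '%' || column_name || '%', over-reports usage: name matches inside middle_name, and because _ is a single-character wildcard in ILIKE patterns, middle_name would also match middle-name. Either way a column can look used when it is not. The Python sketch below (function names hypothetical) contrasts a substring check with a whole-identifier match; in PostgreSQL, the ~* operator with \m and \M word boundaries gives the same whole-identifier behavior.

```python
import re


def mentions_substring(query: str, column: str) -> bool:
    # Roughly `query ILIKE '%' || column || '%'` (ignoring that _ is a wildcard in ILIKE).
    return column.lower() in query.lower()


def mentions_identifier(query: str, column: str) -> bool:
    # \b treats letters, digits and _ as word characters, so "name" will not
    # match inside "middle_name"; re.escape keeps any special characters literal.
    pattern = r"\b" + re.escape(column) + r"\b"
    return re.search(pattern, query, flags=re.IGNORECASE) is not None


# pg_stat_statements stores normalized text, with constants replaced by $1, $2, ...
query = "SELECT middle_name FROM users WHERE id = $1"
print(mentions_substring(query, "name"))   # True: a false "used" signal
print(mentions_identifier(query, "name"))  # False
```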
Pattern 3: Unbounded session data. Session tables grow forever when no retention policy is set. Sessions usually record the client's IP address (PII), so they should have a retention period of 30 to 90 days.
Actioning Findings
For each finding, the decision tree is:
- Can we drop the column? If > 90% NULL and no business process uses it → ALTER TABLE DROP COLUMN
- Can we add a retention policy? If data has a natural lifecycle → pgcomply.retain()
- Can we add masking? If non-admin roles access it → pgcomply.mask()
- Can we anonymize? If we need the data for analytics but not PII → pgcomply.anonymize()
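The questions above can be encoded as a small triage function. This is a sketch with hypothetical field names, and it reflects one reading of the list rather than pgcomply behavior: dropping the column is treated as final, while retention, masking, and anonymization are collected together because a single column may need more than one of them.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """Answers to the review questions for one PII column (hypothetical fields)."""
    null_pct: float
    used_by_business_process: bool
    has_natural_lifecycle: bool
    accessed_by_non_admin_roles: bool
    analytics_needs_data_not_pii: bool


def next_steps(f: Finding) -> list[str]:
    # Dropping the column makes every other step unnecessary.
    if f.null_pct > 90 and not f.used_by_business_process:
        return ["ALTER TABLE ... DROP COLUMN"]
    steps = []
    if f.has_natural_lifecycle:
        steps.append("pgcomply.retain()")
    if f.accessed_by_non_admin_roles:
        steps.append("pgcomply.mask()")
    if f.analytics_needs_data_not_pii:
        steps.append("pgcomply.anonymize()")
    return steps


print(next_steps(Finding(97.3, False, False, False, False)))  # ['ALTER TABLE ... DROP COLUMN']
print(next_steps(Finding(0.0, True, True, True, False)))      # ['pgcomply.retain()', 'pgcomply.mask()']
```

An empty list means none of the actions applies; that outcome is worth recording as well.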
Document every decision in the audit trail:
SELECT pgcomply.checklist_update('gdpr', 'ART-5',
'implemented',
evidence := 'Minimization review Q1-2026: dropped fax_number, added retention to sessions'
);
Summary
Data minimization is not a one-time cleanup; it is an ongoing discipline. pgcomply.minimization_report() gives you the data to make informed decisions about what PII to keep, what to protect, and what to delete. Run it quarterly and act on the findings.