gdpr
gdprdata-minimizationarticle-5postgresql

Data Minimization in PostgreSQL: Finding and Eliminating Unnecessary PII

Implement GDPR Article 5(1)(c) data minimization in PostgreSQL. Detect over-collected PII, identify unused columns, and automate retention policies.

RL
Robert Langner
Managing Director, NILS Software GmbH · · 3 min read

The Accumulation Problem

Databases grow. PII accumulates. Nobody deletes old sessions, expired tokens, or abandoned user profiles. Over years, you end up with millions of rows of personal data you no longer need — each one a liability.

Running a Minimization Report

SELECT * FROM pgcomply.minimization_report();

Output:

 table    | column     | pii_type   | null_pct | has_masking | has_retention | recommendation
----------+------------+------------+----------+-------------+---------------+----------------------------
 profiles | fax_number | phone      | 97.3%    | false       | false         | HIGH: consider removing
 users    | middle_name| person_name| 84.2%    | false       | false         | MEDIUM: review necessity
 sessions | ip_addr    | ip_address | 0.0%     | false       | false         | Add retention policy
 orders   | ship_phone | phone      | 45.1%    | false       | false         | Add masking rule

Setting Retention Policies

-- Sessions older than 30 days
SELECT pgcomply.retain('sessions', 'created_at', '30 days');

-- Temp tokens older than 24 hours
SELECT pgcomply.retain('temp_tokens', 'created_at', '24 hours');

-- Enforce immediately
SELECT pgcomply.enforce_retention();

Schedule daily enforcement:

SELECT pgcomply.schedule_jobs(install := true);
-- Creates: daily 03:00 retention enforcement

Taking Action

For each finding in the minimization report:

  1. 97% NULL columns: Drop the column or make it truly optional in your API
  2. No retention policy: Set one based on business purpose
  3. No masking: Add masking for tables accessed by non-admin roles
  4. No RLS: Enable Row-Level Security on PII tables

Building a Minimization Review Process

Quarterly Review Template

Run this every quarter and act on findings:

-- 1. Full minimization report
SELECT table_name, column_name, pii_type, null_pct,
  has_masking, has_retention,
  CASE
    WHEN null_pct > 90 THEN 'DROP COLUMN candidate'
    WHEN null_pct > 70 THEN 'Make optional in API'
    WHEN has_retention = false THEN 'ADD retention policy'
    WHEN has_masking = false THEN 'ADD masking rule'
    ELSE 'OK'
  END AS action
FROM pgcomply.minimization_report()
ORDER BY null_pct DESC;

Common Over-Collection Patterns

Pattern 1: The "just in case" column. A developer adds middle_name, fax_number, or secondary_email because "we might need it." If it is 90%+ NULL after 6 months, you never needed it.

Pattern 2: The migrated legacy column. Data copied from an old system that nobody queries. Check with:

-- Find PII columns never referenced in recent queries
-- (requires pg_stat_statements extension)
SELECT r.table_name, r.column_name, r.pii_type
FROM pgcomply.pii_registry r
WHERE NOT EXISTS (
  SELECT 1 FROM pg_stat_statements s
  WHERE s.query ILIKE '%' || r.column_name || '%'
);

Pattern 3: Unbounded session data. Session tables that grow forever without retention policies. Every session contains IP addresses (PII) and should have a 30-90 day retention.

Actioning Findings

For each finding, the decision tree is:

  1. Can we drop the column? If > 90% NULL and no business process uses it → ALTER TABLE DROP COLUMN
  2. Can we add a retention policy? If data has a natural lifecycle → pgcomply.retain()
  3. Can we add masking? If non-admin roles access it → pgcomply.mask()
  4. Can we anonymize? If we need the data for analytics but not PII → pgcomply.anonymize()

Document every decision in the audit trail:

SELECT pgcomply.checklist_update('gdpr', 'ART-5',
  'implemented',
  evidence := 'Minimization review Q1-2026: dropped fax_number, added retention to sessions'
);

Summary

Data minimization is not a one-time cleanup — it is an ongoing discipline. pgcomply.minimization_report() gives you the data to make informed decisions about what PII to keep, what to protect, and what to delete. Run it quarterly and act on the findings.

Frequently Asked Questions

What does data minimization mean under GDPR?
Article 5(1)(c) requires personal data to be 'adequate, relevant, and limited to what is necessary in relation to the purposes for which they are processed.' This means you should only collect PII you actually need and delete it when the purpose is fulfilled.
How do I identify unnecessary PII columns?
Run pgcomply.minimization_report() which calculates the null percentage for each PII column. A column that is 95% NULL across your dataset is data you are collecting but rarely using — a strong candidate for removal or making optional.
How do retention policies work in pgcomply?
pgcomply.retain() sets a retention period on a table column. pgcomply.enforce_retention() then deletes rows older than the policy. Schedule it daily via pg_cron: rows past their retention period are automatically removed.

Related Articles