tutorials
schema-driftpostgresqlpii-detectioncompliance

Schema Drift Detection: Catching Unregistered PII Before Your Auditor Does

Detect and prevent compliance gaps when database schemas change. Automated PII pattern matching catches new columns containing personal data.

RL
Robert Langner
Managing Director, NILS Software GmbH · · 4 min read

The Drift Problem

Your PII registry is perfect today. But schemas change constantly: new features add columns, migrations restructure tables, developers create temporary tables that become permanent. Without drift detection, your compliance posture degrades with every deployment.

Running Drift Detection

SELECT * FROM pgcomply.schema_drift();

Output:

 drift_type  | table_name       | column_name    | suspected_pii | confidence | recommendation
-------------+------------------+----------------+---------------+------------+----------------------------
 new_column  | users            | mobile_phone   | phone         | high       | Register with register_pii()
 new_table   | newsletter_subs  | email_address  | email         | high       | Classify and register PII
 stale_reg   | temp_imports     | ssn            | government_id | high       | Table dropped, remove registration

Three types of drift:

  1. new_column: A column matching PII patterns exists but is not registered
  2. new_table: A new table has columns matching PII patterns
  3. stale_reg: A registered PII column no longer exists (table or column was dropped)

Pattern Matching

pgcomply checks column names against patterns for:

| PII Type | Patterns Matched | |----------|-----------------| | email | email, e_mail, email_address, mail | | phone | phone, mobile, tel, telephone, fax | | name | name, first_name, last_name, full_name | | address | address, street, city, zip, postal | | ip_address | ip, ip_addr, ip_address, client_ip | | government_id | ssn, tax_id, national_id, passport | | financial | iban, account_number, card_number | | date_of_birth | birth, dob, birthday, date_of_birth |

Automating Detection

Scheduled (Weekly)

SELECT pgcomply.schedule_jobs(install := true);
-- Adds: weekly drift detection every Monday at 07:00

CI/CD Integration

# In your deployment script
DRIFT=$(psql -t -c "SELECT count(*) FROM pgcomply.schema_drift() WHERE confidence = 'high'")
if [ "$DRIFT" -gt 0 ]; then
  echo "ERROR: Unregistered PII detected. Register before deploying."
  exit 1
fi

Resolving Drift

When drift is detected:

-- Register the new PII column
SELECT pgcomply.register_pii('users', 'mobile_phone', 'phone', 'user_id');

-- Re-classify data levels
SELECT pgcomply.auto_classify();

-- Run drift check again to verify
SELECT * FROM pgcomply.schema_drift();
-- Should return empty

Real-World Drift Scenarios

Scenario: The Emergency Hotfix

At 2 AM, a developer adds a column to fix a production issue:

ALTER TABLE orders ADD COLUMN customer_phone TEXT;
-- Fixed the shipping notification bug. Back to bed.

Next Monday, the weekly drift check catches it:

SELECT * FROM pgcomply.schema_drift();
 drift_type | table_name | column_name    | suspected_pii | confidence
------------+------------+----------------+---------------+-----------
 new_column | orders     | customer_phone | phone         | high

Without drift detection, this column would remain unregistered indefinitely — invisible to forget(), not included in inspect(), and not masked for analytics roles.

Scenario: The New Developer

A new team member creates a staging utility table:

CREATE TABLE tmp_user_export (
  user_id TEXT,
  full_name TEXT,
  email_addr TEXT,
  home_address TEXT,
  phone_mobile TEXT
);

Drift detection flags five PII columns in a single table:

 drift_type | table_name      | column_name   | suspected_pii | confidence
------------+-----------------+---------------+---------------+-----------
 new_table  | tmp_user_export | full_name     | person_name   | high
 new_table  | tmp_user_export | email_addr    | email         | high
 new_table  | tmp_user_export | home_address  | address       | high
 new_table  | tmp_user_export | phone_mobile  | phone         | high

Scenario: Stale Registrations

A table was dropped during a migration but its PII registration remains:

 drift_type  | table_name      | column_name | suspected_pii | confidence
-------------+-----------------+-------------+---------------+-----------
 stale_reg   | legacy_contacts | email       | email         | high

Clean up:

DELETE FROM pgcomply.pii_registry WHERE table_name = 'legacy_contacts';

Customizing Detection Patterns

pgcomply detects PII by column name patterns. If your codebase uses non-standard naming:

-- Check what patterns pgcomply uses
-- Column names containing these substrings trigger detection:
-- email, mail, phone, mobile, tel, name, first_name, last_name,
-- address, street, city, zip, postal, ip, ip_addr, ssn, tax_id,
-- national_id, passport, iban, account_number, card, birth, dob,
-- salary, income, wage

-- If your columns use different conventions (e.g., 'kontakt_telefon'),
-- register them manually:
SELECT pgcomply.register_pii('kontakte', 'kontakt_telefon', 'phone', 'kontakt_id');

Summary

Schema drift detection is the difference between compliance that works today and compliance that works tomorrow. Run pgcomply.schema_drift() after every deployment, and automate it weekly as a safety net. The cost of catching drift early is one SQL call; the cost of missing it is an audit finding.

Frequently Asked Questions

What PII patterns does pgcomply detect?
pgcomply scans column names for patterns matching 12+ PII types: email, phone, name, address, ip_address, ssn, tax_id, iban, date_of_birth, passport, salary, and government identifiers. Each match includes a confidence level (high, medium, low).
How do I integrate drift detection into CI/CD?
Add a step to your deployment pipeline that runs SELECT * FROM pgcomply.schema_drift() and fails if any high-confidence results are returned. This prevents deploying schema changes that introduce unregistered PII.

Related Articles