Why Classification Matters
Every database contains data with different sensitivity levels. Customer emails are more sensitive than product categories. IBAN numbers need more protection than blog post titles. But without explicit classification, all data gets treated the same — which means either over-protecting everything (expensive) or under-protecting sensitive data (dangerous).
ISO 27001 (Annex A.8.2) requires organizations to classify information assets by sensitivity. GDPR Article 5 requires data minimization and protection proportional to risk. Both start with knowing what you have.
The Manual Classification Problem
Most teams start with a spreadsheet:
| Table | Column | Classification | Owner | Last Review | |-------|--------|---------------|-------|-------------| | users | email | Confidential | Team A | 2024-11-03 | | users | name | Confidential | Team A | 2024-11-03 | | orders | total | Internal | Team B | 2024-11-03 |
This works for about two weeks. Then:
- A developer adds a
mobile_phonecolumn tousers. Nobody updates the spreadsheet. - A new
newsletter_subscriberstable appears with anemailcolumn. Unclassified. - The team that owned
ordersrestructured. The "Owner" column is fiction.
By the time the auditor asks for your classification register, it's 6 months out of date.
Automated Classification with pgcomply
Step 1: Detect PII
SELECT pgcomply.classify();
This scans every table in your schema and detects columns matching PII patterns:
schema | table_name | column_name | suspected_pii | confidence
--------+------------+-------------+---------------+------------
public | users | email | email | high
public | users | name | person_name | high
public | users | phone | phone | high
public | users | tax_id | government_id | high
public | profiles | ip_address | ip_address | high
public | profiles | street | address | medium
public | payments | iban | financial | high
pgcomply detects 12+ PII patterns including email, phone, names, addresses, IP addresses, government IDs, financial data, dates of birth, and more.
Step 2: Auto-Classify
SELECT pgcomply.auto_classify();
This assigns ISO 27001 levels based on the PII detected:
- Restricted: Tables with special category PII (government IDs, financial data, health data)
- Confidential: Tables with regular PII (email, phone, name, address)
- Internal: Tables with no detected PII
Tables you've explicitly classified are not overwritten.
Step 3: Review the Map
SELECT * FROM pgcomply.classification_map();
table | level | has_pii | pii_count | has_masking | has_rls | has_retention | review_overdue
------------+--------------+---------+-----------+-------------+---------+---------------+---------------
users | restricted | true | 4 | true | true | false | false
payments | restricted | true | 1 | true | true | true | false
profiles | confidential | true | 5 | false | false | false | false
orders | confidential | true | 1 | false | false | true | false
sessions | confidential | true | 2 | false | false | true | false
products | internal | false | 0 | false | false | false | false
categories | public | false | 0 | false | false | false | false
This is your living classification register — always up to date, always queryable.
Step 4: Catch Drift
SELECT * FROM pgcomply.schema_drift();
drift_type | table_name | column_name | suspected_pii | confidence | recommendation
------------+------------------+---------------+---------------+------------+--------------------------------
new_column | users | mobile_phone | phone | high | Register with register_pii()
new_table | newsletter_subs | email_address | email | high | Classify and register PII
Run this after every deployment. Better yet, schedule it:
SELECT pgcomply.schedule_jobs(install := true);
-- Adds weekly drift detection every Monday at 07:00
Classification-Driven Protection
Classification isn't just documentation — it drives protection decisions:
| Level | Masking | RLS | Retention | Encryption | Access Review | |-------|---------|-----|-----------|------------|---------------| | Public | No | No | No | No | No | | Internal | No | No | Optional | At rest | Annual | | Confidential | Yes | Recommended | Yes | At rest + transit | Quarterly | | Restricted | Yes | Required | Yes | At rest + transit | Monthly |
With pgcomply, you can verify that protection matches classification:
-- Find confidential/restricted tables missing protection
SELECT table_name, level
FROM pgcomply.classification_map()
WHERE level IN ('confidential', 'restricted')
AND (has_masking = false OR has_rls = false);
Summary
Data classification is the foundation of database security. Without it, you can't know what to protect or whether your protections are adequate. pgcomply automates the entire process: detect PII, assign sensitivity levels, monitor for drift, and verify that protection matches classification. One SQL command replaces the spreadsheet that nobody maintains.