tutorials
data-classificationiso-27001gdprpostgresql

Automating Data Classification in PostgreSQL for ISO 27001 and GDPR

Automate PII detection in PostgreSQL with pgcomply. Classify columns by sensitivity level for ISO 27001 and GDPR compliance.

RL
Robert Langner
Managing Director, NILS Software GmbH · · 4 min read

Why Classification Matters

Every database contains data with different sensitivity levels. Customer emails are more sensitive than product categories. IBAN numbers need more protection than blog post titles. But without explicit classification, all data gets treated the same — which means either over-protecting everything (expensive) or under-protecting sensitive data (dangerous).

ISO 27001 (Annex A.8.2) requires organizations to classify information assets by sensitivity. GDPR Article 5 requires data minimization and protection proportional to risk. Both start with knowing what you have.

The Manual Classification Problem

Most teams start with a spreadsheet:

| Table | Column | Classification | Owner | Last Review | |-------|--------|---------------|-------|-------------| | users | email | Confidential | Team A | 2024-11-03 | | users | name | Confidential | Team A | 2024-11-03 | | orders | total | Internal | Team B | 2024-11-03 |

This works for about two weeks. Then:

  1. A developer adds a mobile_phone column to users. Nobody updates the spreadsheet.
  2. A new newsletter_subscribers table appears with an email column. Unclassified.
  3. The team that owned orders restructured. The "Owner" column is fiction.

By the time the auditor asks for your classification register, it's 6 months out of date.

Automated Classification with pgcomply

Step 1: Detect PII

SELECT pgcomply.classify();

This scans every table in your schema and detects columns matching PII patterns:

 schema | table_name | column_name | suspected_pii | confidence
--------+------------+-------------+---------------+------------
 public | users      | email       | email         | high
 public | users      | name        | person_name   | high
 public | users      | phone       | phone         | high
 public | users      | tax_id      | government_id | high
 public | profiles   | ip_address  | ip_address    | high
 public | profiles   | street      | address       | medium
 public | payments   | iban        | financial     | high

pgcomply detects 12+ PII patterns including email, phone, names, addresses, IP addresses, government IDs, financial data, dates of birth, and more.

Step 2: Auto-Classify

SELECT pgcomply.auto_classify();

This assigns ISO 27001 levels based on the PII detected:

  • Restricted: Tables with special category PII (government IDs, financial data, health data)
  • Confidential: Tables with regular PII (email, phone, name, address)
  • Internal: Tables with no detected PII

Tables you've explicitly classified are not overwritten.

Step 3: Review the Map

SELECT * FROM pgcomply.classification_map();
 table      | level        | has_pii | pii_count | has_masking | has_rls | has_retention | review_overdue
------------+--------------+---------+-----------+-------------+---------+---------------+---------------
 users      | restricted   | true    | 4         | true        | true    | false         | false
 payments   | restricted   | true    | 1         | true        | true    | true          | false
 profiles   | confidential | true    | 5         | false       | false   | false         | false
 orders     | confidential | true    | 1         | false       | false   | true          | false
 sessions   | confidential | true    | 2         | false       | false   | true          | false
 products   | internal     | false   | 0         | false       | false   | false         | false
 categories | public       | false   | 0         | false       | false   | false         | false

This is your living classification register — always up to date, always queryable.

Step 4: Catch Drift

SELECT * FROM pgcomply.schema_drift();
 drift_type | table_name       | column_name   | suspected_pii | confidence | recommendation
------------+------------------+---------------+---------------+------------+--------------------------------
 new_column | users            | mobile_phone  | phone         | high       | Register with register_pii()
 new_table  | newsletter_subs  | email_address | email         | high       | Classify and register PII

Run this after every deployment. Better yet, schedule it:

SELECT pgcomply.schedule_jobs(install := true);
-- Adds weekly drift detection every Monday at 07:00

Classification-Driven Protection

Classification isn't just documentation — it drives protection decisions:

| Level | Masking | RLS | Retention | Encryption | Access Review | |-------|---------|-----|-----------|------------|---------------| | Public | No | No | No | No | No | | Internal | No | No | Optional | At rest | Annual | | Confidential | Yes | Recommended | Yes | At rest + transit | Quarterly | | Restricted | Yes | Required | Yes | At rest + transit | Monthly |

With pgcomply, you can verify that protection matches classification:

-- Find confidential/restricted tables missing protection
SELECT table_name, level
FROM pgcomply.classification_map()
WHERE level IN ('confidential', 'restricted')
  AND (has_masking = false OR has_rls = false);

Summary

Data classification is the foundation of database security. Without it, you can't know what to protect or whether your protections are adequate. pgcomply automates the entire process: detect PII, assign sensitivity levels, monitor for drift, and verify that protection matches classification. One SQL command replaces the spreadsheet that nobody maintains.

Frequently Asked Questions

What are the ISO 27001 data classification levels?
ISO 27001 (Annex A.8.2) requires organizations to classify information by sensitivity. The standard levels are: Public (no impact if disclosed), Internal (limited impact), Confidential (significant impact, contains PII), and Restricted (severe impact, contains special category PII like health data, financial data, or government IDs).
Is data classification mandatory for GDPR?
GDPR does not explicitly require classification levels, but Article 5(1)(c) requires data minimization, and Article 25 requires data protection by design. Classification helps you identify where PII exists and whether it has adequate protection (masking, RLS, retention policies). In practice, auditors expect classification.
How does pgcomply detect PII automatically?
pgcomply uses pattern matching on column names and data types. It checks for patterns like email, phone, name, address, ip_address, ssn, tax_id, iban, date_of_birth, passport, salary, and more. Detection confidence is rated as high, medium, or low. Manual registration via register_pii() can supplement or override automatic detection.
How often should data classification be reviewed?
ISO 27001 recommends annual reviews, but in practice, classification should be re-validated after every significant schema change. pgcomply.schema_drift() automates this by detecting new tables and columns that haven't been classified yet.

Related Articles