gdpr
pii-lineagedata-lineagegdprarticle-30

PII Data Lineage in PostgreSQL: Tracing Personal Data from Source to Storage

Track PII data lineage in PostgreSQL for GDPR Article 30. Map personal data flows from source to storage with documented purposes.

RL
Robert Langner
Managing Director, NILS Software GmbH · · 3 min read

Why Lineage Matters

An auditor asks: "Where does user email data flow in your system?" Without lineage, you search through code, documentation, and developer memory. With lineage:

SELECT * FROM pgcomply.pii_lineage('email');
 table_name  | column      | pii_type | source         | purpose       | legal_basis       | retention | masking | rls
-------------+-------------+----------+----------------+---------------+-------------------+-----------+---------+-----
 users       | email       | email    | signup_form    | account       | contract          | —         | partial | yes
 orders      | receipt_email| email   | users.email    | billing       | contract          | 7 years   | no      | no
 newsletter  | email       | email    | consent_form   | newsletter    | consent           | until_withdrawn | no  | no
 sessions    | —           | —        | —              | —             | —                 | 30 days   | —       | —

This is your Article 30 record for email data.

Building Lineage

Step 1: Complete PII Registry

-- Ensure all PII columns are registered
SELECT pgcomply.classify();
SELECT * FROM pgcomply.schema_drift();
-- Address any gaps

Step 2: Document Purposes

-- Link PII to processing purposes
SELECT pgcomply.define_purpose('account', 'Account management', 'contract');
SELECT pgcomply.define_purpose('billing', 'Payment processing', 'contract');
SELECT pgcomply.define_purpose('newsletter', 'Email newsletter', 'consent');

Step 3: Generate Lineage

-- Full lineage for all PII types
SELECT * FROM pgcomply.pii_lineage();

-- Lineage for a specific PII type
SELECT * FROM pgcomply.pii_lineage('financial');

Step 4: Find Gaps

-- PII columns without a documented purpose
SELECT table_name, column_name, pii_type
FROM pgcomply.pii_registry r
LEFT JOIN pgcomply.consent_purposes p ON true
WHERE NOT EXISTS (
  SELECT 1 FROM pgcomply.pii_lineage()
  WHERE table_name = r.table_name AND purpose IS NOT NULL
);

Automating Lineage Updates

PII lineage must stay current. Automate detection of lineage gaps:

-- Weekly: check for PII without documented purpose
-- Add to your pg_cron schedule:
SELECT cron.schedule('lineage-check', '0 7 * * 1', $$
  SELECT table_name, column_name, pii_type
  FROM pgcomply.pii_registry r
  WHERE NOT EXISTS (
    SELECT 1 FROM pgcomply.pii_lineage()
    WHERE table_name = r.table_name
      AND purpose IS NOT NULL
  );
$$);

Lineage for International Data Transfers

GDPR Article 44-49 governs international data transfers. If your PII flows to non-EU systems, lineage must document this:

-- Tag tables that replicate to non-EU systems
SELECT pgcomply.tag('analytics_exports', 'data_transfer', 'US - Mixpanel');
SELECT pgcomply.tag('email_queue', 'data_transfer', 'US - SendGrid');

-- Lineage view shows transfer destinations alongside PII types
SELECT l.*, t.value AS transfer_destination
FROM pgcomply.pii_lineage() l
LEFT JOIN pgcomply.get_tags(l.table_name) t ON t.key = 'data_transfer';

Any table with a data_transfer tag and PII requires either Standard Contractual Clauses (SCCs) or an adequacy decision.

Visualizing Data Flows

For non-technical stakeholders (DPO, legal, auditors), export lineage as a structured report:

-- Pro: Generate Article 30 records of processing
SELECT pgcomply.compliance_report('article_30');

For the Community Edition, build a simple view:

-- Summary: how many PII columns per purpose
SELECT purpose, legal_basis, count(*) AS pii_columns,
  array_agg(DISTINCT pii_type) AS data_types
FROM pgcomply.pii_lineage()
WHERE purpose IS NOT NULL
GROUP BY purpose, legal_basis
ORDER BY pii_columns DESC;
 purpose    | legal_basis | pii_columns | data_types
------------+-------------+-------------+-----------------------------------
 account    | contract    | 8           | {email,phone,person_name,address}
 billing    | contract    | 4           | {email,financial,address}
 newsletter | consent     | 2           | {email,person_name}
 analytics  | legitimate  | 1           | {ip_address}

This table is the foundation of your Article 30 record — it answers "what PII do we process, for what purpose, under which legal basis."

Summary

PII lineage turns your GDPR Article 30 obligation from a documentation exercise into a queryable database asset. pgcomply.pii_lineage() generates the map automatically from your PII registry, consent records, and audit trail. Run it before every audit to ensure your records of processing are complete and accurate.

Frequently Asked Questions

What are records of processing activities?
GDPR Article 30 requires controllers to maintain records documenting: purposes of processing, categories of data subjects and personal data, recipients, international transfers, retention periods, and security measures. This is essentially a data map of your organization.
How does pgcomply generate lineage information?
pgcomply combines three data sources: the PII registry (which tables and columns store PII), consent records (what purposes and legal bases exist), and the audit trail (what operations have been performed). pii_lineage() joins these into a comprehensive data flow view.

Related Articles