Docs For AI
Design

Data Integrity

Methods to verify and ensure data correctness, consistency, and reliability

Data Integrity Verification

Data integrity ensures that your database contains accurate, consistent, and reliable data. This guide covers methods to assess and enforce data integrity in your database design.

Integrity Dimensions

Data Integrity Framework:
┌─────────────────────────────────────────────────────────────┐
│                     DATA INTEGRITY                           │
├─────────────────┬─────────────────┬────────────────────────┤
│    Entity       │  Referential    │      Domain            │
│   Integrity     │   Integrity     │     Integrity          │
├─────────────────┼─────────────────┼────────────────────────┤
│ • Primary Keys  │ • Foreign Keys  │ • Data Types           │
│ • Uniqueness    │ • Cascades      │ • CHECK Constraints    │
│ • NOT NULL      │ • Orphan Check  │ • ENUM Values          │
│ • Identity      │ • Circular Refs │ • Range Limits         │
└─────────────────┴─────────────────┴────────────────────────┘

Entity Integrity Assessment

1. Primary Key Verification

Every table should have a properly defined primary key.

Check for Missing Primary Keys:

-- PostgreSQL: Find tables without primary keys
SELECT
    t.table_schema,
    t.table_name
FROM information_schema.tables t
WHERE t.table_schema = 'public'
AND t.table_type = 'BASE TABLE'
AND NOT EXISTS (
    SELECT 1
    FROM information_schema.table_constraints tc
    WHERE tc.table_schema = t.table_schema
    AND tc.table_name = t.table_name
    AND tc.constraint_type = 'PRIMARY KEY'
)
ORDER BY t.table_name;

MySQL Version:

-- MySQL: Find tables without primary keys
SELECT
    t.table_schema,
    t.table_name
FROM information_schema.tables t
WHERE t.table_schema = DATABASE()
AND t.table_type = 'BASE TABLE'
AND NOT EXISTS (
    SELECT 1
    FROM information_schema.table_constraints tc
    WHERE tc.table_schema = t.table_schema
    AND tc.table_name = t.table_name
    AND tc.constraint_type = 'PRIMARY KEY'
);

2. Uniqueness Constraint Coverage

Identify Potential Missing Unique Constraints:

-- Find columns that might need UNIQUE constraints
-- (columns with "email", "username", "code", etc. in name)
SELECT
    table_name,
    column_name,
    data_type,
    is_nullable
FROM information_schema.columns
WHERE table_schema = 'public'
AND (
    column_name LIKE '%email%'
    OR column_name LIKE '%username%'
    OR column_name LIKE '%code%'
    OR column_name LIKE '%slug%'
    OR column_name LIKE '%sku%'
)
AND column_name NOT IN (
    SELECT column_name
    FROM information_schema.table_constraints tc
    JOIN information_schema.key_column_usage kcu
        ON tc.constraint_name = kcu.constraint_name
    WHERE tc.constraint_type IN ('PRIMARY KEY', 'UNIQUE')
)
ORDER BY table_name;

Detect Actual Duplicate Data:

-- Template to find duplicate values in any column
SELECT
    column_name,
    COUNT(*) as occurrences
FROM your_table
GROUP BY column_name
HAVING COUNT(*) > 1
ORDER BY occurrences DESC;

3. NOT NULL Assessment

Find Nullable Columns That Shouldn't Be:

-- Columns that are nullable but probably shouldn't be
-- (based on naming patterns)
SELECT
    table_name,
    column_name,
    data_type
FROM information_schema.columns
WHERE table_schema = 'public'
AND is_nullable = 'YES'
AND (
    column_name LIKE '%_id'    -- Foreign keys
    OR column_name = 'created_at'
    OR column_name = 'updated_at'
    OR column_name = 'status'
    OR column_name = 'type'
)
ORDER BY table_name, column_name;

-- Check for NULL values in important columns
SELECT
    COUNT(*) FILTER (WHERE important_column IS NULL) as null_count,
    COUNT(*) as total_count,
    ROUND(100.0 * COUNT(*) FILTER (WHERE important_column IS NULL) / COUNT(*), 2) as null_percentage
FROM your_table;

Referential Integrity Assessment

1. Foreign Key Coverage

Find Tables Without Foreign Key Relationships:

-- Tables that might need foreign keys
SELECT
    t.table_name,
    c.column_name
FROM information_schema.tables t
JOIN information_schema.columns c
    ON t.table_name = c.table_name
WHERE t.table_schema = 'public'
AND t.table_type = 'BASE TABLE'
AND c.column_name LIKE '%_id'
AND c.column_name NOT IN (
    SELECT kcu.column_name
    FROM information_schema.table_constraints tc
    JOIN information_schema.key_column_usage kcu
        ON tc.constraint_name = kcu.constraint_name
    WHERE tc.table_name = t.table_name
    AND tc.constraint_type IN ('PRIMARY KEY', 'FOREIGN KEY')
)
ORDER BY t.table_name, c.column_name;

2. Orphan Record Detection

PostgreSQL - Find Orphan Records:

-- Template: Find records with invalid foreign key references
-- Replace table and column names as needed

-- Example: Find orders with non-existent customers
SELECT o.*
FROM orders o
LEFT JOIN customers c ON o.customer_id = c.id
WHERE c.id IS NULL
AND o.customer_id IS NOT NULL;

-- Generic orphan finder (dynamic SQL)
DO $$
DECLARE
    fk RECORD;
    orphan_count INTEGER;
BEGIN
    FOR fk IN
        SELECT
            tc.table_name as child_table,
            kcu.column_name as child_column,
            ccu.table_name as parent_table,
            ccu.column_name as parent_column
        FROM information_schema.table_constraints tc
        JOIN information_schema.key_column_usage kcu
            ON tc.constraint_name = kcu.constraint_name
        JOIN information_schema.constraint_column_usage ccu
            ON tc.constraint_name = ccu.constraint_name
        WHERE tc.constraint_type = 'FOREIGN KEY'
        AND tc.table_schema = 'public'
    LOOP
        EXECUTE format('
            SELECT COUNT(*)
            FROM %I c
            LEFT JOIN %I p ON c.%I = p.%I
            WHERE p.%I IS NULL AND c.%I IS NOT NULL',
            fk.child_table, fk.parent_table,
            fk.child_column, fk.parent_column,
            fk.parent_column, fk.child_column
        ) INTO orphan_count;

        IF orphan_count > 0 THEN
            RAISE NOTICE 'Found % orphan records in %.% referencing %.%',
                orphan_count, fk.child_table, fk.child_column,
                fk.parent_table, fk.parent_column;
        END IF;
    END LOOP;
END $$;

3. Cascade Rule Review

Review Foreign Key Actions:

-- List all foreign keys with their ON DELETE/UPDATE actions
SELECT
    tc.table_name as child_table,
    kcu.column_name as fk_column,
    ccu.table_name as parent_table,
    rc.delete_rule,
    rc.update_rule
FROM information_schema.table_constraints tc
JOIN information_schema.key_column_usage kcu
    ON tc.constraint_name = kcu.constraint_name
JOIN information_schema.constraint_column_usage ccu
    ON tc.constraint_name = ccu.constraint_name
JOIN information_schema.referential_constraints rc
    ON tc.constraint_name = rc.constraint_name
WHERE tc.constraint_type = 'FOREIGN KEY'
AND tc.table_schema = 'public'
ORDER BY tc.table_name;

Cascade Rule Best Practices:

RelationshipRecommended ON DELETERecommended ON UPDATE
User → ProfileCASCADECASCADE
Order → OrderItemsCASCADECASCADE
User → OrdersRESTRICT or SET NULLCASCADE
Category → ProductsRESTRICTCASCADE
Soft-delete entitiesSET NULLCASCADE

Domain Integrity Assessment

1. Data Type Appropriateness

Check for Potential Data Type Issues:

-- Find oversized varchar columns (potential waste)
SELECT
    table_name,
    column_name,
    character_maximum_length
FROM information_schema.columns
WHERE table_schema = 'public'
AND data_type = 'character varying'
AND character_maximum_length > 1000
ORDER BY character_maximum_length DESC;

-- Find text columns that might be better as enum/varchar
SELECT
    table_name,
    column_name,
    data_type
FROM information_schema.columns
WHERE table_schema = 'public'
AND data_type = 'text'
AND column_name IN ('status', 'type', 'category', 'role', 'state');

2. CHECK Constraint Coverage

List Existing CHECK Constraints:

-- PostgreSQL: List all CHECK constraints
SELECT
    tc.table_name,
    tc.constraint_name,
    cc.check_clause
FROM information_schema.table_constraints tc
JOIN information_schema.check_constraints cc
    ON tc.constraint_name = cc.constraint_name
WHERE tc.constraint_type = 'CHECK'
AND tc.table_schema = 'public'
ORDER BY tc.table_name;

Columns That Should Have CHECK Constraints:

-- Find numeric columns without range constraints
SELECT
    table_name,
    column_name,
    data_type
FROM information_schema.columns
WHERE table_schema = 'public'
AND data_type IN ('integer', 'numeric', 'decimal', 'real')
AND column_name IN ('price', 'quantity', 'amount', 'age', 'rating', 'percentage')
ORDER BY table_name;

Recommended CHECK Constraints:

-- Example CHECK constraints for common scenarios
ALTER TABLE products
ADD CONSTRAINT chk_price_positive CHECK (price >= 0);

ALTER TABLE orders
ADD CONSTRAINT chk_quantity_positive CHECK (quantity > 0);

ALTER TABLE users
ADD CONSTRAINT chk_age_range CHECK (age >= 0 AND age <= 150);

ALTER TABLE reviews
ADD CONSTRAINT chk_rating_range CHECK (rating >= 1 AND rating <= 5);

ALTER TABLE discounts
ADD CONSTRAINT chk_percentage_range CHECK (percentage >= 0 AND percentage <= 100);

3. ENUM and Valid Values

Find Columns That Should Use ENUM:

-- Find columns with limited distinct values (ENUM candidates)
-- Run this for each table
SELECT
    column_name,
    COUNT(DISTINCT column_value) as distinct_count,
    array_agg(DISTINCT column_value) as values
FROM (
    SELECT 'status' as column_name, status as column_value FROM your_table
) sub
GROUP BY column_name
HAVING COUNT(DISTINCT column_value) < 10;

Integrity Audit Report

Comprehensive Integrity Check Script:

-- Data Integrity Audit Report
DO $$
DECLARE
    v_table RECORD;
    v_count INTEGER;
BEGIN
    RAISE NOTICE '╔══════════════════════════════════════════════════════════╗';
    RAISE NOTICE '║          DATA INTEGRITY AUDIT REPORT                     ║';
    RAISE NOTICE '╚══════════════════════════════════════════════════════════╝';
    RAISE NOTICE '';

    -- 1. Tables without Primary Keys
    RAISE NOTICE '1. ENTITY INTEGRITY - Primary Keys';
    RAISE NOTICE '───────────────────────────────────';
    SELECT COUNT(*) INTO v_count
    FROM information_schema.tables t
    WHERE t.table_schema = 'public'
    AND t.table_type = 'BASE TABLE'
    AND NOT EXISTS (
        SELECT 1 FROM information_schema.table_constraints tc
        WHERE tc.table_name = t.table_name
        AND tc.constraint_type = 'PRIMARY KEY'
    );
    RAISE NOTICE '   Tables without PK: %', v_count;
    IF v_count > 0 THEN
        FOR v_table IN
            SELECT t.table_name
            FROM information_schema.tables t
            WHERE t.table_schema = 'public'
            AND t.table_type = 'BASE TABLE'
            AND NOT EXISTS (
                SELECT 1 FROM information_schema.table_constraints tc
                WHERE tc.table_name = t.table_name
                AND tc.constraint_type = 'PRIMARY KEY'
            )
        LOOP
            RAISE NOTICE '     - %', v_table.table_name;
        END LOOP;
    END IF;

    -- 2. Foreign Key Coverage
    RAISE NOTICE '';
    RAISE NOTICE '2. REFERENTIAL INTEGRITY - Foreign Keys';
    RAISE NOTICE '────────────────────────────────────────';
    SELECT COUNT(*) INTO v_count
    FROM information_schema.table_constraints
    WHERE constraint_type = 'FOREIGN KEY'
    AND table_schema = 'public';
    RAISE NOTICE '   Total Foreign Keys: %', v_count;

    -- 3. CHECK Constraints
    RAISE NOTICE '';
    RAISE NOTICE '3. DOMAIN INTEGRITY - CHECK Constraints';
    RAISE NOTICE '────────────────────────────────────────';
    SELECT COUNT(*) INTO v_count
    FROM information_schema.table_constraints
    WHERE constraint_type = 'CHECK'
    AND table_schema = 'public';
    RAISE NOTICE '   Total CHECK Constraints: %', v_count;

    -- 4. NOT NULL Coverage
    RAISE NOTICE '';
    RAISE NOTICE '4. NULL VALUE ANALYSIS';
    RAISE NOTICE '──────────────────────';
    SELECT COUNT(*) INTO v_count
    FROM information_schema.columns
    WHERE table_schema = 'public'
    AND is_nullable = 'NO';
    RAISE NOTICE '   NOT NULL Columns: %', v_count;

    SELECT COUNT(*) INTO v_count
    FROM information_schema.columns
    WHERE table_schema = 'public'
    AND is_nullable = 'YES';
    RAISE NOTICE '   Nullable Columns: %', v_count;

    RAISE NOTICE '';
    RAISE NOTICE '══════════════════════════════════════════════════════════';
END $$;

Integrity Scoring Matrix

CheckPointsCriteria
All tables have PK5100% coverage
No orphan records5Zero orphans detected
FK constraints defined5All relationships covered
CHECK constraints5Range/value validations in place
NOT NULL appropriate5Required fields protected
Total25

Common Integrity Issues and Fixes

IssueDetectionFix
Missing PKQuery aboveAdd surrogate or natural key
Orphan recordsLEFT JOIN checkAdd FK constraint, clean data
Invalid valuesDomain queriesAdd CHECK constraint
NULL in required fieldIS NULL countAdd NOT NULL, fix data
Duplicate unique valuesGROUP BY HAVINGAdd UNIQUE constraint

On this page