Data Integrity
Methods to verify and ensure data correctness, consistency, and reliability
Data Integrity Verification
Data integrity ensures that your database contains accurate, consistent, and reliable data. This guide covers methods to assess and enforce data integrity in your database design.
Integrity Dimensions
Data Integrity Framework:
┌─────────────────────────────────────────────────────────────┐
│ DATA INTEGRITY │
├─────────────────┬─────────────────┬────────────────────────┤
│ Entity │ Referential │ Domain │
│ Integrity │ Integrity │ Integrity │
├─────────────────┼─────────────────┼────────────────────────┤
│ • Primary Keys │ • Foreign Keys │ • Data Types │
│ • Uniqueness │ • Cascades │ • CHECK Constraints │
│ • NOT NULL │ • Orphan Check │ • ENUM Values │
│ • Identity │ • Circular Refs │ • Range Limits │
└─────────────────┴─────────────────┴────────────────────────┘Entity Integrity Assessment
1. Primary Key Verification
Every table should have a properly defined primary key.
Check for Missing Primary Keys:
-- PostgreSQL: Find tables without primary keys
SELECT
t.table_schema,
t.table_name
FROM information_schema.tables t
WHERE t.table_schema = 'public'
AND t.table_type = 'BASE TABLE'
AND NOT EXISTS (
SELECT 1
FROM information_schema.table_constraints tc
WHERE tc.table_schema = t.table_schema
AND tc.table_name = t.table_name
AND tc.constraint_type = 'PRIMARY KEY'
)
ORDER BY t.table_name;MySQL Version:
-- MySQL: Find tables without primary keys
SELECT
t.table_schema,
t.table_name
FROM information_schema.tables t
WHERE t.table_schema = DATABASE()
AND t.table_type = 'BASE TABLE'
AND NOT EXISTS (
SELECT 1
FROM information_schema.table_constraints tc
WHERE tc.table_schema = t.table_schema
AND tc.table_name = t.table_name
AND tc.constraint_type = 'PRIMARY KEY'
);2. Uniqueness Constraint Coverage
Identify Potential Missing Unique Constraints:
-- Find columns that might need UNIQUE constraints
-- (columns with "email", "username", "code", etc. in name)
SELECT
table_name,
column_name,
data_type,
is_nullable
FROM information_schema.columns
WHERE table_schema = 'public'
AND (
column_name LIKE '%email%'
OR column_name LIKE '%username%'
OR column_name LIKE '%code%'
OR column_name LIKE '%slug%'
OR column_name LIKE '%sku%'
)
AND column_name NOT IN (
SELECT column_name
FROM information_schema.table_constraints tc
JOIN information_schema.key_column_usage kcu
ON tc.constraint_name = kcu.constraint_name
WHERE tc.constraint_type IN ('PRIMARY KEY', 'UNIQUE')
)
ORDER BY table_name;Detect Actual Duplicate Data:
-- Template to find duplicate values in any column
SELECT
column_name,
COUNT(*) as occurrences
FROM your_table
GROUP BY column_name
HAVING COUNT(*) > 1
ORDER BY occurrences DESC;3. NOT NULL Assessment
Find Nullable Columns That Shouldn't Be:
-- Columns that are nullable but probably shouldn't be
-- (based on naming patterns)
SELECT
table_name,
column_name,
data_type
FROM information_schema.columns
WHERE table_schema = 'public'
AND is_nullable = 'YES'
AND (
column_name LIKE '%_id' -- Foreign keys
OR column_name = 'created_at'
OR column_name = 'updated_at'
OR column_name = 'status'
OR column_name = 'type'
)
ORDER BY table_name, column_name;
-- Check for NULL values in important columns
SELECT
COUNT(*) FILTER (WHERE important_column IS NULL) as null_count,
COUNT(*) as total_count,
ROUND(100.0 * COUNT(*) FILTER (WHERE important_column IS NULL) / COUNT(*), 2) as null_percentage
FROM your_table;Referential Integrity Assessment
1. Foreign Key Coverage
Find Tables Without Foreign Key Relationships:
-- Tables that might need foreign keys
SELECT
t.table_name,
c.column_name
FROM information_schema.tables t
JOIN information_schema.columns c
ON t.table_name = c.table_name
WHERE t.table_schema = 'public'
AND t.table_type = 'BASE TABLE'
AND c.column_name LIKE '%_id'
AND c.column_name NOT IN (
SELECT kcu.column_name
FROM information_schema.table_constraints tc
JOIN information_schema.key_column_usage kcu
ON tc.constraint_name = kcu.constraint_name
WHERE tc.table_name = t.table_name
AND tc.constraint_type IN ('PRIMARY KEY', 'FOREIGN KEY')
)
ORDER BY t.table_name, c.column_name;2. Orphan Record Detection
PostgreSQL - Find Orphan Records:
-- Template: Find records with invalid foreign key references
-- Replace table and column names as needed
-- Example: Find orders with non-existent customers
SELECT o.*
FROM orders o
LEFT JOIN customers c ON o.customer_id = c.id
WHERE c.id IS NULL
AND o.customer_id IS NOT NULL;
-- Generic orphan finder (dynamic SQL)
DO $$
DECLARE
fk RECORD;
orphan_count INTEGER;
BEGIN
FOR fk IN
SELECT
tc.table_name as child_table,
kcu.column_name as child_column,
ccu.table_name as parent_table,
ccu.column_name as parent_column
FROM information_schema.table_constraints tc
JOIN information_schema.key_column_usage kcu
ON tc.constraint_name = kcu.constraint_name
JOIN information_schema.constraint_column_usage ccu
ON tc.constraint_name = ccu.constraint_name
WHERE tc.constraint_type = 'FOREIGN KEY'
AND tc.table_schema = 'public'
LOOP
EXECUTE format('
SELECT COUNT(*)
FROM %I c
LEFT JOIN %I p ON c.%I = p.%I
WHERE p.%I IS NULL AND c.%I IS NOT NULL',
fk.child_table, fk.parent_table,
fk.child_column, fk.parent_column,
fk.parent_column, fk.child_column
) INTO orphan_count;
IF orphan_count > 0 THEN
RAISE NOTICE 'Found % orphan records in %.% referencing %.%',
orphan_count, fk.child_table, fk.child_column,
fk.parent_table, fk.parent_column;
END IF;
END LOOP;
END $$;3. Cascade Rule Review
Review Foreign Key Actions:
-- List all foreign keys with their ON DELETE/UPDATE actions
SELECT
tc.table_name as child_table,
kcu.column_name as fk_column,
ccu.table_name as parent_table,
rc.delete_rule,
rc.update_rule
FROM information_schema.table_constraints tc
JOIN information_schema.key_column_usage kcu
ON tc.constraint_name = kcu.constraint_name
JOIN information_schema.constraint_column_usage ccu
ON tc.constraint_name = ccu.constraint_name
JOIN information_schema.referential_constraints rc
ON tc.constraint_name = rc.constraint_name
WHERE tc.constraint_type = 'FOREIGN KEY'
AND tc.table_schema = 'public'
ORDER BY tc.table_name;Cascade Rule Best Practices:
| Relationship | Recommended ON DELETE | Recommended ON UPDATE |
|---|---|---|
| User → Profile | CASCADE | CASCADE |
| Order → OrderItems | CASCADE | CASCADE |
| User → Orders | RESTRICT or SET NULL | CASCADE |
| Category → Products | RESTRICT | CASCADE |
| Soft-delete entities | SET NULL | CASCADE |
Domain Integrity Assessment
1. Data Type Appropriateness
Check for Potential Data Type Issues:
-- Find oversized varchar columns (potential waste)
SELECT
table_name,
column_name,
character_maximum_length
FROM information_schema.columns
WHERE table_schema = 'public'
AND data_type = 'character varying'
AND character_maximum_length > 1000
ORDER BY character_maximum_length DESC;
-- Find text columns that might be better as enum/varchar
SELECT
table_name,
column_name,
data_type
FROM information_schema.columns
WHERE table_schema = 'public'
AND data_type = 'text'
AND column_name IN ('status', 'type', 'category', 'role', 'state');2. CHECK Constraint Coverage
List Existing CHECK Constraints:
-- PostgreSQL: List all CHECK constraints
SELECT
tc.table_name,
tc.constraint_name,
cc.check_clause
FROM information_schema.table_constraints tc
JOIN information_schema.check_constraints cc
ON tc.constraint_name = cc.constraint_name
WHERE tc.constraint_type = 'CHECK'
AND tc.table_schema = 'public'
ORDER BY tc.table_name;Columns That Should Have CHECK Constraints:
-- Find numeric columns without range constraints
SELECT
table_name,
column_name,
data_type
FROM information_schema.columns
WHERE table_schema = 'public'
AND data_type IN ('integer', 'numeric', 'decimal', 'real')
AND column_name IN ('price', 'quantity', 'amount', 'age', 'rating', 'percentage')
ORDER BY table_name;Recommended CHECK Constraints:
-- Example CHECK constraints for common scenarios
ALTER TABLE products
ADD CONSTRAINT chk_price_positive CHECK (price >= 0);
ALTER TABLE orders
ADD CONSTRAINT chk_quantity_positive CHECK (quantity > 0);
ALTER TABLE users
ADD CONSTRAINT chk_age_range CHECK (age >= 0 AND age <= 150);
ALTER TABLE reviews
ADD CONSTRAINT chk_rating_range CHECK (rating >= 1 AND rating <= 5);
ALTER TABLE discounts
ADD CONSTRAINT chk_percentage_range CHECK (percentage >= 0 AND percentage <= 100);3. ENUM and Valid Values
Find Columns That Should Use ENUM:
-- Find columns with limited distinct values (ENUM candidates)
-- Run this for each table
SELECT
column_name,
COUNT(DISTINCT column_value) as distinct_count,
array_agg(DISTINCT column_value) as values
FROM (
SELECT 'status' as column_name, status as column_value FROM your_table
) sub
GROUP BY column_name
HAVING COUNT(DISTINCT column_value) < 10;Integrity Audit Report
Comprehensive Integrity Check Script:
-- Data Integrity Audit Report
DO $$
DECLARE
v_table RECORD;
v_count INTEGER;
BEGIN
RAISE NOTICE '╔══════════════════════════════════════════════════════════╗';
RAISE NOTICE '║ DATA INTEGRITY AUDIT REPORT ║';
RAISE NOTICE '╚══════════════════════════════════════════════════════════╝';
RAISE NOTICE '';
-- 1. Tables without Primary Keys
RAISE NOTICE '1. ENTITY INTEGRITY - Primary Keys';
RAISE NOTICE '───────────────────────────────────';
SELECT COUNT(*) INTO v_count
FROM information_schema.tables t
WHERE t.table_schema = 'public'
AND t.table_type = 'BASE TABLE'
AND NOT EXISTS (
SELECT 1 FROM information_schema.table_constraints tc
WHERE tc.table_name = t.table_name
AND tc.constraint_type = 'PRIMARY KEY'
);
RAISE NOTICE ' Tables without PK: %', v_count;
IF v_count > 0 THEN
FOR v_table IN
SELECT t.table_name
FROM information_schema.tables t
WHERE t.table_schema = 'public'
AND t.table_type = 'BASE TABLE'
AND NOT EXISTS (
SELECT 1 FROM information_schema.table_constraints tc
WHERE tc.table_name = t.table_name
AND tc.constraint_type = 'PRIMARY KEY'
)
LOOP
RAISE NOTICE ' - %', v_table.table_name;
END LOOP;
END IF;
-- 2. Foreign Key Coverage
RAISE NOTICE '';
RAISE NOTICE '2. REFERENTIAL INTEGRITY - Foreign Keys';
RAISE NOTICE '────────────────────────────────────────';
SELECT COUNT(*) INTO v_count
FROM information_schema.table_constraints
WHERE constraint_type = 'FOREIGN KEY'
AND table_schema = 'public';
RAISE NOTICE ' Total Foreign Keys: %', v_count;
-- 3. CHECK Constraints
RAISE NOTICE '';
RAISE NOTICE '3. DOMAIN INTEGRITY - CHECK Constraints';
RAISE NOTICE '────────────────────────────────────────';
SELECT COUNT(*) INTO v_count
FROM information_schema.table_constraints
WHERE constraint_type = 'CHECK'
AND table_schema = 'public';
RAISE NOTICE ' Total CHECK Constraints: %', v_count;
-- 4. NOT NULL Coverage
RAISE NOTICE '';
RAISE NOTICE '4. NULL VALUE ANALYSIS';
RAISE NOTICE '──────────────────────';
SELECT COUNT(*) INTO v_count
FROM information_schema.columns
WHERE table_schema = 'public'
AND is_nullable = 'NO';
RAISE NOTICE ' NOT NULL Columns: %', v_count;
SELECT COUNT(*) INTO v_count
FROM information_schema.columns
WHERE table_schema = 'public'
AND is_nullable = 'YES';
RAISE NOTICE ' Nullable Columns: %', v_count;
RAISE NOTICE '';
RAISE NOTICE '══════════════════════════════════════════════════════════';
END $$;Integrity Scoring Matrix
| Check | Points | Criteria |
|---|---|---|
| All tables have PK | 5 | 100% coverage |
| No orphan records | 5 | Zero orphans detected |
| FK constraints defined | 5 | All relationships covered |
| CHECK constraints | 5 | Range/value validations in place |
| NOT NULL appropriate | 5 | Required fields protected |
| Total | 25 |
Common Integrity Issues and Fixes
| Issue | Detection | Fix |
|---|---|---|
| Missing PK | Query above | Add surrogate or natural key |
| Orphan records | LEFT JOIN check | Add FK constraint, clean data |
| Invalid values | Domain queries | Add CHECK constraint |
| NULL in required field | IS NULL count | Add NOT NULL, fix data |
| Duplicate unique values | GROUP BY HAVING | Add UNIQUE constraint |