Docs For AI
Design

Normalization Assessment

Methods to evaluate database normalization level and identify data anomalies

Normalization Assessment

Normalization is a systematic approach to organizing data to reduce redundancy and improve data integrity. This guide provides methods to assess your database's normalization level.

Normalization Levels Overview

Normalization Progression:
┌─────────────────────────────────────────────────────────────┐
│                                                             │
│  Unnormalized ──► 1NF ──► 2NF ──► 3NF ──► BCNF ──► 4NF     │
│       │           │       │       │        │        │       │
│       │           │       │       │        │        │       │
│   Raw Data    Atomic   No      No     No      No           │
│               Values  Partial Trans. Determ. Multi-        │
│                       Deps    Deps   Anom.   valued        │
│                                              Deps          │
└─────────────────────────────────────────────────────────────┘

Assessment Checklist

First Normal Form (1NF) Check

A table is in 1NF if:

  • Each column contains only atomic (indivisible) values
  • Each column contains values of a single type
  • Each row is unique (has a primary key)
  • No repeating groups

Detection Query - Find Non-Atomic Values:

-- Check for potential array/list values stored as strings
-- Look for comma-separated values in text columns
SELECT
    table_name,
    column_name,
    data_type
FROM information_schema.columns
WHERE table_schema = 'public'
AND data_type IN ('text', 'varchar', 'character varying')
ORDER BY table_name, column_name;

-- Sample check for comma-separated values
SELECT column_name, COUNT(*) as suspicious_rows
FROM your_table
WHERE column_name LIKE '%,%'
GROUP BY column_name
HAVING COUNT(*) > 0;

1NF Violation Examples:

-- Bad: Violates 1NF (multiple values in one column)
CREATE TABLE orders_bad (
    order_id INT PRIMARY KEY,
    customer_name VARCHAR(100),
    products VARCHAR(500)  -- "iPhone, MacBook, AirPods"
);

-- Good: Complies with 1NF
CREATE TABLE orders_good (
    order_id INT,
    customer_name VARCHAR(100),
    product_name VARCHAR(100),
    PRIMARY KEY (order_id, product_name)
);

Second Normal Form (2NF) Check

A table is in 2NF if:

  • It is in 1NF
  • All non-key attributes are fully dependent on the entire primary key
  • No partial dependencies on composite keys

Detection Method - Identify Partial Dependencies:

-- List tables with composite primary keys (potential 2NF issues)
SELECT
    tc.table_name,
    STRING_AGG(kcu.column_name, ', ') as pk_columns,
    COUNT(*) as pk_column_count
FROM information_schema.table_constraints tc
JOIN information_schema.key_column_usage kcu
    ON tc.constraint_name = kcu.constraint_name
WHERE tc.constraint_type = 'PRIMARY KEY'
AND tc.table_schema = 'public'
GROUP BY tc.table_name
HAVING COUNT(*) > 1
ORDER BY tc.table_name;

2NF Violation Example:

-- Bad: Violates 2NF (student_name depends only on student_id)
CREATE TABLE enrollments_bad (
    student_id INT,
    course_id INT,
    student_name VARCHAR(100),    -- Partial dependency
    student_email VARCHAR(100),   -- Partial dependency
    enrollment_date DATE,
    grade CHAR(2),
    PRIMARY KEY (student_id, course_id)
);

-- Good: Complies with 2NF
CREATE TABLE students (
    student_id INT PRIMARY KEY,
    student_name VARCHAR(100),
    student_email VARCHAR(100)
);

CREATE TABLE enrollments_good (
    student_id INT REFERENCES students(student_id),
    course_id INT REFERENCES courses(course_id),
    enrollment_date DATE,
    grade CHAR(2),
    PRIMARY KEY (student_id, course_id)
);

Third Normal Form (3NF) Check

A table is in 3NF if:

  • It is in 2NF
  • No transitive dependencies (non-key attribute depending on another non-key attribute)

Detection Method - Find Transitive Dependencies:

Look for patterns where:

  • Column A depends on the primary key
  • Column B depends on Column A (not directly on PK)
-- Manual analysis: Look for potential transitive dependencies
-- Check for columns that might depend on other non-key columns

-- Example: If you have city and country in same table,
-- country likely depends on city (transitive dependency)
SELECT
    table_name,
    column_name
FROM information_schema.columns
WHERE table_schema = 'public'
AND column_name IN ('city', 'country', 'state', 'zip_code',
                    'department_name', 'department_head',
                    'category_name', 'category_description')
ORDER BY table_name;

3NF Violation Example:

-- Bad: Violates 3NF (department_name depends on department_id)
CREATE TABLE employees_bad (
    employee_id INT PRIMARY KEY,
    employee_name VARCHAR(100),
    department_id INT,
    department_name VARCHAR(100),  -- Transitive dependency
    department_location VARCHAR(100)  -- Transitive dependency
);

-- Good: Complies with 3NF
CREATE TABLE departments (
    department_id INT PRIMARY KEY,
    department_name VARCHAR(100),
    department_location VARCHAR(100)
);

CREATE TABLE employees_good (
    employee_id INT PRIMARY KEY,
    employee_name VARCHAR(100),
    department_id INT REFERENCES departments(department_id)
);

Anomaly Detection

Update Anomaly Detection

Update anomalies occur when changing data requires multiple row updates.

-- Find potential update anomalies: columns with repeated values
-- that should probably be in a separate table

WITH column_stats AS (
    SELECT
        'your_table' as table_name,
        'column_name' as column_name,
        COUNT(*) as total_rows,
        COUNT(DISTINCT column_name) as distinct_values
    FROM your_table
)
SELECT
    table_name,
    column_name,
    total_rows,
    distinct_values,
    ROUND(100.0 * distinct_values / total_rows, 2) as uniqueness_ratio
FROM column_stats
WHERE distinct_values < total_rows * 0.1  -- Less than 10% unique
AND total_rows > 100;  -- Meaningful sample size

Example of Update Anomaly:

-- Bad: Updating supplier address requires multiple updates
UPDATE products
SET supplier_address = 'New Address'
WHERE supplier_id = 123;
-- Must update ALL rows with this supplier

-- Good: Single update in normalized design
UPDATE suppliers
SET address = 'New Address'
WHERE supplier_id = 123;

Insertion Anomaly Detection

Insertion anomalies occur when you cannot insert data without additional unrelated data.

-- Check for tables where NOT NULL constraints might cause insertion issues
SELECT
    t.table_name,
    c.column_name,
    c.is_nullable,
    c.column_default
FROM information_schema.tables t
JOIN information_schema.columns c ON t.table_name = c.table_name
WHERE t.table_schema = 'public'
AND c.is_nullable = 'NO'
AND c.column_default IS NULL
ORDER BY t.table_name, c.ordinal_position;

Deletion Anomaly Detection

Deletion anomalies occur when deleting data unintentionally removes other data.

-- Find tables without foreign key relationships (potential deletion issues)
SELECT t.table_name
FROM information_schema.tables t
WHERE t.table_schema = 'public'
AND t.table_type = 'BASE TABLE'
AND NOT EXISTS (
    SELECT 1 FROM information_schema.table_constraints tc
    WHERE tc.table_name = t.table_name
    AND tc.constraint_type = 'FOREIGN KEY'
)
AND NOT EXISTS (
    SELECT 1 FROM information_schema.table_constraints tc
    JOIN information_schema.referential_constraints rc
        ON tc.constraint_name = rc.unique_constraint_name
    WHERE rc.unique_constraint_name IN (
        SELECT constraint_name
        FROM information_schema.table_constraints
        WHERE table_name = t.table_name
    )
);

Normalization Assessment Script

A comprehensive PostgreSQL script to assess normalization:

-- Comprehensive Normalization Assessment Report
DO $$
DECLARE
    v_table RECORD;
    v_result TEXT := '';
BEGIN
    RAISE NOTICE '=== NORMALIZATION ASSESSMENT REPORT ===';
    RAISE NOTICE '';

    -- 1. Check for tables without primary keys (1NF issue)
    RAISE NOTICE '1. Tables Without Primary Keys:';
    FOR v_table IN
        SELECT t.table_name
        FROM information_schema.tables t
        WHERE t.table_schema = 'public'
        AND t.table_type = 'BASE TABLE'
        AND NOT EXISTS (
            SELECT 1 FROM information_schema.table_constraints tc
            WHERE tc.table_name = t.table_name
            AND tc.constraint_type = 'PRIMARY KEY'
        )
    LOOP
        RAISE NOTICE '   - %', v_table.table_name;
    END LOOP;

    -- 2. Tables with composite PKs (potential 2NF issues)
    RAISE NOTICE '';
    RAISE NOTICE '2. Tables with Composite PKs (Review for 2NF):';
    FOR v_table IN
        SELECT tc.table_name, COUNT(*) as pk_cols
        FROM information_schema.table_constraints tc
        JOIN information_schema.key_column_usage kcu
            ON tc.constraint_name = kcu.constraint_name
        WHERE tc.constraint_type = 'PRIMARY KEY'
        AND tc.table_schema = 'public'
        GROUP BY tc.table_name
        HAVING COUNT(*) > 1
    LOOP
        RAISE NOTICE '   - % (% columns)', v_table.table_name, v_table.pk_cols;
    END LOOP;

    -- 3. Tables without foreign keys (potential isolation)
    RAISE NOTICE '';
    RAISE NOTICE '3. Isolated Tables (No FK Relationships):';
    FOR v_table IN
        SELECT t.table_name
        FROM information_schema.tables t
        WHERE t.table_schema = 'public'
        AND t.table_type = 'BASE TABLE'
        AND NOT EXISTS (
            SELECT 1 FROM information_schema.table_constraints tc
            WHERE tc.table_name = t.table_name
            AND tc.constraint_type = 'FOREIGN KEY'
        )
    LOOP
        RAISE NOTICE '   - %', v_table.table_name;
    END LOOP;

    RAISE NOTICE '';
    RAISE NOTICE '=== END OF REPORT ===';
END $$;

Normalization Scoring Matrix

CheckPointsCriteria
All tables have PK5Every table has a defined primary key
No multi-valued columns5No arrays/lists stored as delimited strings
No partial dependencies5Composite PK tables reviewed and clean
No transitive dependencies5Non-key columns don't depend on other non-key columns
No update anomalies5Data changes require single-row updates
Total25

When to Denormalize

Sometimes denormalization is appropriate:

Denormalization Decision Tree:
                    ┌───────────────────┐
                    │  Query Performance│
                    │     Issues?       │
                    └─────────┬─────────┘

                    ┌─────────▼─────────┐
                    │  JOINs causing    │
                    │    slowdowns?     │
              No    └─────────┬─────────┘    Yes
              │               │               │
              ▼               │               ▼
        Keep Normal           │      ┌─────────────────┐
                              │      │ Read-heavy or   │
                              │      │ Write-heavy?    │
                              │      └────────┬────────┘
                              │               │
                              │    ┌──────────┴──────────┐
                              │    │                     │
                              │ Read-heavy          Write-heavy
                              │    │                     │
                              │    ▼                     ▼
                              │ Consider           Keep Normal
                              │ Denormalize        Use Caching
                              │    │
                              │    ▼
                              └──► Add redundant columns
                                   or summary tables

Safe Denormalization Patterns:

  1. Cached Aggregates: Store calculated totals that are expensive to compute
  2. Materialized Views: Pre-computed query results
  3. Read Replicas: Denormalized copies for reporting

On this page