Normalization Assessment
Methods to evaluate database normalization level and identify data anomalies
Normalization Assessment
Normalization is a systematic approach to organizing data to reduce redundancy and improve data integrity. This guide provides methods to assess your database's normalization level.
Normalization Levels Overview
Normalization Progression:
┌─────────────────────────────────────────────────────────────┐
│ │
│ Unnormalized ──► 1NF ──► 2NF ──► 3NF ──► BCNF ──► 4NF │
│ │ │ │ │ │ │ │
│ │ │ │ │ │ │ │
│ Raw Data Atomic No No No No │
│ Values Partial Trans. Determ. Multi- │
│ Deps Deps Anom. valued │
│ Deps │
└─────────────────────────────────────────────────────────────┘Assessment Checklist
First Normal Form (1NF) Check
A table is in 1NF if:
- Each column contains only atomic (indivisible) values
- Each column contains values of a single type
- Each row is unique (has a primary key)
- No repeating groups
Detection Query - Find Non-Atomic Values:
-- Check for potential array/list values stored as strings
-- Look for comma-separated values in text columns
SELECT
table_name,
column_name,
data_type
FROM information_schema.columns
WHERE table_schema = 'public'
AND data_type IN ('text', 'varchar', 'character varying')
ORDER BY table_name, column_name;
-- Sample check for comma-separated values
SELECT column_name, COUNT(*) as suspicious_rows
FROM your_table
WHERE column_name LIKE '%,%'
GROUP BY column_name
HAVING COUNT(*) > 0;1NF Violation Examples:
-- Bad: Violates 1NF (multiple values in one column)
CREATE TABLE orders_bad (
order_id INT PRIMARY KEY,
customer_name VARCHAR(100),
products VARCHAR(500) -- "iPhone, MacBook, AirPods"
);
-- Good: Complies with 1NF
CREATE TABLE orders_good (
order_id INT,
customer_name VARCHAR(100),
product_name VARCHAR(100),
PRIMARY KEY (order_id, product_name)
);Second Normal Form (2NF) Check
A table is in 2NF if:
- It is in 1NF
- All non-key attributes are fully dependent on the entire primary key
- No partial dependencies on composite keys
Detection Method - Identify Partial Dependencies:
-- List tables with composite primary keys (potential 2NF issues)
SELECT
tc.table_name,
STRING_AGG(kcu.column_name, ', ') as pk_columns,
COUNT(*) as pk_column_count
FROM information_schema.table_constraints tc
JOIN information_schema.key_column_usage kcu
ON tc.constraint_name = kcu.constraint_name
WHERE tc.constraint_type = 'PRIMARY KEY'
AND tc.table_schema = 'public'
GROUP BY tc.table_name
HAVING COUNT(*) > 1
ORDER BY tc.table_name;2NF Violation Example:
-- Bad: Violates 2NF (student_name depends only on student_id)
CREATE TABLE enrollments_bad (
student_id INT,
course_id INT,
student_name VARCHAR(100), -- Partial dependency
student_email VARCHAR(100), -- Partial dependency
enrollment_date DATE,
grade CHAR(2),
PRIMARY KEY (student_id, course_id)
);
-- Good: Complies with 2NF
CREATE TABLE students (
student_id INT PRIMARY KEY,
student_name VARCHAR(100),
student_email VARCHAR(100)
);
CREATE TABLE enrollments_good (
student_id INT REFERENCES students(student_id),
course_id INT REFERENCES courses(course_id),
enrollment_date DATE,
grade CHAR(2),
PRIMARY KEY (student_id, course_id)
);Third Normal Form (3NF) Check
A table is in 3NF if:
- It is in 2NF
- No transitive dependencies (non-key attribute depending on another non-key attribute)
Detection Method - Find Transitive Dependencies:
Look for patterns where:
- Column A depends on the primary key
- Column B depends on Column A (not directly on PK)
-- Manual analysis: Look for potential transitive dependencies
-- Check for columns that might depend on other non-key columns
-- Example: If you have city and country in same table,
-- country likely depends on city (transitive dependency)
SELECT
table_name,
column_name
FROM information_schema.columns
WHERE table_schema = 'public'
AND column_name IN ('city', 'country', 'state', 'zip_code',
'department_name', 'department_head',
'category_name', 'category_description')
ORDER BY table_name;3NF Violation Example:
-- Bad: Violates 3NF (department_name depends on department_id)
CREATE TABLE employees_bad (
employee_id INT PRIMARY KEY,
employee_name VARCHAR(100),
department_id INT,
department_name VARCHAR(100), -- Transitive dependency
department_location VARCHAR(100) -- Transitive dependency
);
-- Good: Complies with 3NF
CREATE TABLE departments (
department_id INT PRIMARY KEY,
department_name VARCHAR(100),
department_location VARCHAR(100)
);
CREATE TABLE employees_good (
employee_id INT PRIMARY KEY,
employee_name VARCHAR(100),
department_id INT REFERENCES departments(department_id)
);Anomaly Detection
Update Anomaly Detection
Update anomalies occur when changing data requires multiple row updates.
-- Find potential update anomalies: columns with repeated values
-- that should probably be in a separate table
WITH column_stats AS (
SELECT
'your_table' as table_name,
'column_name' as column_name,
COUNT(*) as total_rows,
COUNT(DISTINCT column_name) as distinct_values
FROM your_table
)
SELECT
table_name,
column_name,
total_rows,
distinct_values,
ROUND(100.0 * distinct_values / total_rows, 2) as uniqueness_ratio
FROM column_stats
WHERE distinct_values < total_rows * 0.1 -- Less than 10% unique
AND total_rows > 100; -- Meaningful sample sizeExample of Update Anomaly:
-- Bad: Updating supplier address requires multiple updates
UPDATE products
SET supplier_address = 'New Address'
WHERE supplier_id = 123;
-- Must update ALL rows with this supplier
-- Good: Single update in normalized design
UPDATE suppliers
SET address = 'New Address'
WHERE supplier_id = 123;Insertion Anomaly Detection
Insertion anomalies occur when you cannot insert data without additional unrelated data.
-- Check for tables where NOT NULL constraints might cause insertion issues
SELECT
t.table_name,
c.column_name,
c.is_nullable,
c.column_default
FROM information_schema.tables t
JOIN information_schema.columns c ON t.table_name = c.table_name
WHERE t.table_schema = 'public'
AND c.is_nullable = 'NO'
AND c.column_default IS NULL
ORDER BY t.table_name, c.ordinal_position;Deletion Anomaly Detection
Deletion anomalies occur when deleting data unintentionally removes other data.
-- Find tables without foreign key relationships (potential deletion issues)
SELECT t.table_name
FROM information_schema.tables t
WHERE t.table_schema = 'public'
AND t.table_type = 'BASE TABLE'
AND NOT EXISTS (
SELECT 1 FROM information_schema.table_constraints tc
WHERE tc.table_name = t.table_name
AND tc.constraint_type = 'FOREIGN KEY'
)
AND NOT EXISTS (
SELECT 1 FROM information_schema.table_constraints tc
JOIN information_schema.referential_constraints rc
ON tc.constraint_name = rc.unique_constraint_name
WHERE rc.unique_constraint_name IN (
SELECT constraint_name
FROM information_schema.table_constraints
WHERE table_name = t.table_name
)
);Normalization Assessment Script
A comprehensive PostgreSQL script to assess normalization:
-- Comprehensive Normalization Assessment Report
DO $$
DECLARE
v_table RECORD;
v_result TEXT := '';
BEGIN
RAISE NOTICE '=== NORMALIZATION ASSESSMENT REPORT ===';
RAISE NOTICE '';
-- 1. Check for tables without primary keys (1NF issue)
RAISE NOTICE '1. Tables Without Primary Keys:';
FOR v_table IN
SELECT t.table_name
FROM information_schema.tables t
WHERE t.table_schema = 'public'
AND t.table_type = 'BASE TABLE'
AND NOT EXISTS (
SELECT 1 FROM information_schema.table_constraints tc
WHERE tc.table_name = t.table_name
AND tc.constraint_type = 'PRIMARY KEY'
)
LOOP
RAISE NOTICE ' - %', v_table.table_name;
END LOOP;
-- 2. Tables with composite PKs (potential 2NF issues)
RAISE NOTICE '';
RAISE NOTICE '2. Tables with Composite PKs (Review for 2NF):';
FOR v_table IN
SELECT tc.table_name, COUNT(*) as pk_cols
FROM information_schema.table_constraints tc
JOIN information_schema.key_column_usage kcu
ON tc.constraint_name = kcu.constraint_name
WHERE tc.constraint_type = 'PRIMARY KEY'
AND tc.table_schema = 'public'
GROUP BY tc.table_name
HAVING COUNT(*) > 1
LOOP
RAISE NOTICE ' - % (% columns)', v_table.table_name, v_table.pk_cols;
END LOOP;
-- 3. Tables without foreign keys (potential isolation)
RAISE NOTICE '';
RAISE NOTICE '3. Isolated Tables (No FK Relationships):';
FOR v_table IN
SELECT t.table_name
FROM information_schema.tables t
WHERE t.table_schema = 'public'
AND t.table_type = 'BASE TABLE'
AND NOT EXISTS (
SELECT 1 FROM information_schema.table_constraints tc
WHERE tc.table_name = t.table_name
AND tc.constraint_type = 'FOREIGN KEY'
)
LOOP
RAISE NOTICE ' - %', v_table.table_name;
END LOOP;
RAISE NOTICE '';
RAISE NOTICE '=== END OF REPORT ===';
END $$;Normalization Scoring Matrix
| Check | Points | Criteria |
|---|---|---|
| All tables have PK | 5 | Every table has a defined primary key |
| No multi-valued columns | 5 | No arrays/lists stored as delimited strings |
| No partial dependencies | 5 | Composite PK tables reviewed and clean |
| No transitive dependencies | 5 | Non-key columns don't depend on other non-key columns |
| No update anomalies | 5 | Data changes require single-row updates |
| Total | 25 |
When to Denormalize
Sometimes denormalization is appropriate:
Denormalization Decision Tree:
┌───────────────────┐
│ Query Performance│
│ Issues? │
└─────────┬─────────┘
│
┌─────────▼─────────┐
│ JOINs causing │
│ slowdowns? │
No └─────────┬─────────┘ Yes
│ │ │
▼ │ ▼
Keep Normal │ ┌─────────────────┐
│ │ Read-heavy or │
│ │ Write-heavy? │
│ └────────┬────────┘
│ │
│ ┌──────────┴──────────┐
│ │ │
│ Read-heavy Write-heavy
│ │ │
│ ▼ ▼
│ Consider Keep Normal
│ Denormalize Use Caching
│ │
│ ▼
└──► Add redundant columns
or summary tablesSafe Denormalization Patterns:
- Cached Aggregates: Store calculated totals that are expensive to compute
- Materialized Views: Pre-computed query results
- Read Replicas: Denormalized copies for reporting