Modern Data Warehouse Architecture: Star Schema vs Data Vault
Choosing the right data warehouse modeling approach can make or break your analytics platform. The three main contenders — Star Schema, Snowflake Schema, and Data Vault — each have distinct tradeoffs.
Star Schema: Simple and Fast
Star schema is the classic approach: one fact table surrounded by dimension tables. It is optimized for query performance and is easy for business analysts to understand.
-- Fact table
CREATE TABLE fact_sales (
sale_id BIGINT,
date_key INT REFERENCES dim_date(date_key),
product_key INT REFERENCES dim_product(product_key),
customer_key INT REFERENCES dim_customer(customer_key),
quantity INT,
revenue DECIMAL(12,2)
);
-- Dimension table
CREATE TABLE dim_product (
product_key INT PRIMARY KEY,
product_name VARCHAR(200),
category VARCHAR(100),
brand VARCHAR(100)
);
Data Vault: Auditable and Scalable
Data Vault separates business keys (Hubs), relationships (Links), and descriptive data (Satellites). It excels at auditability, historical tracking, and handling multiple source systems.
-- Hub: core business entity
CREATE TABLE hub_customer (
hub_customer_key BIGINT PRIMARY KEY,
customer_id VARCHAR(50) NOT NULL, -- business key
load_date TIMESTAMP,
record_source VARCHAR(100)
);
-- Satellite: historical attributes
CREATE TABLE sat_customer_details (
hub_customer_key BIGINT,
load_date TIMESTAMP,
name VARCHAR(200),
email VARCHAR(200),
PRIMARY KEY (hub_customer_key, load_date)
);
When to Use Each
- Star Schema: Single source of truth, stable business rules, self-service BI priority
- Snowflake Schema: Large dimensions with many attributes, storage is a concern
- Data Vault: Multiple source systems, strict audit requirements, agile development
💡 Modern approach: Many teams use Data Vault for the raw/staging layer and star schema for the presentation layer (business mart). Best of both worlds.