Data modeling is an essential aspect of any data transformation process. Its primary purpose is to ensure that data is in a usable format and can be easily accessed and analyzed. The core of many data models often involves fact and dimension tables, common structures used in data warehousing and business intelligence applications.
DBT (Data Build Tool) is a powerful framework that allows data teams to transform raw data in their warehouses into trustworthy, understandable models. As your data model grows in complexity, the architecture of your DBT project becomes increasingly critical. Here, we explore strategies to design and structure your DBT models with complex fact and dimension tables, aiming for maintainability and adaptability as requirements change over time.
Understand Your Data Model
Before structuring your DBT project, it’s crucial to understand your data model. For complex data models, it typically involves numerous fact tables (which hold quantitative information) and dimension tables (which contain descriptive attributes). Knowing how these tables interrelate is the key to designing efficient and effective transformations.
Base Models
Start your DBT project by creating “base models” for each of your raw data tables. These base models act as the foundation for your project, performing essential cleaning operations like data type casting, null handling, and removing unwanted data.
-- base_orders.sql
{{
config(
materialized='view'
)
}}
SELECT
id,
user_id,
product_id,
amount,
order_date,
status
FROM {{ source('raw', 'orders') }}
Staging Models
Once you’ve created your base models, the next step is to construct “staging models.” Staging models take your base models and turn them into a cleaned, transformed version that matches your desired fact or dimension table’s structure.
For dimension tables, these models might involve creating derived fields, implementing business logic, or denormalizing data.
For fact tables, staging models will mostly involve renaming, reordering, and excluding columns, or adding derived columns.
-- stg_orders.sql
{{
config(
materialized='table'
)
}}
SELECT
id AS order_id,
user_id AS order_user_id,
product_id AS order_product_id,
amount AS order_amount,
order_date AS order_date,
CASE
WHEN status = 1 THEN 'pending'
WHEN status = 2 THEN 'shipped'
WHEN status = 3 THEN 'delivered'
ELSE 'unknown'
END AS order_status
FROM {{ ref('base_orders') }}
Fact and Dimension Models
After staging models, you create your final fact and dimension models. These models utilize the cleaned, prepared data from the staging models and transform them into the final structure.
In dimension models, you may perform additional transformations like scd (slowly changing dimensions) type 2 operations or more complex business logic.
Fact models often involve joining multiple staging models, typically fact tables with dimension tables, to provide context to your measures.
-- dim_users.sql
{{
config(
materialized='table'
)
}}
SELECT
id AS user_id,
first_name,
last_name,
email,
country
FROM {{ ref('stg_users') }}
-- fct_orders.sql
{{
config(
materialized='table'
)
}}
SELECT
o.order_id,
o.order_amount,
o.order_date,
o.order_status,
u.user_id,
u.first_name,
u.last_name,
u.email,
u.country
FROM {{ ref('stg_orders') }} o
JOIN {{ ref('dim_users') }} u ON o.order_user_id = u.user_id
models:
my_project:
base:
constraints:
not_null:
- id
unique:
- id
stg_orders:
constraints:
not_null:
- order_id
dim_users:
constraints:
not_null:
- user_id
unique:
- user_id
Get more useful articles on dbt