DBT : Managing dependencies between DBT models. Explain the use of the ref() function?

getDbt

Managing dependencies between DBT models is essential for ensuring the correct order of execution and maintaining data consistency. The ref() function in DBT is used to manage these dependencies by creating references to other models within a project. When you use ref(), DBT automatically builds a Directed Acyclic Graph (DAG) to determine the correct execution order based on the dependencies between models.

The ref() function takes a single argument, the name of the model you want to reference. When DBT compiles your project, it replaces the ref() function with the appropriate table or view name in the destination schema, ensuring that your models are built on top of each other in the correct order.

Detailed Example:

Let’s consider an example of an e-commerce platform where you have two raw tables with the following data:

  1. freshers_in_orders: Contains order data such as order_id, user_id, product_id, order_date, and revenue.
  2. freshers_in_users: Contains user data such as user_id, name, email, signup_date, and country.

You want to create two intermediate models that transform the raw data:

  1. freshers_in_orders_daily: Aggregates the total revenue per day from the freshers_in_orders table.
  2. freshers_in_users_country: Groups users by country from the freshers_in_users table.

Finally, you want to create a final model freshers_in_revenue_by_country_daily that joins the two intermediate models to calculate the daily revenue per country.

Step 1: Create the freshers_in_orders_daily model:

-- freshers_in_orders_daily.sql
{{
    config(
        materialized='table'
    )
}}

SELECT
    order_date,
    SUM(revenue) as daily_revenue
FROM {{ ref('freshers_in_orders') }}
GROUP BY order_date

Step 2: Create the freshers_in_users_country model:

-- freshers_in_users_country.sql
{{
    config(
        materialized='table'
    )
}}

SELECT
    user_id,
    country
FROM {{ ref('freshers_in_users') }}

Step 3: Create the freshers_in_revenue_by_country_daily model:

-- freshers_in_revenue_by_country_daily.sql
{{
    config(
        materialized='table'
    )
}}

SELECT
    o.order_date,
    u.country,
    SUM(o.daily_revenue) as revenue
FROM {{ ref('freshers_in_orders_daily') }} o
JOIN {{ ref('freshers_in_users_country') }} u ON o.user_id = u.user_id
GROUP BY o.order_date, u.country

In this example, the ref() function is used to reference other models in the project. When DBT runs, it will identify the dependencies between models using the ref() function and execute them in the correct order:

  1. freshers_in_orders_daily depends on freshers_in_orders.
  2. freshers_in_users_country depends on freshers_in_users.
  3. freshers_in_revenue_by_country_daily depends on freshers_in_orders_daily and freshers_in_users_country.

DBT will ensure that freshers_in_orders_daily and freshers_in_users_country are executed before freshers_in_revenue_by_country_daily. This way, dependencies between models are managed, and data consistency is maintained throughout the transformation pipeline.

Get more useful articles on dbt

  1. ,
Author: user

Leave a Reply