dbt: How dbt handles testing and validation of data models


dbt ships with a testing framework for data models, called data tests in recent versions. The most common kind is the generic test: a reusable, parameterized check that you attach to a model, or to a specific column of a model, in a YAML properties file. These YAML files live alongside the models they describe, typically under the models directory of the dbt project. dbt includes four generic tests out of the box: unique, not_null, accepted_values, and relationships. Note that dbt snapshots are a separate feature: they record how the rows of a mutable table change over time and are not a testing mechanism.

For example, you can attach tests to the columns of a specific model like this:

  models:
    - name: freshers_daily_view_model
      columns:
        - name: id
          tests:
            - unique
            - not_null
        - name: name
          tests:
            - not_null

These tests check that every id in the freshers_daily_view_model table is unique and non-null, and that name is never null. Columns with no tests listed, such as description and updated_at, are not checked. From dbt 1.8 onward the tests key can also be written as data_tests.

When you run the dbt test command, dbt compiles each test into a SQL query that selects the rows violating the rule, executes it, and prints the results in the terminal. A test passes when its query returns no rows; if it fails, dbt reports how many rows failed.
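dbt's built-in unique and not_null checks each reduce to a query that selects the offending rows, and the check passes when that query returns nothing. The following is a minimal Python sketch of that idea, using the standard-library sqlite3 module rather than dbt itself. The table contents are invented for illustration, and the SQL only approximates what dbt generates.

```python
import sqlite3

# Hypothetical stand-in for the freshers_daily_view_model table; rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE freshers_daily_view_model (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO freshers_daily_view_model VALUES (?, ?)",
    [(1, "Asha"), (2, "Ben"), (2, "Chen"), (3, None)],
)

# Approximations of the checks, not dbt's exact compiled SQL.
# Each query selects the rows that violate the rule.
CHECKS = {
    "unique_id": """
        SELECT id FROM freshers_daily_view_model
        WHERE id IS NOT NULL
        GROUP BY id HAVING COUNT(*) > 1
    """,
    "not_null_id": "SELECT id FROM freshers_daily_view_model WHERE id IS NULL",
    "not_null_name": "SELECT id FROM freshers_daily_view_model WHERE name IS NULL",
}


def run_checks(connection):
    """Return {check_name: number_of_failing_rows}; zero means the check passes."""
    return {
        name: len(connection.execute(sql).fetchall())
        for name, sql in CHECKS.items()
    }


results = run_checks(conn)
for name, failures in results.items():
    print(name, "PASS" if failures == 0 else f"FAIL ({failures} rows)")
```

With the invented rows above, the duplicated id and the missing name each produce one failing row, while the not-null check on id passes.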

dbt also lets you validate data models with custom SQL queries, called singular tests. A singular test is a SQL file containing a single select statement, saved in the tests directory of the dbt project. Singular tests run together with all other tests when you run the dbt test command. The query should return the rows that violate your expectation, so the test passes when it returns no rows.

For example, you can add a singular test that checks that a specific table contains at least one record:

  -- tests/freshers_in_daily_view_table_has_data.sql
  select row_count
  from (
      select count(*) as row_count
      from {{ ref('freshers_daily_view_model') }}
  ) as counts
  where row_count = 0

This query counts the rows in the model and returns a result only when the count is zero. If the table has data, the query returns no rows and the test passes; if the table is empty, the query returns one row and the test fails. The example assumes the table is built by the freshers_daily_view_model model; substitute your own ref or source.
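dbt treats a custom SQL test as passing when its query returns no rows, so a "table has data" check is written as a query that returns a row only when the table is empty. The sketch below demonstrates that rule with Python's built-in sqlite3 module. It is an illustration under assumptions, not dbt's implementation: the table name is hard-coded here, where dbt would resolve it through ref().

```python
import sqlite3

# Returns one row only when the table is empty, so "no rows" means the check passes.
HAS_DATA_CHECK = """
    SELECT row_count
    FROM (SELECT COUNT(*) AS row_count FROM freshers_daily_view_model) AS counts
    WHERE row_count = 0
"""


def table_has_data(connection):
    """True when the check query returns no failing rows."""
    return connection.execute(HAS_DATA_CHECK).fetchall() == []


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE freshers_daily_view_model (id INTEGER)")
empty_result = table_has_data(conn)   # evaluated while the table is still empty

conn.execute("INSERT INTO freshers_daily_view_model VALUES (1)")
filled_result = table_has_data(conn)  # evaluated after one row is inserted

print("empty table passes:", empty_result)
print("filled table passes:", filled_result)
```

Writing the check as "select the failures" keeps every test uniform: the runner only needs to count returned rows, whatever the rule is.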

In summary, dbt has built-in support for testing and validating data models in two forms: tests declared in YAML against models and their columns, and custom SQL queries that check the data directly. In both cases the check is expressed as data that should or should not exist, and a failure points you to the rows that break the expectation.

Author: user
