DBT : Using DBT (Data Build Tool) for Testing and Capturing Test Results

user July 22, 2023 Leave a Comment

One of the great features of DBT is its testing framework. DBT allows us to validate the correctness and reliability of the transformations, reducing the risk of data issues going unnoticed.

DBT Test Command

DBT includes out-of-the-box support for several types of data tests, such as unique, not_null, accepted_values, relationships, etc. However, it also allows you to define your own custom data tests.

You run your tests in DBT using the dbt test command.

dbt test

This will execute all defined tests in the DBT project.

Capturing Test Results

DBT does not have a built-in feature to capture test results in a database or output file. However, DBT does return JSON output of the test results which you can capture and store.

You can use the –output option with dbt test to output the test results to a JSON file:

dbt test --output results.json

Then, you can parse this JSON file and store it in your database or use it however you need.

Example

Let’s assume we have a DBT project with a model called orders in our models directory. We want to ensure that the order_id column is unique and not null.

In our tests directory, we create a schema.yml file with the following content:

version: 2

models:
  - name: orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null

Now, we can run our tests:

dbt test

DBT will provide output on the console indicating which tests passed and which failed. If we want to capture these results, we can output it to a JSON file:

dbt test --output test_results.json

The test_results.json file will contain detailed information about each test, such as the test name, model, status (pass/fail), failure count, total rows, and more.

Sample JSON result:

{
    "metadata": {...},
    "results": [
        {
            "status": "pass",
            "message": null,
            "execution_time": 0.06402707099914551,
            "fail": null,
            "warn": null,
            "error": null,
            "skip": false,
            "test_metadata": {
                "name": "unique_order_id",
                "kwargs": {...},
                "total_rows": 1000,
                "passed_rows": 1000
            }
        },
        {
            "status": "pass",
            "message": null,
            "execution_time": 0.06440496444702148,
            "fail": null,
            "warn": null,
            "error": null,
            "skip": false,
            "test_metadata": {
                "name": "not_null_order_id",
                "kwargs": {...},
                "total_rows": 1000,
                "passed_rows": 1000
            }
        },
    ]
}

By capturing test results this way, we have the flexibility to parse and store them in a database, send them to a monitoring system, or do anything else we need. This is a great way to maintain a record of data quality over time and enhance the reliability of your data pipeline.

Get more useful articles on dbt

Post Views: 47

Author: user

Leave a Reply Cancel reply

Trending

Recent Posts

Featured Posts – Slider Widget

How PARTITION BY Works in Snowflake, and SQL in general

Stash a specific file using Git

Prevent your computer from locking : Python to simulate mouse movements

AWS EC2 vs Azure Virtual Machines

Production and Industrial Engineering

Engineering Technical campus placement question and answers

JavaScript’s reduceRight() method to iterate over an array from right to left

Merging Multiple Images into a Single PDF File Using Python

Nanotechnology

Electronics and Instrumentation

Most Viewed Posts