One of the great features of DBT is its testing framework. DBT allows us to validate the correctness and reliability of the transformations, reducing the risk of data issues going unnoticed.
DBT Test Command
DBT includes out-of-the-box support for several types of data tests, such as unique, not_null, accepted_values, relationships, etc. However, it also allows you to define your own custom data tests.
You run your tests in DBT using the dbt test command.
dbt test
This will execute all defined tests in the DBT project.
Capturing Test Results
DBT does not have a built-in feature to capture test results in a database or output file. However, DBT does return JSON output of the test results which you can capture and store.
You can use the –output option with dbt test to output the test results to a JSON file:
dbt test --output results.json
Then, you can parse this JSON file and store it in your database or use it however you need.
Example
Let’s assume we have a DBT project with a model called orders in our models directory. We want to ensure that the order_id column is unique and not null.
In our tests directory, we create a schema.yml file with the following content:
version: 2
models:
- name: orders
columns:
- name: order_id
tests:
- unique
- not_null
Now, we can run our tests:
dbt test
DBT will provide output on the console indicating which tests passed and which failed. If we want to capture these results, we can output it to a JSON file:
dbt test --output test_results.json
The test_results.json file will contain detailed information about each test, such as the test name, model, status (pass/fail), failure count, total rows, and more.
Sample JSON result:
{
"metadata": {...},
"results": [
{
"status": "pass",
"message": null,
"execution_time": 0.06402707099914551,
"fail": null,
"warn": null,
"error": null,
"skip": false,
"test_metadata": {
"name": "unique_order_id",
"kwargs": {...},
"total_rows": 1000,
"passed_rows": 1000
}
},
{
"status": "pass",
"message": null,
"execution_time": 0.06440496444702148,
"fail": null,
"warn": null,
"error": null,
"skip": false,
"test_metadata": {
"name": "not_null_order_id",
"kwargs": {...},
"total_rows": 1000,
"passed_rows": 1000
}
},
]
}
By capturing test results this way, we have the flexibility to parse and store them in a database, send them to a monitoring system, or do anything else we need. This is a great way to maintain a record of data quality over time and enhance the reliability of your data pipeline.
Get more useful articles on dbt