dbt (data build tool) writes a set of JSON artifacts when it runs: the manifest, the catalog, the run results, and the sources file. dbt generates these files from your project and from the metadata it reads from your database, and it saves them in the project's target directory (target/ by default). The artifacts are useful for auditing and debugging, and keeping them from successive runs lets you track how your data model and pipeline change over time. This article covers how to generate each artifact and what it contains.
Step 1: Export the manifest
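The manifest file describes the resources in your dbt project, such as models, tests, seeds, sources, and macros, along with the dependencies between them and the version of dbt that produced the file. It does not store database credentials; only the adapter type is recorded in its metadata. dbt writes the manifest whenever a command parses your project. For example, you can generate it by running the following command in your terminal:
dbt compile --target-path <path/to/output/folder>
This writes a manifest.json file to the specified folder. The --target-path flag is available in recent dbt versions; without it, the file is written to the project's target/ directory.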
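Once a manifest.json exists, the dependency information it holds can be read with a few lines of Python. The sketch below is a minimal illustration, not part of dbt itself: the sample manifest is hypothetical and heavily trimmed, the project name "shop" is invented, and the helper name model_dependencies is an assumption made for this example. It relies only on the "nodes", "resource_type", and "depends_on" keys of the manifest.

```python
def model_dependencies(manifest):
    """Map each model's unique_id to the unique_ids it depends on."""
    deps = {}
    for unique_id, node in manifest.get("nodes", {}).items():
        if node.get("resource_type") == "model":
            deps[unique_id] = node.get("depends_on", {}).get("nodes", [])
    return deps


# Hypothetical, trimmed manifest used only for illustration.
# For a real project, load the file instead, e.g.:
#   import json
#   manifest = json.load(open("target/manifest.json"))
sample_manifest = {
    "metadata": {"dbt_version": "1.7.0", "adapter_type": "postgres"},
    "nodes": {
        "model.shop.stg_orders": {
            "resource_type": "model",
            "depends_on": {"nodes": ["source.shop.raw.orders"]},
        },
        "model.shop.orders": {
            "resource_type": "model",
            "depends_on": {"nodes": ["model.shop.stg_orders"]},
        },
        "seed.shop.countries": {
            "resource_type": "seed",
            "depends_on": {"nodes": []},
        },
    },
}

deps = model_dependencies(sample_manifest)
for model, parents in sorted(deps.items()):
    print(model, "<-", ", ".join(parents))
```

Seeds and other non-model nodes are skipped here on purpose; dropping the resource_type check would list dependencies for every node in the manifest.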
Step 2: Export the catalog
The catalog file contains information about the tables and views in your database, including the columns, data types, and other metadata. You can export the catalog by running the following command in your terminal:
dbt docs generate --target-path <path/to/output/folder>
This writes a catalog.json file, along with a refreshed manifest.json, to the specified folder.
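As a rough illustration of what the catalog holds, the following sketch pulls column names and data types out of a catalog dictionary. The sample data is hypothetical and trimmed, and column_types is a helper name chosen for this example; it assumes the catalog groups relations under "nodes" and "sources", each with a "columns" mapping.

```python
def column_types(catalog):
    """Return {relation unique_id: {column name: data type}}."""
    tables = {}
    for section in ("nodes", "sources"):
        for unique_id, entry in catalog.get(section, {}).items():
            columns = entry.get("columns", {})
            tables[unique_id] = {
                name: column.get("type") for name, column in columns.items()
            }
    return tables


# Hypothetical, trimmed catalog used only for illustration.
sample_catalog = {
    "metadata": {"dbt_version": "1.7.0"},
    "nodes": {
        "model.shop.orders": {
            "metadata": {"type": "BASE TABLE", "schema": "analytics", "name": "orders"},
            "columns": {
                "order_id": {"type": "integer", "index": 1, "name": "order_id"},
                "ordered_at": {"type": "timestamp", "index": 2, "name": "ordered_at"},
            },
        }
    },
    "sources": {},
}

types_by_table = column_types(sample_catalog)
for table, columns in types_by_table.items():
    for name, data_type in columns.items():
        print(table, name, data_type)
```

Comparing this output between two catalog files is one simple way to spot a column that was added, dropped, or changed type.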
Step 3: Export the run-results
The run results file contains information about a single dbt invocation, including the total elapsed time, the nodes that were executed, and the status and execution time of each node. dbt writes it after commands that execute nodes in your project, such as dbt run, dbt test, or dbt build. For example:
dbt run --target-path <path/to/output/folder>
This writes a run_results.json file to the specified folder. Each invocation overwrites the previous file, so copy it elsewhere if you want to keep a history.
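The run results artifact (written as run_results.json in dbt's target directory) lends itself to quick summaries. The sketch below counts nodes per status and finds the slowest node. The sample data is hypothetical and trimmed, and summarize_results is a name invented for this example; it assumes each entry in "results" carries "unique_id", "status", and "execution_time".

```python
from collections import Counter


def summarize_results(run_results):
    """Return (status counts, slowest result or None) for one invocation."""
    results = run_results.get("results", [])
    statuses = Counter(result["status"] for result in results)
    slowest = max(
        results,
        key=lambda result: result.get("execution_time", 0.0),
        default=None,
    )
    return statuses, slowest


# Hypothetical, trimmed run results used only for illustration.
sample_run_results = {
    "elapsed_time": 7.4,
    "results": [
        {"unique_id": "model.shop.stg_orders", "status": "success", "execution_time": 1.2},
        {"unique_id": "model.shop.orders", "status": "success", "execution_time": 4.9},
        {"unique_id": "model.shop.customers", "status": "error", "execution_time": 0.3},
    ],
}

statuses, slowest = summarize_results(sample_run_results)
print(dict(statuses))
print("slowest:", slowest["unique_id"])
```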
Step 4: Export the sources
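The sources file contains the results of freshness checks on the sources declared in your project, including when each source was last loaded and whether it met its freshness criteria. It does not contain the SQL or YAML source code of your project; that code lives in your project directory and belongs in version control. You can generate the sources file by running the following command in your terminal:
dbt source freshness --output <path/to/output/folder>/sources.json
This writes a sources.json file to the specified path.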
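The sources artifact that dbt writes after a source freshness check (sources.json) can be scanned for sources that did not pass. The following is a minimal sketch under stated assumptions: the sample data is hypothetical and trimmed, stale_sources is a helper name invented for this example, and each entry in "results" is assumed to carry a "unique_id" and a "status" whose passing value is "pass".

```python
def stale_sources(sources_artifact):
    """Return unique_ids of sources whose freshness status is not 'pass'."""
    return [
        result["unique_id"]
        for result in sources_artifact.get("results", [])
        if result.get("status") != "pass"
    ]


# Hypothetical, trimmed sources artifact used only for illustration.
sample_sources = {
    "metadata": {"dbt_version": "1.7.0"},
    "results": [
        {
            "unique_id": "source.shop.raw.orders",
            "status": "pass",
            "max_loaded_at_time_ago_in_s": 1800.0,
        },
        {
            "unique_id": "source.shop.raw.payments",
            "status": "warn",
            "max_loaded_at_time_ago_in_s": 90000.0,
        },
    ],
}

stale = stale_sources(sample_sources)
print(stale)
```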
In conclusion, dbt artifacts are a valuable resource for auditing and debugging. By generating and archiving them as described above, you keep a record of your data model and pipeline that helps you track changes over time and troubleshoot issues as they arise.
Most dbt commands (and corresponding RPC methods) produce artifacts:
manifest: produced by commands that read and understand your project
run results: produced by commands that run, compile, or catalog nodes in your DAG
catalog: produced by docs generate
sources: produced by source freshness
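Because different commands produce different artifacts, it can help to check which files a given invocation left behind. The sketch below looks for the four artifact files in a directory. The helper name present_artifacts is an assumption for this example, and the demonstration uses a throwaway temporary directory rather than a real dbt target directory.

```python
import tempfile
from pathlib import Path

# File names dbt uses for the four artifacts.
ARTIFACT_FILES = {
    "manifest": "manifest.json",
    "run results": "run_results.json",
    "catalog": "catalog.json",
    "sources": "sources.json",
}


def present_artifacts(target_dir):
    """Report which artifact files exist in the given directory."""
    target = Path(target_dir)
    return {
        name: (target / filename).is_file()
        for name, filename in ARTIFACT_FILES.items()
    }


# Demonstration against a temporary directory holding only a manifest.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "manifest.json").write_text("{}", encoding="utf-8")
    found = present_artifacts(tmp)

print(found)
```

In a real project you would pass the path of your target directory, for example present_artifacts("target").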
All artifacts produced by dbt include a metadata dictionary with these properties:
dbt_version: Version of dbt that produced this artifact.
dbt_schema_version: URL of this artifact’s schema. See notes below.
generated_at: Timestamp in UTC when this artifact was produced.
adapter_type: The adapter (database), e.g. postgres, spark, etc.
env: Any environment variables prefixed with DBT_ENV_CUSTOM_ENV_ will be included in a dictionary, with the prefix-stripped variable name as its key.
invocation_id: Unique identifier for this dbt invocation
In the manifest, the metadata may also include:
send_anonymous_usage_stats: Whether this invocation sent anonymous usage statistics while executing.
project_id: Project identifier, hashed from project_name, sent with anonymous usage stats if enabled.
user_id: User identifier, stored by default in ~/.dbt/.user.yml, sent with anonymous usage stats if enabled.
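The metadata properties listed above can be read the same way from any artifact. The sketch below prints a one-line description of an artifact and imitates the documented prefix stripping for the env dictionary. It illustrates the described behavior and is not dbt's own implementation; the sample values, the TEAM variable, and the helper names custom_env and describe are hypothetical.

```python
PREFIX = "DBT_ENV_CUSTOM_ENV_"


def custom_env(environ):
    """Keep only prefixed variables and strip the prefix from their names."""
    return {
        key[len(PREFIX):]: value
        for key, value in environ.items()
        if key.startswith(PREFIX)
    }


def describe(artifact):
    """Build a short description from an artifact's metadata dictionary."""
    meta = artifact.get("metadata", {})
    return "dbt {} on {} at {}".format(
        meta.get("dbt_version"),
        meta.get("adapter_type"),
        meta.get("generated_at"),
    )


# Hypothetical environment and artifact used only for illustration.
sample_environ = {"DBT_ENV_CUSTOM_ENV_TEAM": "analytics", "PATH": "/usr/bin"}
sample_artifact = {
    "metadata": {
        "dbt_version": "1.7.0",
        "generated_at": "2024-01-01T00:00:00Z",
        "adapter_type": "postgres",
        "env": custom_env(sample_environ),
    }
}

print(describe(sample_artifact))
print(sample_artifact["metadata"]["env"])
```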