dbt: How to clean up removed models from your production schema?

user February 6, 2023

dbt (data build tool) is an open-source tool for transforming data inside your warehouse. As your project evolves, you will remove models that are no longer in use. dbt does not drop the corresponding table or view when a model file is deleted, so the relation stays in your production schema until you remove it yourself. Cleaning up these leftovers keeps the schema organized and maintainable. This article covers the steps for cleaning up removed models from your production schema using dbt.

Step 1: Remove the model from your DBT project

The first step is to remove the model from your dbt project. Delete the model's .sql file and any macros that only that model used. Then remove every remaining reference to it: ref() calls in downstream models, its entries in schema .yml files, and any model-specific configuration under the models and snapshots blocks of your dbt_project.yml file.
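Leftover ref() calls are easy to miss in a large project. The sketch below shows one way to scan for them before deleting a model. The function name find_model_refs is hypothetical, and the pattern only covers the single-argument form ref('model_name'); it is an illustration, not part of dbt.

```python
import re
from pathlib import Path


def find_model_refs(project_dir, model_name):
    """Return (path, line_number, line) for each ref() to model_name in .sql files."""
    quote = "['\"]"  # accept single or double quotes around the model name
    pattern = re.compile(
        r"\bref\(\s*" + quote + re.escape(model_name) + quote + r"\s*\)"
    )
    hits = []
    # Walk every .sql file under the project directory in a stable order.
    for path in sorted(Path(project_dir).rglob("*.sql")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for number, line in enumerate(lines, start=1):
            if pattern.search(line):
                hits.append((str(path), number, line.strip()))
    return hits


if __name__ == "__main__":
    # "old_model" is a placeholder for the model you are about to delete.
    for path, number, line in find_model_refs(".", "old_model"):
        print(f"{path}:{number}: {line}")
```

An empty result suggests no other model selects from the removed one, although references inside .yml files still need a separate check.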

Step 2: Drop the model from your database

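Once the model is gone from your dbt project, drop its relation from the database. dbt has no built-in command for this, so run the DROP statement yourself with your database client, using DROP VIEW instead if the model was materialized as a view:

DROP TABLE IF EXISTS <schema>.<model_name>;

If you prefer to stay inside dbt, you can put the same statement in a custom macro and invoke it with dbt run-operation <macro_name>.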
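When several relations have to go, it can help to generate the statements rather than type them. The following is a minimal sketch, assuming a Postgres-style database where identifiers are quoted with double quotes; the helper names quote_identifier and drop_statement are hypothetical. It deliberately omits CASCADE so the database refuses the drop if something still depends on the relation.

```python
def quote_identifier(name):
    """Double-quote an identifier, escaping embedded double quotes (Postgres style)."""
    return '"' + name.replace('"', '""') + '"'


def drop_statement(schema, name, relation_type="table"):
    """Build a DROP statement for a table or view left behind by a removed model."""
    kind = relation_type.strip().lower()
    if kind not in ("table", "view"):
        raise ValueError(f"unsupported relation type: {relation_type!r}")
    target = f"{quote_identifier(schema)}.{quote_identifier(name)}"
    return f"DROP {kind.upper()} IF EXISTS {target};"


if __name__ == "__main__":
    # "analytics" and "old_model" are placeholder names.
    print(drop_statement("analytics", "old_model"))
    # prints: DROP TABLE IF EXISTS "analytics"."old_model";
```

Quoted identifiers are case-sensitive, so pass the names exactly as they appear in the database.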

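Step 3: Rebuild the project

After dropping the relation, run the project to confirm that the remaining models still compile and build without the removed model. Keep in mind that dbt run never drops relations for deleted models; it only creates or replaces relations for models that are still in the project. The --full-refresh flag rebuilds incremental models from scratch, which can be worthwhile if an incremental model depended on the removed one:

dbt run --full-refresh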

Step 4: Validate the changes

Finally, validate the changes by checking your database to confirm that the relation for the removed model is gone and that the remaining models built successfully. You can do this by querying the database's information schema or by using a database management tool to view the schema.
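The validation can also be scripted by comparing what the project expects against what the schema actually contains. The sketch below assumes the manifest.json that dbt writes to the target directory, with a "nodes" mapping whose entries carry resource_type, schema, name, alias, and config.materialized fields. The function names are hypothetical, and the list of actual relation names is assumed to come from your own query against the database.

```python
import json


def expected_relations(manifest, schema):
    """Names of relations the project would build in `schema`, per a parsed manifest."""
    names = set()
    for node in manifest.get("nodes", {}).values():
        # Only models, seeds, and snapshots create relations in the database.
        if node.get("resource_type") not in {"model", "seed", "snapshot"}:
            continue
        # Ephemeral models are inlined as CTEs and never materialized.
        if (node.get("config") or {}).get("materialized") == "ephemeral":
            continue
        if (node.get("schema") or "").lower() == schema.lower():
            names.add((node.get("alias") or node["name"]).lower())
    return names


def orphaned_relations(manifest, schema, actual_names):
    """Relations present in the database that the project no longer defines."""
    actual = {name.lower() for name in actual_names}
    return sorted(actual - expected_relations(manifest, schema))


if __name__ == "__main__":
    # Path and schema name are assumptions; adjust to your project.
    with open("target/manifest.json", encoding="utf-8") as handle:
        manifest = json.load(handle)
    print(orphaned_relations(manifest, "analytics", ["orders", "old_model"]))
```

An empty list means every relation in the schema still maps to a model, seed, or snapshot in the project.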

In conclusion, cleaning up removed models from your production schema is an important part of maintaining a well-organized data model. By following these steps, you keep unused relations from consuming storage and keep downstream users from querying stale data.

Other options

Periodically Drop and Rebuild the Entire Schema

This option is to periodically drop and recreate the entire schema so that any objects (tables, views, or functions) that are no longer produced by the project disappear. dbt is built on the assumption that every object it manages can be recreated from the project at any time, which makes this approach viable.

However, there are some important considerations:

Simplicity: This method is straightforward because it ensures that only necessary objects are present in the schema. It eliminates any unused or obsolete database components.
Rebuild Capability: dbt's design philosophy allows for easy reconstruction of objects, which aligns with this approach.

On the flip side:

Downtime: Dropping and rebuilding the schema can lead to downtime, during which the database may be inaccessible to users. This downtime can disrupt ongoing operations and impact your organization’s ability to access and use the data.
Risks: This approach carries risks if you are not confident that every necessary object can be recreated correctly. Anything in the schema that dbt cannot rebuild from source, such as accumulated snapshot history or manually created tables, is lost when the schema is dropped.

Query the Information Schema to Find Extra Objects in Prod

This option uses a query to find extra objects in the prod schema. The query can be kept in the analysis directory; when executed against the database, it identifies objects such as tables, views, and functions that exist in the prod schema but not in the corresponding dev schema. This approach assumes that the dev schema is routinely dropped and rebuilt, so it contains no extra objects of its own. This query has been tested on both Redshift and Postgres databases.
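As an illustration of the comparison (not the original query), here is a minimal sketch that takes rows from information_schema.tables and reports relations that exist in prod but not in dev. It covers tables and views only; functions would need information_schema.routines. The function name extra_in_prod is hypothetical, and the %(name)s placeholders assume a driver with pyformat parameters such as psycopg2.

```python
# Query text to run with your own database driver; placeholder style is an assumption.
INFORMATION_SCHEMA_QUERY = """
select table_schema, table_name, table_type
from information_schema.tables
where table_schema in (%(prod)s, %(dev)s)
"""


def extra_in_prod(rows, prod_schema, dev_schema):
    """Return (name, type) pairs for relations found in prod but absent from dev.

    `rows` is an iterable of (table_schema, table_name, table_type) tuples,
    as returned by INFORMATION_SCHEMA_QUERY.
    """
    rows = list(rows)
    dev_names = {name for schema, name, _ in rows if schema == dev_schema}
    return sorted(
        (name, kind)
        for schema, name, kind in rows
        if schema == prod_schema and name not in dev_names
    )


if __name__ == "__main__":
    # Hand-written sample rows standing in for a real query result.
    sample = [
        ("prod", "orders", "BASE TABLE"),
        ("prod", "old_model", "VIEW"),
        ("dev", "orders", "BASE TABLE"),
    ]
    print(extra_in_prod(sample, "prod", "dev"))
    # prints: [('old_model', 'VIEW')]
```

Each pair in the result is a candidate for dropping, after confirming that nothing outside dbt created it on purpose.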
