What is Seeds in dbt? How to load CSV files into your data warehouse using the dbt ?

Seeds are the CSV files in your dbt project. There will be a seed director , that dbt can load into your data warehouse using the dbt seed command. Seeds can be referenced in the downstream models the same way as referencing models, using the ref function. Seeds are best suited to static data which changes infrequently as they are version controlled and code reviewable


Add the file to your seeds directory, with a .csv file extension, e.g. seeds/city_zip_code.csv
Ellicott City,21043
Upper Marlboro,20774

Run the dbt seed command command — a new table will be created in your warehouse in your target schema, named city_zip_codes

$ dbt seed
Running with dbt=1.00.1

Refer to seeds in downstream models using the ref function.

— This refers to the table created from seeds/city_zip_codes.csv

select * from {{ ref('city_zip_codes') }}

Seeds are configured in your dbt_project.yml

Points to note

Seeds should not be used to load raw data. As seeds are version controlled, they are best suited to files that contain business-specific logic, for example a list of country codes or user IDs of employees.

By default, dbt expects your seed files to be located in the seeds subdirectory of your project. To change this, update the seed-paths configuration in your dbt_project.yml

seed-paths: ["custom_seeds"]

If you want to rerun the command with a –full-refresh . The below command dbt truncates the existing table and reinserts the data

dbt seed --full-refresh

dbt will infer the datatype for each column based on the data in your CSV. If you need to explicitly set a datatype, then that can be achieved by using the column_types configuration in dbt_project.yml

  freshers_in: # you must include the project name
        article_title: varchar(500)

To build one seed at a time you can use 

$ dbt seed --select country_codes

If you want to exclude , you can use –exclude function as well 

Author: user

Leave a Reply