Copying data from one database management system to another, in this case from Amazon Redshift to Snowflake, can be done in several ways. Here, we will focus on a method that uses Amazon Simple Storage Service (S3) as a staging area.
Prerequisites:
1. Access to AWS Account: You will need access to an Amazon Web Services account where Redshift and S3 services are running.
2. Access to Snowflake Account: You should have the required permissions on the Snowflake side as well to fetch the data from the S3 bucket and load it into Snowflake tables.
5. S3 Bucket: An S3 bucket and prefix (in this case, s3://freshers-in/data/dmp/temp/files/) where you can temporarily store the data while transferring it from Redshift to Snowflake.
4. Redshift Cluster: You should have access to the Redshift cluster where the source table resides, in this case, fr_view.freshers_rs_views.
5. Snowflake Data Warehouse: You should have access to a Snowflake database and warehouse that will hold the new table fr_view.freshers_rs_views.
6. Access to necessary tools: SQL Workbench or any other SQL client for Redshift, and Snowflake Web UI or SnowSQL CLI for Snowflake.
Access Requirements:
1. On Redshift: You need to have sufficient privileges to perform SELECT operations on the source table.
2. On S3: You need to have s3:PutObject, s3:GetObject, and s3:ListBucket permissions for the S3 bucket.
3. On Snowflake: You need to have the USAGE privilege on the database and the schema where the table resides. Also, you need to have CREATE TABLE, CREATE STAGE, and CREATE FILE FORMAT privileges.
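For reference, here is a minimal sketch of the Snowflake grants listed above, assuming a hypothetical database fr_db and role etl_role (substitute your own names):
-- Run as a role that can manage grants (e.g. SECURITYADMIN).
GRANT USAGE ON DATABASE fr_db TO ROLE etl_role;
GRANT USAGE ON SCHEMA fr_db.fr_view TO ROLE etl_role;
GRANT CREATE TABLE ON SCHEMA fr_db.fr_view TO ROLE etl_role;
GRANT CREATE STAGE ON SCHEMA fr_db.fr_view TO ROLE etl_role;
GRANT CREATE FILE FORMAT ON SCHEMA fr_db.fr_view TO ROLE etl_role;
-- Creating the storage integration later also requires the account-level
-- CREATE INTEGRATION privilege, which is often run as ACCOUNTADMIN.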
Step by Step Procedure:
Exporting data from Redshift to S3
1. Log in to SQL Workbench (or your preferred SQL client) and connect to your Amazon Redshift database.
2. Run the following command to export the data from the freshers_rs_views table to S3:
UNLOAD ('SELECT * FROM fr_view.freshers_rs_views')
TO 's3://freshers-in/data/dmp/temp/files/freshers_rs_views_'
IAM_ROLE 'arn:aws:iam::1800989766656:role/freshers_jobs_role'
FORMAT AS PARQUET;
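The UNLOAD writes one or more Parquet part files under the freshers_rs_views_ prefix. If you want to confirm what was produced before moving to Snowflake, one option (a sketch; the exact columns available may vary by Redshift version) is to query the STL_UNLOAD_LOG system view:
SELECT query, path, line_count, transfer_size
FROM stl_unload_log
ORDER BY start_time DESC
LIMIT 20;
-- Each row corresponds to a file written to the S3 prefix.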
Configuring Snowflake for S3 access
1. Log in to your Snowflake account.
2. Create a storage integration object that allows Snowflake to access the S3 bucket. Run the following command:
CREATE STORAGE INTEGRATION s3_int
TYPE = EXTERNAL_STAGE
STORAGE_PROVIDER = S3
ENABLED = TRUE
STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::1800989766656:role/freshers_jobs_role'
STORAGE_ALLOWED_LOCATIONS = ('s3://freshers-in/');
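The storage integration and the IAM role also need to trust each other. Snowflake generates an AWS IAM user ARN and an external ID for the integration, and these typically have to be added to the trust policy of freshers_jobs_role in the AWS console before the stage will work. You can view them with:
DESC INTEGRATION s3_int;
-- Note STORAGE_AWS_IAM_USER_ARN and STORAGE_AWS_EXTERNAL_ID in the output
-- and add them to the IAM role's trust relationship in AWS.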
After creating the integration, you can create a file format object (if not already present). If your files are in CSV format, you can create the file format as follows:
CREATE FILE FORMAT my_file_format
TYPE = 'CSV'
FIELD_DELIMITER = ','
SKIP_HEADER = 1;
If your files are in Parquet format (as produced by the UNLOAD above), the file format should be:
CREATE FILE FORMAT my_file_format
TYPE = 'PARQUET';
Create a stage that points to the S3 location. This will be used to reference the files in the S3 bucket.
CREATE STAGE my_stage
URL = 's3://freshers-in/data/dmp/temp/files/'
STORAGE_INTEGRATION = s3_int
FILE_FORMAT = my_file_format;
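Before loading, it is worth confirming that Snowflake can actually see the unloaded files through the stage; if the storage integration or the IAM trust policy is misconfigured, this is usually where it surfaces:
LIST @my_stage;
-- Expect to see the freshers_rs_views_* Parquet files unloaded from Redshift.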
Loading Data from S3 to Snowflake
Create an empty table in Snowflake that matches the structure of the Redshift table. Snowflake cannot query the Redshift table directly, so a CREATE TABLE ... AS SELECT against the source table will not work here; instead, write the CREATE TABLE statement with the column names and data types spelled out to match the Redshift table (or derive them from the staged files, as sketched below).
Make sure the data types in the Snowflake table align with the data you’re importing.
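If the unloaded files are Parquet, one way to avoid hand-writing the DDL is to let Snowflake infer the column definitions from the staged files. A sketch, assuming my_file_format was created with TYPE = 'PARQUET' and my_stage points at the unloaded files as above:
CREATE TABLE fr_view.freshers_rs_views
  USING TEMPLATE (
    SELECT ARRAY_AGG(OBJECT_CONSTRUCT(*))
    FROM TABLE(
      INFER_SCHEMA(
        LOCATION => '@my_stage',
        FILE_FORMAT => 'my_file_format'
      )
    )
  );
Because the column names come from the Parquet metadata written by Redshift, the COPY command below can map the file columns onto the table columns by name.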
Now you can copy the data from the S3 bucket into the Snowflake table. Because the unloaded files are Parquet, add MATCH_BY_COLUMN_NAME so Snowflake maps the Parquet columns onto the table columns by name:
COPY INTO fr_view.freshers_rs_views
FROM @my_stage/
MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;
You can also restrict the load to specific files. The FILES option takes a list of exact file names (no wildcards), so for a prefix match use PATTERN with a regular expression instead. For example:
COPY INTO fr_view.freshers_rs_views
FROM @my_stage/
MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
PATTERN = '.*freshers_rs_views_.*[.]parquet';
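As a quick sanity check once the COPY completes, compare row counts between the two systems (a minimal sketch; run the first query in Snowflake and the equivalent count in Redshift):
-- In Snowflake, after the load:
SELECT COUNT(*) FROM fr_view.freshers_rs_views;
-- Run SELECT COUNT(*) FROM fr_view.freshers_rs_views; in Redshift as well
-- and confirm both numbers match before cleaning up the staged files.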