In this article, we will create a Python script that automates the process of moving S3 data with a current date suffix to a backup path. Often, data files are stored in an S3 bucket with a suffix containing the current date to keep track of their creation or modification date. Our script will identify files with the current date suffix, move them to a backup path within the same bucket, and maintain an organized data repository.
Prerequisites
Before we proceed, make sure you have the following:
1. Python installed on your system.
2. The boto3 library installed (for example, with pip install boto3).
3. Appropriate access credentials and permissions to access the S3 bucket and perform read and write operations.
Required Access Permissions
The AWS IAM user or role associated with your Python script will need the following access permissions:
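1. Permission to list the bucket (s3:ListBucket) so the script can find objects under the raw path.
2. Read access (s3:GetObject) to the objects in the raw path to retrieve data files.
3. Write access (s3:PutObject) to the backup path in the same S3 bucket to store the moved data files.
4. Delete access (s3:DeleteObject) to the objects in the raw path, because the script removes each original after copying it.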
Ensure that you provide the necessary permissions to the IAM entity for a seamless data transfer process.
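As a rough illustration, the permissions above could be expressed as a policy like the sketch below. The bucket name and the raw/ and backup/ prefixes are the placeholder values used in this article, and build_policy is a hypothetical helper, not part of boto3. Treat it as a starting point to adapt, not a definitive policy.

```python
import json

# Placeholder values (assumptions); replace with your own bucket and prefixes.
BUCKET = "freshers-in-data"
RAW_PREFIX = "raw/"
BACKUP_PREFIX = "backup/"


def build_policy(bucket, raw_prefix, backup_prefix):
    """Return an IAM policy document (as a dict) for the move script."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Allow listing, limited to keys under the raw prefix
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": bucket_arn,
                "Condition": {"StringLike": {"s3:prefix": [f"{raw_prefix}*"]}},
            },
            {
                # Allow reading and deleting the original objects
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:DeleteObject"],
                "Resource": f"{bucket_arn}/{raw_prefix}*",
            },
            {
                # Allow writing the copies under the backup prefix
                "Effect": "Allow",
                "Action": "s3:PutObject",
                "Resource": f"{bucket_arn}/{backup_prefix}*",
            },
        ],
    }


policy = build_policy(BUCKET, RAW_PREFIX, BACKUP_PREFIX)
print(json.dumps(policy, indent=2))
```

The printed JSON can be attached to the IAM user or role that runs the script. Scoping each statement to a prefix keeps the script from touching objects outside the raw and backup paths.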
import boto3
from datetime import datetime


def move_data_with_current_date_suffix(bucket_name, raw_path, backup_path):
    # Initialize the S3 client
    s3_client = boto3.client('s3')

    # Get the current date (local time) in the format 'yyyymmdd'
    current_date = datetime.now().strftime('%Y%m%d')

    # List objects in the raw path (a single call returns at most 1,000 keys)
    response = s3_client.list_objects_v2(Bucket=bucket_name, Prefix=raw_path)

    if 'Contents' in response:
        for obj in response['Contents']:
            source_key = obj['Key']
            # Check if the object has the current date suffix
            if source_key.endswith(f"{current_date}.csv"):
                # Build the destination key by swapping the raw prefix for the
                # backup prefix. S3 keys are plain strings joined with '/',
                # so os.path.join is not used here.
                relative_key = source_key[len(raw_path):].lstrip('/')
                destination_key = f"{backup_path.rstrip('/')}/{relative_key}"

                # Copy the object to the backup path
                s3_client.copy_object(
                    Bucket=bucket_name,
                    CopySource={'Bucket': bucket_name, 'Key': source_key},
                    Key=destination_key,
                )

                # Delete the original; this runs only if the copy did not raise
                s3_client.delete_object(Bucket=bucket_name, Key=source_key)
                print(f"Moved {source_key} to {destination_key}")
    else:
        print(f"No objects found in the raw path: {raw_path}")


if __name__ == "__main__":
    # Replace with your S3 bucket name
    bucket_name = "freshers-in-data"
    # Replace with your raw and backup paths
    raw_path = "raw/"
    backup_path = "backup/"
    move_data_with_current_date_suffix(bucket_name, raw_path, backup_path)
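A single list_objects_v2 call returns at most 1,000 keys, so a raw path holding more objects than that would be only partly processed. One way to cover larger paths is boto3's paginator, sketched below. The iter_keys helper is a hypothetical name introduced for illustration, and the client is passed in as an argument so the function can be exercised without AWS access.

```python
def iter_keys(s3_client, bucket_name, prefix):
    """Yield every object key under prefix, following pagination."""
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket_name, Prefix=prefix):
        # A page with no matching objects has no 'Contents' entry
        for obj in page.get("Contents", []):
            yield obj["Key"]


# Example usage (assumes boto3 is installed and credentials are configured):
#   import boto3
#   for key in iter_keys(boto3.client("s3"), "freshers-in-data", "raw/"):
#       print(key)
```

The loop over response['Contents'] in the script could iterate over iter_keys instead, with the rest of the logic unchanged.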
Explanation
1. We initialize the S3 client using boto3.client.
2. The current date is retrieved and formatted as yyyymmdd.
3. The script lists the objects in the raw path of the S3 bucket using list_objects_v2.
4. For each object in the raw path, we check whether its key ends with the current date suffix (e.g., 20230728.csv).
5. If the suffix matches, we build the source and destination locations for the object within the same bucket.
6. S3 has no native move operation, so we use copy_object to copy the object to the backup path and then delete_object to remove the original from the raw path.
7. The script prints a message for each moved file, and if no objects are found in the raw path, it notifies the user.
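The matching and key-building steps can be tried without touching S3 by isolating them as pure functions. The sketch below is a dry run under stated assumptions: the helper names (has_date_suffix, backup_key, plan_moves) are hypothetical, and the mapping that swaps the raw prefix for the backup prefix is one possible convention, not the only one.

```python
from datetime import date


def has_date_suffix(key, day, extension=".csv"):
    """Return True if key ends with the yyyymmdd form of day plus extension."""
    return key.endswith(f"{day.strftime('%Y%m%d')}{extension}")


def backup_key(key, raw_path, backup_path):
    """Swap the raw prefix of key for the backup prefix (assumed convention)."""
    relative = key[len(raw_path):].lstrip("/")
    return f"{backup_path.rstrip('/')}/{relative}"


def plan_moves(keys, raw_path, backup_path, day):
    """Return (source, destination) pairs for keys matching the date suffix."""
    return [
        (key, backup_key(key, raw_path, backup_path))
        for key in keys
        if has_date_suffix(key, day)
    ]


# Sample keys invented for illustration only
keys = ["raw/sales_20230728.csv", "raw/sales_20230727.csv", "raw/readme.txt"]
for source, destination in plan_moves(keys, "raw/", "backup/", date(2023, 7, 28)):
    print(f"{source} -> {destination}")
# prints: raw/sales_20230728.csv -> backup/sales_20230728.csv
```

Passing the date in as an argument, instead of calling datetime.now() inside the function, makes it easy to test other days or to re-run the move for a date that was missed.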