How to add additional Python libraries to an AWS Glue development endpoint

AWS Glue @ Freshers.in

There are several scenarios in which you may need different sets of Python libraries in your Python code or ETL scripts. In that case you have two options:

a. You can either set up a separate development endpoint for each set.

b. You can overwrite the library .zip file(s) that your development endpoint loads every time you switch scripts.

In the console, you can specify one or more library .zip files for a development endpoint when you create it. You can zip all the libraries and keep the archive at an S3 path (s3://freshers-in-bucket/prefix/site-packages.zip). If you need to point to multiple .zip files, list them all separated by commas, with no spaces (s3://freshers-in-bucket-A/prefix/libA.zip,s3://freshers-in-bucket-B/prefix/libB.zip). If you later update the libraries, you can use the console to re-import them into your development endpoint.
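The packaging step described above can be sketched in plain Python. This is a minimal sketch, not an official AWS tool: the helper names zip_package and join_s3_paths are hypothetical, and it assumes your library is a pure-Python package directory containing only .py files.

```python
import zipfile
from pathlib import Path
from typing import List


def zip_package(package_dir: str, zip_path: str) -> List[str]:
    """Zip a package directory so the package folder sits at the archive root.

    Hypothetical helper: only .py files are included.
    """
    package = Path(package_dir).resolve()
    names = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(package.rglob("*.py")):
            # Store paths relative to the parent directory, so the archive
            # contains "mylib/__init__.py" rather than an absolute path.
            arcname = path.relative_to(package.parent).as_posix()
            zf.write(path, arcname)
            names.append(arcname)
    return names


def join_s3_paths(paths: List[str]) -> str:
    """Build the comma-separated value used for multiple library .zip files."""
    for p in paths:
        if not p.startswith("s3://"):
            raise ValueError(f"not an S3 URI: {p}")
    return ",".join(paths)
```

After building the archive you would still need to upload it to your bucket, for example through the S3 console, before pointing the development endpoint at it.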
You can also specify library files using the AWS Glue APIs, as shown below.

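import boto3

glue = boto3.client("glue")

# All values are placeholders; substitute your own.
dep = glue.create_dev_endpoint(
    EndpointName="freshers_in_DevEndpoint",
    RoleArn="arn:aws:iam::42398602034423",  # use the full role ARN: arn:aws:iam::<account-id>:role/<role-name>
    SecurityGroupIds=["in-dfr5gdddreww"],  # must be a list of security group IDs
    SubnetId="subnet-f4234ddgd",
    PublicKey="ssh-rsa ASSDFEeerwTFJKTDSQWEQWFDGHGHGy...",
    NumberOfNodes=2,
    ExtraPythonLibsS3Path="s3://freshers-in-bucket-A/prefix/libA.zip,s3://freshers-in-bucket-B/prefix/libB.zip",
)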
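If you take the approach of overwriting the library .zip files on an existing endpoint, the same change can probably be made through the API rather than the console. The sketch below assumes boto3's update_dev_endpoint call with its CustomLibraries and UpdateEtlLibraries parameters; the wrapper name update_endpoint_libraries is hypothetical, and the Glue client is passed in so the function can be tried without AWS credentials.

```python
def update_endpoint_libraries(glue_client, endpoint_name, s3_paths):
    """Point an existing development endpoint at a new set of library .zip files.

    Hypothetical wrapper: glue_client is expected to be boto3.client("glue").
    """
    return glue_client.update_dev_endpoint(
        EndpointName=endpoint_name,
        # Multiple .zip files are passed as one comma-separated string.
        CustomLibraries={"ExtraPythonLibsS3Path": ",".join(s3_paths)},
        # Ask Glue to reload the libraries on the endpoint.
        UpdateEtlLibraries=True,
    )
```

With a real client this would be called as update_endpoint_libraries(boto3.client("glue"), "freshers_in_DevEndpoint", ["s3://freshers-in-bucket-A/prefix/libA.zip"]).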

For Zeppelin Notebook

Call the following PySpark function before importing any package from your .zip file:

sc.addPyFile("/home/glue/downloads/python/freshers-in-packages.zip")

CreateJob: When you create a job, pass the library paths through the --extra-py-files default parameter:

# Assumes glue = boto3.client("glue")
job = glue.create_job(
    Name='freshersSampleJob',
    Role='Glue_Freshers_Role',
    Command={'Name': 'glueetl',  # job type; 'glueetl' is a Spark ETL job
             'ScriptLocation': 's3://freshers_bucket/scripts/freshers_sample_script.py'},
    DefaultArguments={'--extra-py-files': 's3://freshers-in-bucket-A/prefix/libA.zip,s3://freshers-in-bucket-B/prefix/libB.zip'})

For a job run, you can always override the job's default library setting with a different one:

run = glue.start_job_run(JobName='freshersSampleJob',
                         Arguments={'--extra-py-files': 's3://freshers-in-bucket-A/prefix/libA.zip'})
runId = run['JobRunId']  # start_job_run returns a dict; the run ID is under 'JobRunId'
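A small wrapper can make the override optional, so that a run falls back to the job's default libraries unless you pass a different set. This is only a sketch: the function name start_run is hypothetical, and the Glue client is passed in as a parameter.

```python
def start_run(glue_client, job_name, extra_py_files=None):
    """Start a job run, overriding --extra-py-files only when paths are given.

    Hypothetical wrapper: glue_client is expected to be boto3.client("glue").
    """
    kwargs = {"JobName": job_name}
    if extra_py_files:
        # Per-run arguments take precedence over the job's DefaultArguments.
        kwargs["Arguments"] = {"--extra-py-files": ",".join(extra_py_files)}
    response = glue_client.start_job_run(**kwargs)
    return response["JobRunId"]
```

Calling start_run(glue, 'freshersSampleJob') leaves the default libraries in place, while passing a list of S3 paths as the third argument replaces them for that run only.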
Author: user
