EMR’s Add Steps API – An Overview

user June 15, 2023

The Add Steps API in Amazon EMR lets you add steps to a running cluster programmatically. A step in EMR is a unit of work that contains one or more Hadoop jobs.

The underlying API operation is AddJobFlowSteps, which adds steps (defined by StepConfig objects) to the specified cluster. Each step names a JAR file, which contains the main function of the step, and the arguments to pass to that JAR. The JAR can be a custom JAR or command-runner.jar, a predefined JAR provided by Amazon EMR.
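As a rough illustration of what a StepConfig looks like, the sketch below builds two of them as plain Python dictionaries: one for a custom JAR and one for command-runner.jar. The helper name build_step and the script path s3://freshers-in-bkt/job.py are assumptions made up for this example; only the Name and HadoopJarStep fields are set here.

```python
def build_step(name, jar, args):
    """Return a dict shaped like an EMR StepConfig (hypothetical helper)."""
    return {
        "Name": name,
        "HadoopJarStep": {
            "Jar": jar,          # S3 path to a custom JAR, or "command-runner.jar"
            "Args": list(args),  # arguments passed to the JAR
        },
    }


# A step that runs a custom JAR stored in S3.
custom_step = build_step(
    "My step",
    "s3://freshers-in-bkt/mytest.jar",
    ["arg1", "arg2", "arg3"],
)

# A step that uses command-runner.jar to launch a command on the cluster.
# The script path below is a placeholder, not a real object.
spark_step = build_step(
    "Spark step",
    "command-runner.jar",
    ["spark-submit", "--deploy-mode", "cluster", "s3://freshers-in-bkt/job.py"],
)
```

With command-runner.jar, the first argument is the command to run and the rest are its own arguments, which is why the Spark example starts with spark-submit.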

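Steps can be added while the cluster is in the STARTING, BOOTSTRAPPING, RUNNING, or WAITING state. When you add a step, you also specify the action to take if that step fails: continue to the next step (CONTINUE), cancel the remaining steps and leave the cluster waiting (CANCEL_AND_WAIT), or terminate the Amazon EMR cluster (TERMINATE_CLUSTER).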

By default, steps are processed one at a time, in the order they were added. For each step, the cluster executes the specified JAR file with the specified arguments. A step is considered complete if it finishes without an error. If a step fails, the cluster takes the action that you specified when you added that step.
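The processing rules above can be sketched as a small toy model. This is not EMR code and does not call AWS; the function run_steps and its tuple input format are assumptions invented for illustration, and the state names mirror the step states that EMR reports (COMPLETED, FAILED, CANCELLED).

```python
def run_steps(steps):
    """Toy model of sequential step processing.

    steps: list of (name, action_on_failure, succeeds) tuples.
    Returns (states, cluster_terminated).
    """
    states = {}
    cancel_rest = False
    cluster_terminated = False

    for name, action, succeeds in steps:
        if cancel_rest:
            # An earlier failure stopped processing; this step never runs.
            states[name] = "CANCELLED"
        elif succeeds:
            states[name] = "COMPLETED"
        else:
            states[name] = "FAILED"
            if action != "CONTINUE":
                # CANCEL_AND_WAIT and TERMINATE_CLUSTER both stop later steps.
                cancel_rest = True
                cluster_terminated = action == "TERMINATE_CLUSTER"

    return states, cluster_terminated


states, terminated = run_steps([
    ("extract", "CONTINUE", False),          # fails, but the next step still runs
    ("transform", "CANCEL_AND_WAIT", False), # fails and cancels what follows
    ("load", "CONTINUE", True),              # never runs
])
```

In this run, extract and transform end up FAILED, load ends up CANCELLED, and the cluster is not terminated because no failing step asked for TERMINATE_CLUSTER.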

Here is an example of how to add a step to an existing cluster using the AWS CLI:

aws emr add-steps --cluster-id j-2AXXXXXXGAPLF --steps Type=CUSTOM_JAR,Name="My step",ActionOnFailure=CONTINUE,Jar=s3://freshers-in-bkt/mytest.jar,Args=["arg1","arg2","arg3"]

In this command, Type=CUSTOM_JAR specifies the type of the step. Name="My step" is a label for the step. ActionOnFailure=CONTINUE tells the cluster to continue to the next step if this step fails. Jar=s3://freshers-in-bkt/mytest.jar specifies the location of the JAR file to run for the step, and Args=["arg1","arg2","arg3"] are the arguments to pass to the JAR.
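The same request can be made from Python with boto3, whose EMR client exposes the operation as add_job_flow_steps. The sketch below is a minimal version, assuming boto3 is installed and AWS credentials and a default region are configured; the wrapper name add_custom_jar_step and its emr_client parameter are made up for this example.

```python
def add_custom_jar_step(cluster_id, emr_client=None):
    """Add the example custom JAR step to a cluster and return the new step IDs."""
    if emr_client is None:
        # Imported lazily so the function can be exercised with a stub client.
        import boto3
        emr_client = boto3.client("emr")

    response = emr_client.add_job_flow_steps(
        JobFlowId=cluster_id,
        Steps=[
            {
                "Name": "My step",
                "ActionOnFailure": "CONTINUE",
                "HadoopJarStep": {
                    "Jar": "s3://freshers-in-bkt/mytest.jar",
                    "Args": ["arg1", "arg2", "arg3"],
                },
            }
        ],
    )
    # The response lists one step ID per submitted step.
    return response["StepIds"]


# Example call (requires AWS access, so it is left commented out):
# step_ids = add_custom_jar_step("j-2AXXXXXXGAPLF")
```

Passing the client in as a parameter is a design choice for this sketch: it lets you reuse an existing client or substitute a stub when testing without touching AWS.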

The EMR Add Steps API gives you flexible control over your processing workflow in EMR. It lets you change what a running cluster does on the fly, without having to shut it down and start a new one.
