Pipelines
This module provides pre-built templates that make it easy to build generic data science workflows. The templates are constructed from steps.
class stepfunctions.template.pipeline.train.TrainingPipeline(estimator, role, inputs, s3_bucket, client=None, **kwargs)

Bases: stepfunctions.template.pipeline.common.WorkflowTemplate
Creates a standard training pipeline with the following steps in order:
- Train estimator
- Create estimator model
- Endpoint configuration
- Deploy model
Parameters:
- estimator (sagemaker.estimator.EstimatorBase) – The estimator to use for training. Can be a BYO estimator, Framework estimator, or Amazon algorithm estimator.
- role (str) – An AWS IAM role (either name or full Amazon Resource Name (ARN)). This role is used to create, manage, and execute the Step Functions workflows.
- inputs – Information about the training data. Please refer to the fit() method of the associated estimator, as this can take any of the following forms:
  - (str) - The S3 location where training data is saved.
  - (dict[str, str] or dict[str, sagemaker.inputs.TrainingInput]) - If using multiple channels for training data, you can specify a dict mapping channel names to strings or sagemaker.inputs.TrainingInput objects.
  - (sagemaker.inputs.TrainingInput) - Channel configuration for S3 data sources that can provide additional information about the training dataset. See sagemaker.inputs.TrainingInput for full details.
  - (sagemaker.amazon.amazon_estimator.RecordSet) - A collection of Amazon Record objects serialized and stored in S3. For use with an estimator for an Amazon algorithm.
  - (list[sagemaker.amazon.amazon_estimator.RecordSet]) - A list of sagemaker.amazon.amazon_estimator.RecordSet objects, where each instance is a different channel of training data.
- s3_bucket (str) – S3 bucket under which the output artifacts from the training job will be stored. The parent path is built using the format s3://{s3_bucket}/{pipeline_name}/models/{job_name}/. In this format, pipeline_name refers to the keyword argument provided to TrainingPipeline; if a pipeline_name argument was not provided, one is auto-generated by the pipeline as training-pipeline-<timestamp>. Likewise, job_name refers to the job name provided when calling the TrainingPipeline.execute() method.
- client (SFN.Client, optional) – boto3 client to use for creating and interacting with the training pipeline in Step Functions. (default: None)
Keyword Arguments: pipeline_name (str, optional) – Name of the pipeline. This name is used to name jobs (if not provided when calling execute()), models, endpoints, and S3 objects created by the pipeline. If a pipeline_name argument is not provided, one is auto-generated by the pipeline as training-pipeline-<timestamp>. (default: None)
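Example (illustrative sketch only; the XGBoost estimator, IAM role ARNs, bucket name, and S3 input path are placeholder assumptions, not values defined by this module):

    import sagemaker
    from sagemaker.estimator import Estimator
    from stepfunctions.template.pipeline.train import TrainingPipeline

    # Hypothetical XGBoost estimator; any EstimatorBase subclass works here.
    xgb = Estimator(
        image_uri=sagemaker.image_uris.retrieve("xgboost", "us-east-1", "1.5-1"),
        role="arn:aws:iam::123456789012:role/SageMakerRole",      # placeholder SageMaker role
        instance_count=1,
        instance_type="ml.m5.xlarge",
    )

    pipeline = TrainingPipeline(
        estimator=xgb,
        role="arn:aws:iam::123456789012:role/StepFunctionsRole",  # placeholder workflow role
        inputs="s3://example-bucket/train/",                      # placeholder training data
        s3_bucket="example-bucket",                               # placeholder output bucket
        pipeline_name="my-training-pipeline",
    )

The pipeline can then be run with execute(), documented below.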
build_workflow_definition()

Build the workflow definition for the training pipeline with all the states involved.

Returns: Workflow definition as a chain of states involved in the training pipeline.
Return type: Chain
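For illustration (assuming the hypothetical pipeline object from the sketch above), the generated definition can be built and inspected without creating or executing anything:

    # Build the Chain of states that make up the training pipeline.
    definition = pipeline.build_workflow_definition()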
execute(job_name=None, hyperparameters=None)

Run the training pipeline.

Parameters:
- job_name (str, optional) – Name for the training job. If one is not provided, a job name will be auto-generated. (default: None)
- hyperparameters (dict, optional) – Hyperparameters for the estimator training. (default: None)

Returns: Running instance of the training pipeline.
Return type: Execution
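Example (illustrative sketch, assuming the hypothetical pipeline above and that any required Step Functions setup has been completed; the job name and hyperparameters are placeholders):

    execution = pipeline.execute(
        job_name="xgboost-train-example",     # optional; auto-generated if omitted
        hyperparameters={"max_depth": "5", "eta": "0.2", "num_round": "100"},
    )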
class stepfunctions.template.pipeline.inference.InferencePipeline(preprocessor, estimator, inputs, s3_bucket, role, client=None, **kwargs)

Bases: stepfunctions.template.pipeline.common.WorkflowTemplate
Creates a standard inference pipeline with the following steps in order:
- Train preprocessor
- Create preprocessor model
- Transform input data using preprocessor model
- Train estimator
- Create estimator model
- Endpoint configuration
- Deploy estimator model
Parameters:
- preprocessor (sagemaker.estimator.EstimatorBase) – The estimator used to preprocess and transform the training data.
- estimator (sagemaker.estimator.EstimatorBase) – The estimator to use for training. Can be a BYO estimator, Framework estimator, or Amazon algorithm estimator.
- role (str) – An AWS IAM role (either name or full Amazon Resource Name (ARN)). This role is used to create, manage, and execute the Step Functions workflows.
- inputs – Information about the training data. Please refer to the fit() method of the associated estimator, as this can take any of the following forms:
  - (str) - The S3 location where training data is saved.
  - (dict[str, str] or dict[str, sagemaker.inputs.TrainingInput]) - If using multiple channels for training data, you can specify a dict mapping channel names to strings or sagemaker.inputs.TrainingInput objects.
  - (sagemaker.inputs.TrainingInput) - Channel configuration for S3 data sources that can provide additional information about the training dataset. See sagemaker.inputs.TrainingInput for full details.
  - (sagemaker.amazon.amazon_estimator.RecordSet) - A collection of Amazon Record objects serialized and stored in S3. For use with an estimator for an Amazon algorithm.
  - (list[sagemaker.amazon.amazon_estimator.RecordSet]) - A list of sagemaker.amazon.amazon_estimator.RecordSet objects, where each instance is a different channel of training data.
- s3_bucket (str) – S3 bucket under which the output artifacts from the training job will be stored. The parent path is built using the format s3://{s3_bucket}/{pipeline_name}/models/{job_name}/. In this format, pipeline_name refers to the keyword argument provided to InferencePipeline; if a pipeline_name argument was not provided, one is auto-generated by the pipeline as training-pipeline-<timestamp>. Likewise, job_name refers to the job name provided when calling the InferencePipeline.execute() method.
- client (SFN.Client, optional) – boto3 client to use for creating and interacting with the inference pipeline in Step Functions. (default: None)
Keyword Arguments:
- compression_type (str, optional) – Compression type (Gzip/None) of the file for the TransformJob. (default: None)
- content_type (str, optional) – Content type (MIME) of the document to be used in the preprocessing script. See the SageMaker documentation for more details. (default: None)
- pipeline_name (str, optional) – Name of the pipeline. This name is used to name jobs (if not provided when calling execute()), models, endpoints, and S3 objects created by the pipeline. If a pipeline_name argument is not provided, one is auto-generated by the pipeline as training-pipeline-<timestamp>. (default: None)
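Example (illustrative sketch only; the scikit-learn preprocessing script, IAM role ARNs, bucket, and input path are placeholder assumptions, and xgb is the hypothetical estimator from the TrainingPipeline sketch above):

    from sagemaker.sklearn.estimator import SKLearn
    from stepfunctions.template.pipeline.inference import InferencePipeline

    # Hypothetical scikit-learn preprocessor trained on the same data.
    preprocessor = SKLearn(
        entry_point="preprocess.py",                            # placeholder preprocessing script
        framework_version="0.23-1",
        role="arn:aws:iam::123456789012:role/SageMakerRole",    # placeholder SageMaker role
        instance_type="ml.m5.xlarge",
        instance_count=1,
    )

    pipeline = InferencePipeline(
        preprocessor=preprocessor,
        estimator=xgb,                                          # estimator from the earlier sketch
        inputs="s3://example-bucket/train.csv",                 # placeholder training data
        s3_bucket="example-bucket",                             # placeholder output bucket
        role="arn:aws:iam::123456789012:role/StepFunctionsRole",  # placeholder workflow role
        compression_type="Gzip",
        content_type="text/csv",
        pipeline_name="my-inference-pipeline",
    )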
build_workflow_definition()

Build the workflow definition for the inference pipeline with all the states involved.

Returns: Workflow definition as a chain of states involved in the inference pipeline.
Return type: Chain
execute(job_name=None, hyperparameters=None)

Run the inference pipeline.

Parameters:
- job_name (str, optional) – Name for the training job. This is also used as the suffix for the preprocessing job, which is named preprocess-<job_name>. If one is not provided, a job name will be auto-generated. (default: None)
- hyperparameters (dict, optional) – Hyperparameters for the estimator training. (default: None)

Returns: Running instance of the inference pipeline.
Return type: Execution
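Example (illustrative sketch, assuming the hypothetical InferencePipeline above and that any required Step Functions setup has been completed; the job name and hyperparameters are placeholders):

    execution = pipeline.execute(
        job_name="churn-model",               # preprocessing job becomes preprocess-churn-model
        hyperparameters={"num_round": "50"},
    )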