Pipelines

This module provides pre-built templates that make it easy to build standard data science workflows. Each template is assembled from the SDK's workflow steps.

class stepfunctions.template.pipeline.train.TrainingPipeline(estimator, role, inputs, s3_bucket, client=None, **kwargs)

Bases: stepfunctions.template.pipeline.common.WorkflowTemplate

Creates a standard training pipeline with the following steps in order:

  1. Train estimator
  2. Create estimator model
  3. Endpoint configuration
  4. Deploy model
Parameters:
  • estimator (sagemaker.estimator.EstimatorBase) – The estimator to use for training. Can be a BYO estimator, Framework estimator or Amazon algorithm estimator.
  • role (str) – An AWS IAM role (either name or full Amazon Resource Name (ARN)). This role is used to create, manage, and execute the Step Functions workflows.
  • inputs

    Information about the training data. Please refer to the fit() method of the associated estimator, as this can take any of the following forms:

    • (str) - The S3 location where training data is saved.
    • (dict[str, str] or dict[str, sagemaker.session.s3_input]) - If using multiple channels for training data, you can specify a dict mapping channel names to strings or sagemaker.session.s3_input objects.
    • (sagemaker.session.s3_input) - Channel configuration for S3 data sources that can provide additional information about the training dataset. See sagemaker.session.s3_input for full details.
    • (sagemaker.amazon.amazon_estimator.RecordSet) - A collection of Amazon Record objects serialized and stored in S3. For use with an estimator for an Amazon algorithm.
    • (list[sagemaker.amazon.amazon_estimator.RecordSet]) - A list of sagemaker.amazon.amazon_estimator.RecordSet objects, where each instance is a different channel of training data.
  • s3_bucket (str) – S3 bucket under which the output artifacts from the training job will be stored. The parent path used is built using the format: s3://{s3_bucket}/{pipeline_name}/models/{job_name}/. In this format, pipeline_name refers to the keyword argument provided for TrainingPipeline. If a pipeline_name argument was not provided, one is auto-generated by the pipeline as training-pipeline-<timestamp>. Also, in the format, job_name refers to the job name provided when calling the TrainingPipeline.execute() method.
  • client (SFN.Client, optional) – boto3 client to use for creating and interacting with the training pipeline in Step Functions. (default: None)
Keyword Arguments:

  • pipeline_name (str, optional) – Name of the pipeline. This name will be used to name jobs (if not provided when calling execute()), models, endpoints, and S3 objects created by the pipeline. If a pipeline_name argument was not provided, one is auto-generated by the pipeline as training-pipeline-<timestamp>. (default: None)
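
For orientation, here is a minimal construction sketch, not a definitive recipe: the bucket name, role ARNs, image URI, and pipeline name are placeholders, and the estimator keyword names follow version 2 of the SageMaker Python SDK (older versions use image_name and train_instance_* instead).

    from sagemaker.estimator import Estimator
    from stepfunctions.template.pipeline.train import TrainingPipeline

    # Any sagemaker.estimator.EstimatorBase works here (BYO, Framework, or
    # Amazon algorithm estimator); the image URI and role are placeholders.
    estimator = Estimator(
        image_uri="<training-image-uri>",
        role="<sagemaker-execution-role-arn>",
        instance_count=1,
        instance_type="ml.m5.xlarge",
    )

    pipeline = TrainingPipeline(
        estimator=estimator,
        role="<step-functions-workflow-role-arn>",  # role used to create and run the workflow
        inputs="s3://my-bucket/train",              # S3 location of the training data
        s3_bucket="my-bucket",                      # bucket for the output artifacts
        pipeline_name="my-training-pipeline",       # optional; auto-generated if omitted
    )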

build_workflow_definition()

Build the workflow definition for the training pipeline with all the states involved.

Returns: Workflow definition as a chain of states involved in the training pipeline.
Return type: Chain
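
As a small illustrative sketch (assuming the pipeline object constructed above, and assuming the returned Chain exposes its ordered states through its steps attribute):

    definition = pipeline.build_workflow_definition()
    # The chain orders the training, model-creation, endpoint-configuration,
    # and deployment states; list the state types it contains.
    for state in definition.steps:
        print(type(state).__name__)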
execute(job_name=None, hyperparameters=None)

Run the training pipeline.

Parameters:
  • job_name (str, optional) – Name for the training job. If one is not provided, a job name will be auto-generated. (default: None)
  • hyperparameters (dict, optional) – Hyperparameters for the estimator training. (default: None)
Returns: Running instance of the training pipeline.
Return type: Execution
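
A hedged usage sketch follows; the job name and hyperparameter names are placeholders for whatever the chosen estimator expects, and it assumes the pipeline's create() method, provided by the shared workflow template, is used to provision the state machine first.

    # Provision the Step Functions state machine, then start an execution.
    pipeline.create()
    execution = pipeline.execute(
        job_name="my-training-job",                    # optional; auto-generated if omitted
        hyperparameters={"max_depth": 5, "eta": 0.2},  # placeholder values for the training job
    )

    # The returned Execution can be polled like any other workflow execution.
    print(execution.describe()["status"])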

class stepfunctions.template.pipeline.inference.InferencePipeline(preprocessor, estimator, inputs, s3_bucket, role, client=None, **kwargs)

Bases: stepfunctions.template.pipeline.common.WorkflowTemplate

Creates a standard inference pipeline with the following steps in order:

  1. Train preprocessor
  2. Create preprocessor model
  3. Transform input data using preprocessor model
  4. Train estimator
  5. Create estimator model
  6. Endpoint configuration
  7. Deploy estimator model
Parameters:
  • preprocessor (sagemaker.estimator.EstimatorBase) – The estimator used to preprocess and transform the training data.
  • estimator (sagemaker.estimator.EstimatorBase) – The estimator to use for training. Can be a BYO estimator, Framework estimator or Amazon algorithm estimator.
  • role (str) – An AWS IAM role (either name or full Amazon Resource Name (ARN)). This role is used to create, manage, and execute the Step Functions workflows.
  • inputs

    Information about the training data. Please refer to the fit() method of the associated estimator, as this can take any of the following forms:

    • (str) - The S3 location where training data is saved.
    • (dict[str, str] or dict[str, sagemaker.session.s3_input]) - If using multiple channels for training data, you can specify a dict mapping channel names to strings or sagemaker.session.s3_input objects.
    • (sagemaker.session.s3_input) - Channel configuration for S3 data sources that can provide additional information about the training dataset. See sagemaker.session.s3_input for full details.
    • (sagemaker.amazon.amazon_estimator.RecordSet) - A collection of Amazon Record objects serialized and stored in S3. For use with an estimator for an Amazon algorithm.
    • (list[sagemaker.amazon.amazon_estimator.RecordSet]) - A list of sagemaker.amazon.amazon_estimator.RecordSet objects, where each instance is a different channel of training data.
  • s3_bucket (str) – S3 bucket under which the output artifacts from the training job will be stored. The parent path used is built using the format: s3://{s3_bucket}/{pipeline_name}/models/{job_name}/. In this format, pipeline_name refers to the keyword argument provided for InferencePipeline. If a pipeline_name argument was not provided, one is auto-generated by the pipeline as inference-pipeline-<timestamp>. Also, in the format, job_name refers to the job name provided when calling the InferencePipeline.execute() method.
  • client (SFN.Client, optional) – boto3 client to use for creating and interacting with the inference pipeline in Step Functions. (default: None)
Keyword Arguments:
 
  • compression_type (str, optional) – Compression type (Gzip/None) of the file for the transform job. (default: None)
  • content_type (str, optional) – Content type (MIME) of the document to be used in the preprocessing script. See SageMaker documentation for more details. (default: None)
  • pipeline_name (str, optional) – Name of the pipeline. This name will be used to name jobs (if not provided when calling execute()), models, endpoints, and S3 objects created by the pipeline. If a pipeline_name argument was not provided, one is auto-generated by the pipeline as inference-pipeline-<timestamp>. (default: None)
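
As with the training pipeline, a minimal construction sketch follows; the two estimators, role ARNs, bucket, and keyword values are illustrative placeholders, and the estimator keyword names again follow version 2 of the SageMaker Python SDK.

    from sagemaker.estimator import Estimator
    from stepfunctions.template.pipeline.inference import InferencePipeline

    # Placeholder estimators: the preprocessor fits and transforms the raw data,
    # and the estimator trains on the transformed data.
    preprocessor = Estimator(
        image_uri="<preprocessing-image-uri>",
        role="<sagemaker-execution-role-arn>",
        instance_count=1,
        instance_type="ml.m5.xlarge",
    )
    estimator = Estimator(
        image_uri="<training-image-uri>",
        role="<sagemaker-execution-role-arn>",
        instance_count=1,
        instance_type="ml.m5.xlarge",
    )

    pipeline = InferencePipeline(
        preprocessor=preprocessor,
        estimator=estimator,
        inputs="s3://my-bucket/train",              # S3 location of the raw training data
        s3_bucket="my-bucket",                      # bucket for the output artifacts
        role="<step-functions-workflow-role-arn>",  # role used to create and run the workflow
        compression_type="Gzip",                    # optional keyword argument
        content_type="text/csv",                    # optional keyword argument
    )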
build_workflow_definition()

Build the workflow definition for the inference pipeline with all the states involved.

Returns: Workflow definition as a chain of states involved in the inference pipeline.
Return type: Chain
execute(job_name=None, hyperparameters=None)

Run the inference pipeline.

Parameters:
  • job_name (str, optional) – Name for the training job. This name is also used as the suffix of the preprocessing job name, which becomes preprocess-<job_name>. If one is not provided, a job name will be auto-generated. (default: None)
  • hyperparameters (dict, optional) – Hyperparameters for the estimator training. (default: None)
Returns: Running instance of the inference pipeline.
Return type: Execution
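
A closing usage sketch, under the same assumptions as the training example above; note that job_name also determines the preprocess-<job_name> name of the preprocessing job.

    pipeline.create()                       # provision the state machine before the first execution
    execution = pipeline.execute(
        job_name="customer-churn-v1",       # preprocessing job becomes preprocess-customer-churn-v1
        hyperparameters={"epochs": 10},     # placeholder hyperparameters for the estimator
    )
    execution.render_progress()             # or execution.describe() to poll the status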