Pipelines
This module provides pre-built templates that make it easy to build generic data science workflows. The templates are constructed from steps.
class stepfunctions.template.pipeline.train.TrainingPipeline(estimator, role, inputs, s3_bucket, client=None, **kwargs)

Bases: stepfunctions.template.pipeline.common.WorkflowTemplate
Creates a standard training pipeline with the following steps in order:
- Train estimator
- Create estimator model
- Endpoint configuration
- Deploy model
Parameters:
- estimator (sagemaker.estimator.EstimatorBase) – The estimator to use for training. Can be a BYO estimator, Framework estimator, or Amazon algorithm estimator.
- role (str) – An AWS IAM role (either name or full Amazon Resource Name (ARN)). This role is used to create, manage, and execute the Step Functions workflows.
- inputs – Information about the training data. Please refer to the fit() method of the associated estimator, as this can take any of the following forms:
  - (str) - The S3 location where training data is saved.
  - (dict[str, str] or dict[str, sagemaker.inputs.TrainingInput]) - If using multiple channels for training data, you can specify a dict mapping channel names to strings or sagemaker.inputs.TrainingInput objects.
  - (sagemaker.inputs.TrainingInput) - Channel configuration for S3 data sources that can provide additional information about the training dataset. See sagemaker.inputs.TrainingInput for full details.
  - (sagemaker.amazon.amazon_estimator.RecordSet) - A collection of Amazon Record objects serialized and stored in S3. For use with an estimator for an Amazon algorithm.
  - (list[sagemaker.amazon.amazon_estimator.RecordSet]) - A list of sagemaker.amazon.amazon_estimator.RecordSet objects, where each instance is a different channel of training data.
- s3_bucket (str) – S3 bucket under which the output artifacts from the training job will be stored. The parent path is built using the format s3://{s3_bucket}/{pipeline_name}/models/{job_name}/. In this format, pipeline_name refers to the keyword argument provided to TrainingPipeline; if a pipeline_name argument was not provided, one is auto-generated by the pipeline as training-pipeline-<timestamp>. Likewise, job_name refers to the job name provided when calling the TrainingPipeline.execute() method.
- client (SFN.Client, optional) – boto3 client to use for creating and interacting with the training pipeline in Step Functions. (default: None)
Keyword Arguments: pipeline_name (str, optional) – Name of the pipeline. This name is used to name jobs (if not provided when calling execute()), models, endpoints, and S3 objects created by the pipeline. If a pipeline_name argument is not provided, one is auto-generated by the pipeline as training-pipeline-<timestamp>. (default: None)
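Example (illustrative sketch only; the XGBoost estimator, IAM role ARNs, bucket name, and S3 input path are placeholder assumptions, not values defined by this module):

    import sagemaker
    from sagemaker.estimator import Estimator
    from stepfunctions.template.pipeline.train import TrainingPipeline

    # Hypothetical XGBoost estimator; any EstimatorBase subclass works here.
    xgb = Estimator(
        image_uri=sagemaker.image_uris.retrieve("xgboost", "us-east-1", "1.5-1"),
        role="arn:aws:iam::123456789012:role/SageMakerRole",      # placeholder SageMaker role
        instance_count=1,
        instance_type="ml.m5.xlarge",
    )

    pipeline = TrainingPipeline(
        estimator=xgb,
        role="arn:aws:iam::123456789012:role/StepFunctionsRole",  # placeholder workflow role
        inputs="s3://example-bucket/train/",                      # placeholder training data
        s3_bucket="example-bucket",                               # placeholder output bucket
        pipeline_name="my-training-pipeline",
    )

The pipeline can then be run with execute(), documented below.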
build_workflow_definition()

Build the workflow definition for the training pipeline with all the states involved.

Returns: Workflow definition as a chain of states involved in the training pipeline.
Return type: Chain
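For illustration (assuming the hypothetical pipeline object from the sketch above), the generated definition can be built and inspected without creating or executing anything:

    # Build the Chain of states that make up the training pipeline.
    definition = pipeline.build_workflow_definition()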
execute(job_name=None, hyperparameters=None)

Run the training pipeline.

Parameters:
- job_name (str, optional) – Name for the training job. If one is not provided, a job name will be auto-generated. (default: None)
- hyperparameters (dict, optional) – Hyperparameters for the estimator training. (default: None)

Returns: Running instance of the training pipeline.
Return type: Execution
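Example (illustrative sketch, assuming the hypothetical pipeline above and that any required Step Functions setup has been completed; the job name and hyperparameters are placeholders):

    execution = pipeline.execute(
        job_name="xgboost-train-example",     # optional; auto-generated if omitted
        hyperparameters={"max_depth": "5", "eta": "0.2", "num_round": "100"},
    )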
class stepfunctions.template.pipeline.inference.InferencePipeline(preprocessor, estimator, inputs, s3_bucket, role, client=None, **kwargs)

Bases: stepfunctions.template.pipeline.common.WorkflowTemplate
Creates a standard inference pipeline with the following steps in order:
- Train preprocessor
- Create preprocessor model
- Transform input data using preprocessor model
- Train estimator
- Create estimator model
- Endpoint configuration
- Deploy estimator model
Parameters:
- preprocessor (sagemaker.estimator.EstimatorBase) – The estimator used to preprocess and transform the training data.
- estimator (sagemaker.estimator.EstimatorBase) – The estimator to use for training. Can be a BYO estimator, Framework estimator, or Amazon algorithm estimator.
- role (str) – An AWS IAM role (either name or full Amazon Resource Name (ARN)). This role is used to create, manage, and execute the Step Functions workflows.
- inputs – Information about the training data. Please refer to the fit() method of the associated estimator, as this can take any of the following forms:
  - (str) - The S3 location where training data is saved.
  - (dict[str, str] or dict[str, sagemaker.inputs.TrainingInput]) - If using multiple channels for training data, you can specify a dict mapping channel names to strings or sagemaker.inputs.TrainingInput objects.
  - (sagemaker.inputs.TrainingInput) - Channel configuration for S3 data sources that can provide additional information about the training dataset. See sagemaker.inputs.TrainingInput for full details.
  - (sagemaker.amazon.amazon_estimator.RecordSet) - A collection of Amazon Record objects serialized and stored in S3. For use with an estimator for an Amazon algorithm.
  - (list[sagemaker.amazon.amazon_estimator.RecordSet]) - A list of sagemaker.amazon.amazon_estimator.RecordSet objects, where each instance is a different channel of training data.
- s3_bucket (str) – S3 bucket under which the output artifacts from the training job will be stored. The parent path is built using the format s3://{s3_bucket}/{pipeline_name}/models/{job_name}/. In this format, pipeline_name refers to the keyword argument provided to InferencePipeline; if a pipeline_name argument was not provided, one is auto-generated by the pipeline as training-pipeline-<timestamp>. Likewise, job_name refers to the job name provided when calling the InferencePipeline.execute() method.
- client (SFN.Client, optional) – boto3 client to use for creating and interacting with the inference pipeline in Step Functions. (default: None)
Keyword Arguments:
- compression_type (str, optional) – Compression type (Gzip/None) of the file for the TransformJob. (default: None)
- content_type (str, optional) – Content type (MIME) of the document to be used in the preprocessing script. See the SageMaker documentation for more details. (default: None)
- pipeline_name (str, optional) – Name of the pipeline. This name is used to name jobs (if not provided when calling execute()), models, endpoints, and S3 objects created by the pipeline. If a pipeline_name argument is not provided, one is auto-generated by the pipeline as training-pipeline-<timestamp>. (default: None)
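Example (illustrative sketch only; the scikit-learn preprocessing script, IAM role ARNs, bucket, and input path are placeholder assumptions, and xgb is the hypothetical estimator from the TrainingPipeline sketch above):

    from sagemaker.sklearn.estimator import SKLearn
    from stepfunctions.template.pipeline.inference import InferencePipeline

    # Hypothetical scikit-learn preprocessor trained on the same data.
    preprocessor = SKLearn(
        entry_point="preprocess.py",                            # placeholder preprocessing script
        framework_version="0.23-1",
        role="arn:aws:iam::123456789012:role/SageMakerRole",    # placeholder SageMaker role
        instance_type="ml.m5.xlarge",
        instance_count=1,
    )

    pipeline = InferencePipeline(
        preprocessor=preprocessor,
        estimator=xgb,                                          # estimator from the earlier sketch
        inputs="s3://example-bucket/train.csv",                 # placeholder training data
        s3_bucket="example-bucket",                             # placeholder output bucket
        role="arn:aws:iam::123456789012:role/StepFunctionsRole",  # placeholder workflow role
        compression_type="Gzip",
        content_type="text/csv",
        pipeline_name="my-inference-pipeline",
    )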
build_workflow_definition()

Build the workflow definition for the inference pipeline with all the states involved.

Returns: Workflow definition as a chain of states involved in the inference pipeline.
Return type: Chain
execute(job_name=None, hyperparameters=None)

Run the inference pipeline.

Parameters:
- job_name (str, optional) – Name for the training job. This is also used as the suffix for the preprocessing job, which is named preprocess-<job_name>. If one is not provided, a job name will be auto-generated. (default: None)
- hyperparameters (dict, optional) – Hyperparameters for the estimator training. (default: None)

Returns: Running instance of the inference pipeline.
Return type: Execution
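Example (illustrative sketch, assuming the hypothetical InferencePipeline above and that any required Step Functions setup has been completed; the job name and hyperparameters are placeholders):

    execution = pipeline.execute(
        job_name="churn-model",               # preprocessing job becomes preprocess-churn-model
        hyperparameters={"num_round": "50"},
    )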