deploy

MultiModalCloudPredictor.deploy(predictor_path: str | None = None, endpoint_name: str | None = None, framework_version: str = 'latest', instance_type: str | None = None, initial_instance_count: int = 1, custom_image_uri: str | None = None, volume_size: int | None = None, wait: bool = True, inference_mode: Literal['realtime', 'serverless'] = 'realtime', inference_config: Dict[str, Any] | None = None, backend_kwargs: Dict | None = None) → None

Deploy a predictor to an inference endpoint.

Parameters:

predictor_path (str) – Path to the predictor tarball you want to deploy. The path can be either a local path or an S3 location. If None, the predictor most recently trained with fit() is deployed.
endpoint_name (str) – The endpoint name to use for the deployment. If None, CloudPredictor will create one with the prefix ag-cloudpredictor.
framework_version (str, default = latest) – Inference container version of AutoGluon. If latest, the latest available container version is used. If a specific version is provided, that version is used. This argument is ignored if custom_image_uri is set.
instance_type (Optional[str], default = None) – Instance type to use for the endpoint. If None, defaults to ml.m5.2xlarge. Must be None when inference_mode="serverless".
initial_instance_count (int, default = 1) – Initial number of instances to be deployed for the endpoint. Ignored when inference_mode="serverless".
custom_image_uri (Optional[str], default = None) – Custom image to deploy the endpoint with. If not specified, the official DLC image is used: https://github.com/aws/deep-learning-containers/blob/master/available_images.md#autogluon-inference-containers
volume_size (int, default = None) – Size in GB of the EBS volume to use for the endpoint. SageMaker GPU instance endpoints currently do not support specifying volume_size; the value is ignored in such cases.
wait (bool, default = True) – Whether to wait for the endpoint to be deployed. Note that even with wait=False the function does not return immediately, because some preparation is needed prior to deployment.
inference_mode ({"realtime", "serverless"}, default = "realtime") – Endpoint type. "serverless" provisions a SageMaker Serverless Inference endpoint (no instance management, scales to zero).
inference_config (Optional[Dict[str, Any]], default = None) – Mode-specific overrides forwarded to sagemaker.serverless.ServerlessInferenceConfig (e.g. memory_size_in_mb, max_concurrency).
backend_kwargs (dict, default = None) –
Any extra arguments to pass to the underlying backend. For the SageMaker backend, valid keys are:
1. model_kwargs: dict, default = dict()
  Any extra arguments needed to initialize the SageMaker Model. Please refer to https://sagemaker.readthedocs.io/en/stable/api/inference/model.html#model for all options.
2. deploy_kwargs
  Any extra arguments to pass to deploy. Please refer to https://sagemaker.readthedocs.io/en/stable/api/inference/model.html#sagemaker.model.Model.deploy for all options.
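
Example

The parameter descriptions above imply a few constraints between inference_mode, instance_type, and initial_instance_count. The following is a minimal sketch of how the keyword arguments for deploy() might be assembled for each mode. build_deploy_kwargs is a hypothetical helper written for illustration, not part of AutoGluon-Cloud, and the memory_size_in_mb and max_concurrency values are arbitrary. The real deploy() call is left as a comment because it needs a fitted predictor and AWS credentials.

```python
from typing import Any, Dict, Optional


def build_deploy_kwargs(
    inference_mode: str = "realtime",
    instance_type: Optional[str] = None,
    initial_instance_count: int = 1,
    inference_config: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble deploy() keyword arguments and
    check the constraints stated in the parameter descriptions."""
    if inference_mode not in ("realtime", "serverless"):
        raise ValueError(f"Unknown inference_mode: {inference_mode!r}")
    if inference_mode == "serverless" and instance_type is not None:
        # Documented rule: instance_type must be None for serverless endpoints.
        raise ValueError("instance_type must be None when inference_mode='serverless'")

    kwargs: Dict[str, Any] = {"inference_mode": inference_mode}
    if inference_mode == "realtime":
        # Instance settings only matter for realtime endpoints;
        # initial_instance_count is ignored in serverless mode.
        kwargs["instance_type"] = instance_type
        kwargs["initial_instance_count"] = initial_instance_count
    if inference_config is not None:
        # Copied so the caller's dict is not shared with the result.
        kwargs["inference_config"] = dict(inference_config)
    return kwargs


# Realtime endpoint on an explicitly chosen instance type.
realtime_kwargs = build_deploy_kwargs(
    instance_type="ml.m5.2xlarge",
    initial_instance_count=1,
)

# Serverless endpoint; the config values here are illustrative only.
serverless_kwargs = build_deploy_kwargs(
    inference_mode="serverless",
    inference_config={"memory_size_in_mb": 4096, "max_concurrency": 5},
)

# With a fitted predictor and AWS credentials (not executed here):
# cloud_predictor.deploy(**serverless_kwargs)
```

Validating the arguments before calling deploy() surfaces a conflicting combination, such as an instance type together with serverless mode, before any AWS resources are requested.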