DB deployment support for cluster IAM role #1
base: master
Changes from all commits: 2801d26, 435adc0, 463e251, 8da00a6, 9af3669, 888c066, c2343b8, 7ead7a4
```diff
@@ -15,6 +15,11 @@
 DEFAULT_IMAGE_NAME = "mlflow_sage"

+PYFUNC_IMAGE_URL = "707343435239.dkr.ecr.us-west-2.amazonaws.com/mlflow-pyfunc-test:latest"
+DEFAULT_IMAGE_URL = PYFUNC_IMAGE_URL
+
+DEFAULT_BUCKET_NAME_PREFIX = "mlflow-sagemaker"
+
 _DOCKERFILE_TEMPLATE = """
 # Build an image that can serve pyfunc model in SageMaker
 FROM ubuntu:16.04
```
```diff
@@ -134,8 +139,8 @@ def push_image_to_ecr(image=DEFAULT_IMAGE_NAME):
     os.system(cmd)


-def deploy(app_name, model_path, execution_role_arn, bucket, run_id=None,
-           image="mlflow_sage", region_name="us-west-2"):
+def deploy(app_name, model_path, bucket=None, image_url=DEFAULT_IMAGE_URL, run_id=None,
```
Reviewer: Not sure how we feel about breaking API changes for MLflow currently. I would leave it as close to the original as possible, but we can ask in the PR against master.

Reviewer: Also, this breaks https://github.com/databricks/mlflow/blob/ed52e68be2a8dd9439eaaf4637e86ad3dc9e6fb4/mlflow/sagemaker/cli.py#L27, doesn't it? We should test the CLI commands with these changes.

Author: Agreed. @tomasatdatabricks, what are your thoughts?

Reviewer: Hmm, I think it's early enough to get away with breaking API changes, although I would double-check with Matei on that. I agree with @sueann that we should also update the CLI. Not sure whether we want to pass the container URL or stick to using the name in the CLI. We should also think through what happens if someone wants to build their own custom image and use it instead of the default one. E.g., should build-container return the URL now?
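One way to address the compatibility concern raised above would be a helper that accepts either form of the argument. A minimal sketch, not part of this PR; the helper name and the default account id and region are hypothetical placeholders:

```python
def resolve_image_url(image, account_id="123456789012", region="us-west-2"):
    """Accept either a bare image name (old API) or a full ECR image URL
    (new API) and always return a URL. Defaults here are placeholders."""
    if ".dkr.ecr." in image and ".amazonaws.com/" in image:
        return image  # already a full ECR URL; pass through unchanged
    # otherwise treat it as a repository name in the caller's registry
    return "{acct}.dkr.ecr.{region}.amazonaws.com/{name}:latest".format(
        acct=account_id, region=region, name=image)
```

With this, the CLI could keep its existing `--image` flag while the library moves to URLs internally.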
```diff
+           region_name="us-west-2", execution_role_arn=None):
     """
     Deploy model on Sagemaker.
     Current active AWS account needs to have correct permissions setup.
```
```diff
@@ -146,23 +151,31 @@ def deploy(app_name, model_path, execution_role_arn, bucket, run_id=None,
     :param execution_role_arn: Amazon execution role with sagemaker rights
     :param bucket: S3 bucket where model artifacts are gonna be stored
     :param run_id: MLflow run id.
-    :param image: name of the Docker image to be used.
+    :param image_url: URL of the ECR-hosted docker image to be used
     :param region_name: Name of the AWS region to deploy to.
     """
     prefix = model_path
     if run_id:
         model_path = _get_model_log_dir(model_path, run_id)
-        prefix = run_id + "/" + prefix
+        prefix = os.path.join(run_id, prefix)
     run_id = _check_compatible(model_path)
```
Reviewer: This changes the behavior? Was it incorrect before?

Author: Ah sorry, I was viewing a subset of the changes. This was changing it back to the original logic.
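For context on the two prefix forms discussed here: on POSIX systems plain concatenation and `os.path.join` produce the same S3 key prefix, though `posixpath.join` would be the portable choice for S3 keys, since `os.path.join` uses backslashes on Windows. A small illustration with placeholder values:

```python
import posixpath

run_id = "abc123"     # example MLflow run id (placeholder)
model_path = "model"  # example model path (placeholder)

old_prefix = run_id + "/" + model_path           # original concatenation
new_prefix = posixpath.join(run_id, model_path)  # always '/'-separated
assert old_prefix == new_prefix == "abc123/model"
```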
```diff
+    if bucket is None:
+        # Attempt to create a default bucket
+        eprint("No model data bucket specified, using the default bucket")
+        bucket = _get_default_s3_bucket(region_name)
+
+    if execution_role_arn is None:
+        execution_role_arn = _get_assumed_role_arn()
+
     model_s3_path = _upload_s3(local_model_path=model_path, bucket=bucket, prefix=prefix)
     _deploy(role=execution_role_arn,
-            image=image,
+            image_url=image_url,
             app_name=app_name,
             model_s3_path=model_s3_path,
             run_id=run_id,
             region_name=region_name)


 def run_local(model_path, run_id=None, port=5000, image=DEFAULT_IMAGE_NAME):
     """
     Serve model locally in a SageMaker compatible Docker container.

@@ -212,6 +225,48 @@ def _make_tarfile(output_filename, source_dir):
         for f in os.listdir(source_dir):
             tar.add(os.path.join(source_dir, f), arcname=f)
```
```diff
+def _get_account_id():
+    sess = boto3.Session()
+    sts_client = sess.client("sts")
+    identity_info = sts_client.get_caller_identity()
+    account_id = identity_info["Account"]
+    return account_id
+
+
+def _get_assumed_role_arn():
+    """
+    :return: ARN of the user's current IAM role
+    """
+    sess = boto3.Session()
+    sts_client = sess.client("sts")
+    identity_info = sts_client.get_caller_identity()
+    sts_arn = identity_info["Arn"]
+    role_name = sts_arn.split("/")[1]
+    iam_client = sess.client("iam")
+    role_response = iam_client.get_role(RoleName=role_name)
+    return role_response["Role"]["Arn"]
+
+
+def _get_default_s3_bucket(region_name):
+    # create bucket if it does not exist
+    sess = boto3.Session()
+    account_id = _get_account_id()
+    region_name = sess.region_name or "us-west-2"
+    bucket_name = "{pfx}-{rn}-{aid}".format(
+        pfx=DEFAULT_BUCKET_NAME_PREFIX, rn=region_name, aid=account_id)
+    s3 = sess.client('s3')
+    response = s3.list_buckets()
+    buckets = [b['Name'] for b in response["Buckets"]]
+    if bucket_name not in buckets:
+        eprint("Default bucket `%s` not found. Creating..." % bucket_name)
+        response = s3.create_bucket(
+            ACL='bucket-owner-full-control',
+            Bucket=bucket_name,
+            CreateBucketConfiguration={
+                'LocationConstraint': region_name,
+            },
+        )
+        eprint(response)
+    else:
+        eprint("Default bucket `%s` already exists. Skipping creation." % bucket_name)
+    return bucket_name
```
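As a side note on `_get_default_s3_bucket` above: the bucket name is fully determined by the prefix, region, and account id, so that step can be checked in isolation. A sketch of just the naming logic (the account id below is a placeholder). One caveat the diff does not handle: boto3's `create_bucket` rejects a `LocationConstraint` of `us-east-1`; in that region `CreateBucketConfiguration` must be omitted entirely.

```python
DEFAULT_BUCKET_NAME_PREFIX = "mlflow-sagemaker"

def default_bucket_name(region_name, account_id):
    # same format string as the diff's _get_default_s3_bucket
    return "{pfx}-{rn}-{aid}".format(
        pfx=DEFAULT_BUCKET_NAME_PREFIX, rn=region_name, aid=account_id)
```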
```diff
 def _upload_s3(local_model_path, bucket, prefix):
     """

@@ -236,28 +291,24 @@ def _upload_s3(local_model_path, bucket, prefix):
         Tagging={'TagSet': [{'Key': 'SageMaker', 'Value': 'true'}, ]}
     )
     eprint('tag response', response)
-    return '{}/{}/{}'.format(s3.meta.endpoint_url, bucket, key)
+    return "/".join(map(lambda x: str(x).rstrip('/'), [s3.meta.endpoint_url, bucket, key]))
```
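The new return expression exists to avoid a double slash when the endpoint URL carries a trailing `/`. Its behavior can be checked in isolation; the URLs below are examples, not values from the PR:

```python
def join_url(*parts):
    # same expression as the diff: strip trailing '/' from every part
    return "/".join(str(p).rstrip('/') for p in parts)

# the old '{}/{}/{}'.format(...) would yield 'https://host//my-bucket/...'
# when the endpoint URL ended with '/'; the rstrip version does not
```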
```diff
-def _deploy(role, image, app_name, model_s3_path, run_id, region_name):
+def _deploy(role, image_url, app_name, model_s3_path, run_id, region_name):
     """
     Deploy model on sagemaker.
     :param role: SageMaker execution ARN role
-    :param image: Name of the Docker image the model is being deployed into
+    :param image_url: URL of the ECR-hosted docker image the model is being deployed into
     :param app_name: Name of the deployed app
     :param model_s3_path: s3 path where we stored the model artifacts
     :param run_id: RunId that generated this model
     """
     sage_client = boto3.client('sagemaker', region_name=region_name)
-    ecr_client = boto3.client("ecr")
-    repository_conf = ecr_client.describe_repositories(
-        repositoryNames=[image])['repositories'][0]
     model_name = app_name + '-model'
     model_response = sage_client.create_model(
         ModelName=model_name,
         PrimaryContainer={
             'ContainerHostname': 'mlflow-serve-%s' % model_name,
-            'Image': repository_conf["repositoryUri"],
+            'Image': image_url,
             'ModelDataUrl': model_s3_path,
             'Environment': {},
         },
```
Second changed file:

```diff
@@ -66,6 +66,9 @@ def _serve():
     print("activating custom environment")
     env = conf[pyfunc.ENV]
     env_path_dst = os.path.join("/opt/mlflow/", env)
+    env_path_dst_dir = os.path.dirname(env_path_dst)
```
Reviewer: @tomasatdatabricks, do you know why this might have been added?

Author: Because we were attempting to copy files to a destination directory that may not exist; we ran into an error during copying because of this. There may be an easier way to force directory creation by adding a flag to the copy command.

Reviewer: Yes, it looks reasonable to me. Btw, I fix this in my pending PR too, so we might need to merge it at some point.
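Regarding the "flag to the copy command" idea above: `shutil.copy` itself has no such flag, but on Python 3.2+ `os.makedirs(..., exist_ok=True)` collapses the exists-check and create into one call. A sketch with a hypothetical helper name:

```python
import os
import shutil

def copy_creating_dirs(src, dst):
    # create the destination directory tree if needed, then copy;
    # exist_ok=True also avoids the check-then-create race
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    return shutil.copy(src, dst)
```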
```diff
+    if not os.path.exists(env_path_dst_dir):
+        os.makedirs(env_path_dst_dir)
     # /opt/ml/ is read-only, we need to copy the env elsewhere before importing it
     shutil.copy(src=os.path.join("/opt/ml/model/", env), dst=env_path_dst)
     os.system("conda env create -n custom_env -f {}".format(env_path_dst))
```
Reviewer: Note: we want to change this to the Databricks prod account when submitting the PR to mlflow master.

Reviewer: I think we should tag the image with the current MLflow version instead of latest.

Author: Just to clarify, for master we'd make it the version of the next release, right? Is that 0.2.2 now? @tomasatdatabricks

Reviewer: I think it would make the most sense for this to point to the current version of MLflow, i.e. mlflow.version.VERSION. That way we do not have to manage updating the version in multiple places.
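The suggestion above amounts to building the image tag from `mlflow.version.VERSION` rather than hard-coding `:latest`. A sketch, with the version stubbed as a plain string since the exact release number was still being discussed; the repository URL is the test repo from this PR:

```python
MLFLOW_VERSION = "0.2.2"  # stand-in for mlflow.version.VERSION
ECR_REPO = "707343435239.dkr.ecr.us-west-2.amazonaws.com/mlflow-pyfunc-test"

# tag the serving image with the library version instead of ':latest'
PYFUNC_IMAGE_URL = "{repo}:{tag}".format(repo=ECR_REPO, tag=MLFLOW_VERSION)
```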