IAM role authentication to Databricks AWS staging buckets

If you're on AWS and need to authenticate with an IAM role rather than an access key and secret, set the Authentication Method dropdown to IAM role.

[Screenshot: the Authentication Method dropdown set to IAM role]
The IAM role must have the following permissions for the staging bucket:

  • s3:GetObject
  • s3:PutObject
  • s3:DeleteObject
  • s3:ListBucket
  • s3:GetBucketLocation
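
As a rough illustration, the permissions above could be expressed as an IAM permissions policy generated by the sketch below. The function name staging_bucket_policy and the bucket name my-staging-bucket are placeholders invented for this example, and the ARNs assume the standard aws partition. The object-level actions are scoped to the objects in the bucket, while s3:ListBucket and s3:GetBucketLocation are scoped to the bucket itself.

```python
import json

# Actions from the list above, grouped by the resource type each one applies to.
OBJECT_ACTIONS = ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"]
BUCKET_ACTIONS = ["s3:ListBucket", "s3:GetBucketLocation"]


def staging_bucket_policy(bucket_name: str) -> dict:
    """Build an IAM permissions policy document for one staging bucket.

    Hypothetical helper for illustration; assumes the standard "aws" partition.
    """
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Object-level actions apply to the keys inside the bucket.
                "Effect": "Allow",
                "Action": OBJECT_ACTIONS,
                "Resource": f"{bucket_arn}/*",
            },
            {
                # Bucket-level actions apply to the bucket ARN itself.
                "Effect": "Allow",
                "Action": BUCKET_ACTIONS,
                "Resource": bucket_arn,
            },
        ],
    }


if __name__ == "__main__":
    # "my-staging-bucket" is a placeholder; substitute your own bucket name.
    print(json.dumps(staging_bucket_policy("my-staging-bucket"), indent=4))
```

The printed JSON can be attached to the role as an inline or managed policy. You may prefer to narrow the object-level resource to a prefix if the bucket is shared.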

Additionally, the role must be assumable by the Polytomic role arn:aws:iam::568237466542:role/convox/prod-polytomic-ServiceRole-1ELGH39L0GCHT. For on-premises ECS deployments, the Polytomic role is ${prefix}-ecs-task-role instead. Contact Polytomic support for deployment-specific guidance.

A trust policy that allows both Polytomic and Databricks to assume the role will look like the following:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::568237466542:role/convox/prod-polytomic-ServiceRole-1ELGH39L0GCHT"
            },
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {
                    "sts:ExternalId": "0000"
                }
            }
        },
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::414351767826:role/unity-catalog-prod-UCMasterRole-14S5ZJVKOTYTL"
            },
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {
                    "sts:ExternalId": "0000"
                }
            }
        }
    ]
}

Note that the sts:ExternalId values shown above (0000) are placeholders. The actual values are specific to your Polytomic and Databricks accounts.

In addition to configuring Polytomic, you may also need to configure a Storage Credential that Databricks will use when reading data from the staging bucket. See the Databricks documentation for information on creating storage credentials. If your staging bucket is not configured as an External Location in Databricks, you'll need to provide Polytomic with the name of the Storage Credential in the connection configuration.

The Databricks user Polytomic uses will need the following permissions for the external location:

  • READ FILES
  • WRITE FILES
  • CREATE EXTERNAL TABLE
  • CREATE MANAGED STORAGE

And the following permissions for the storage credential:

  • READ FILES
  • WRITE FILES
  • CREATE EXTERNAL TABLE
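
One possible way to apply the two permission lists above is with Unity Catalog GRANT statements. The sketch below only generates the SQL text; it does not connect to Databricks. The helper names and the principal, external location, and storage credential names in the usage example are all hypothetical, and you should confirm the statements against your own workspace before running them.

```python
from typing import List

# Privileges from the two lists above.
LOCATION_PRIVILEGES = [
    "READ FILES",
    "WRITE FILES",
    "CREATE EXTERNAL TABLE",
    "CREATE MANAGED STORAGE",
]
CREDENTIAL_PRIVILEGES = ["READ FILES", "WRITE FILES", "CREATE EXTERNAL TABLE"]


def quote(identifier: str) -> str:
    """Backtick-quote a SQL identifier, doubling any embedded backticks."""
    return "`" + identifier.replace("`", "``") + "`"


def grant_statements(
    principal: str, external_location: str, storage_credential: str
) -> List[str]:
    """Return GRANT statements for the external location and storage credential.

    Hypothetical helper for illustration; it builds strings and runs nothing.
    """
    location_privs = ", ".join(LOCATION_PRIVILEGES)
    credential_privs = ", ".join(CREDENTIAL_PRIVILEGES)
    return [
        f"GRANT {location_privs} ON EXTERNAL LOCATION "
        f"{quote(external_location)} TO {quote(principal)};",
        f"GRANT {credential_privs} ON STORAGE CREDENTIAL "
        f"{quote(storage_credential)} TO {quote(principal)};",
    ]


if __name__ == "__main__":
    # All three names below are placeholders, not values from your account.
    for statement in grant_statements(
        "polytomic@example.com", "staging_location", "staging_credential"
    ):
        print(statement)
```

If the staging bucket is not registered as an External Location, only the second statement is likely to be relevant.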