AWS S3

Source and destination

Polytomic offers the following methods for connecting to S3:

  • AWS Access Key ID and Secret
  • AWS IAM role

Each method is covered in its respective section below.

Connecting with an AWS Access Key ID and Secret

  1. In Polytomic, go to Connections → Add Connection → S3.
  2. For Authentication method, select Access Key and Secret.
  3. Enter the following information:
  • AWS Access Key ID.

  • AWS Secret Access Key.

  • S3 bucket region (e.g. us-west-1).

  • S3 bucket name.

    The S3 bucket name may contain an optional path which will limit access to a subset of the bucket. For example, the bucket name output/customers will limit Polytomic to the customers directory in the output bucket.

  4. Click Save.
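The optional path in the bucket name can be pictured as a bucket plus a key prefix. The sketch below only illustrates the documented behavior; the function name split_bucket_name is hypothetical, and how Polytomic parses the field internally is an assumption.

```python
from typing import Tuple


def split_bucket_name(value: str) -> Tuple[str, str]:
    """Split 'bucket/optional/path' into (bucket, key_prefix).

    Hypothetical helper: illustrates the documented behavior only.
    """
    # Everything before the first slash is the bucket; the rest is the path.
    bucket, _, path = value.strip("/").partition("/")
    # A non-empty path becomes a prefix that restricts which keys are visible.
    prefix = path + "/" if path else ""
    return bucket, prefix


print(split_bucket_name("output/customers"))
print(split_bucket_name("output"))
```

Under this reading, output/customers refers to the bucket output with access limited to keys that start with customers/.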

Connecting with an AWS IAM Role

  1. In Polytomic, go to Connections → Add Connection → S3.
  2. For Authentication method, select IAM role.
  3. Enter values for the following fields:
  • IAM Role ARN.
  • S3 bucket region (e.g. us-west-1).
  • S3 bucket name.
    The S3 bucket name may contain an optional path which will limit access to a subset of the bucket. For example, the bucket name output/customers will limit Polytomic to the customers directory in the output bucket.
  4. Click Save.

Getting Around IAM Conditions that Restrict IP Addresses

If you use explicit IAM conditions based on IP addresses, you must also add a condition to allow our VPC endpoint vpce-09e3bfdd1f91f0f84. For example:

{
    "Version": "2012-10-17",
    "Statement": {
        "Effect": "Deny",
        "Action": "*",
        "Resource": "*",
        "Condition": {
            "NotIpAddress": {
                "aws:SourceIp": [
                    "192.0.2.0/24",
                    "203.0.113.0/24"
                ]
            },
            "StringNotLikeIfExists": {
                "aws:SourceVPCe": [
                    "vpce-09e3bfdd1f91f0f84"
                ]
            }
        }
    }
} 
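To see why the extra condition is needed, it can help to model how the two condition operators in this Deny statement combine. The sketch below is a simplified model, not AWS's policy engine: it assumes that all condition operators in a statement must match for the Deny to apply, and that a negated or IfExists operator matches when its key is absent from the request. The function name is_denied is hypothetical.

```python
import ipaddress
from typing import Optional

ALLOWED_CIDRS = ["192.0.2.0/24", "203.0.113.0/24"]
ALLOWED_VPCE = "vpce-09e3bfdd1f91f0f84"


def is_denied(source_ip: Optional[str] = None,
              source_vpce: Optional[str] = None) -> bool:
    """Simplified model of the Deny statement above (not the real IAM engine)."""
    # NotIpAddress: matches when the source IP is outside every listed range.
    # Assumption: a missing aws:SourceIp key also counts as a match.
    if source_ip is None:
        not_ip = True
    else:
        addr = ipaddress.ip_address(source_ip)
        not_ip = not any(addr in ipaddress.ip_network(c) for c in ALLOWED_CIDRS)

    # StringNotLikeIfExists: matches when the endpoint ID differs, or is absent.
    # The policy value has no wildcards, so plain inequality is enough here.
    not_vpce = source_vpce is None or source_vpce != ALLOWED_VPCE

    # The Deny applies only when both operators match.
    return not_ip and not_vpce


print(is_denied(source_ip="192.0.2.10"))      # request from an allowed range
print(is_denied(source_vpce=ALLOWED_VPCE))    # request through the VPC endpoint
print(is_denied(source_ip="198.51.100.7"))    # request from anywhere else
```

In this model, requests from the listed IP ranges and requests arriving through the VPC endpoint both escape the Deny, while everything else is denied.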

S3 Permissions

Polytomic requires the following permissions on S3 buckets and their contents:

  • s3:ReplicateObject
  • s3:PutObject
  • s3:GetObject
  • s3:ListBucket
  • s3:DeleteObject

For example, a valid IAM policy for a bucket named syncoutput is:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PolytomicBucket",
            "Effect": "Allow",
            "Action": [
                "s3:ReplicateObject",
                "s3:PutObject",
                "s3:GetObject",
                "s3:ListBucket",
                "s3:DeleteObject"
            ],
            "Resource": [
                "arn:aws:s3:::syncoutput/*",
                "arn:aws:s3:::syncoutput"
            ]
        }
    ]
}
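If you manage several buckets, the same policy can be generated rather than written by hand. This is a minimal sketch using only the Python standard library; the function name build_policy is hypothetical, and the output mirrors the example above with the bucket name substituted.

```python
import json

# Permissions listed in this section.
ACTIONS = [
    "s3:ReplicateObject",
    "s3:PutObject",
    "s3:GetObject",
    "s3:ListBucket",
    "s3:DeleteObject",
]


def build_policy(bucket: str) -> dict:
    """Return an IAM policy document granting the listed actions on one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PolytomicBucket",
                "Effect": "Allow",
                "Action": ACTIONS,
                "Resource": [
                    # Object-level ARN (objects inside the bucket).
                    f"arn:aws:s3:::{bucket}/*",
                    # Bucket-level ARN (the bucket itself, e.g. for listing).
                    f"arn:aws:s3:::{bucket}",
                ],
            }
        ],
    }


print(json.dumps(build_policy("syncoutput"), indent=4))
```

Both resource ARNs are kept because the example policy lists both: one covers the bucket itself and the other covers the objects inside it.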

Concatenating multiple CSV or JSON files into one SQL table

When using Polytomic's Bulk Sync functionality to sync from S3, you can have Polytomic concatenate all CSV or JSON files in your bucket into one table in your data warehouse. To do this, turn on the Files are time-based snapshots setting in your connection configuration.

Once you turn on this setting, you will also need to specify these settings:

  • Collection name: This will be the name of the resulting SQL table in your data warehouse.
  • File format: Instructs Polytomic to concatenate either all CSV files or all JSON files in the bucket.
  • Skip first lines: If your CSVs have lines at the top that need to be skipped before getting to the headers for your data, you can specify the number of lines Polytomic should skip in this field.
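The effect of these settings on CSV files can be sketched as follows. This is an illustration, not Polytomic's implementation: the function name concat_csv, the in-memory file mapping, and the processing order (sorted by file name) are all assumptions.

```python
import csv
import io
from typing import Dict, List


def concat_csv(files: Dict[str, str], skip_first_lines: int = 0) -> List[dict]:
    """Concatenate the rows of several CSV files into one list of records.

    files maps a file name to its CSV text (a stand-in for objects in a bucket).
    """
    rows: List[dict] = []
    for name in sorted(files):  # assumed ordering; the real order is unspecified
        stream = io.StringIO(files[name])
        # Skip the leading lines that sit above the header row.
        for _ in range(skip_first_lines):
            next(stream, None)
        # The first remaining line is treated as the header.
        rows.extend(csv.DictReader(stream))
    return rows


files = {
    "2024-01-01.csv": "exported by job 1\nid,name\n1,Ann\n",
    "2024-01-02.csv": "exported by job 2\nid,name\n2,Bob\n",
}
for row in concat_csv(files, skip_first_lines=1):
    print(row)
```

Here Skip first lines is 1 because each file has a single note line above its header; all rows then land in one table-like collection.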