AWS S3
Source and destination
Polytomic offers the following methods for connecting to S3:
- AWS Access Key ID and Secret
- AWS IAM role
Each method is covered in its respective section below.
Connecting with an AWS Access Key ID and Secret
- In Polytomic, go to Connections → Add Connection → S3.
- For Authentication method, select Access Key and Secret.
- Enter the following information:
  - AWS Access ID.
  - AWS Secret Access Key.
  - S3 bucket region (e.g. us-west-1).
  - S3 bucket name.
    The S3 bucket name may contain an optional path which will limit access to a subset of the bucket. For example, the bucket name output/customers will limit Polytomic to the customers directory in the output bucket.
- Click Save.
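To make the bucket-plus-path convention concrete, here is a minimal Python sketch of how a value such as output/customers breaks down into a bucket and a path. The function name split_bucket_name is hypothetical and only illustrates the convention described above; it is not Polytomic's own parsing code.

```python
def split_bucket_name(value: str) -> tuple[str, str]:
    """Split 'bucket/optional/path' into (bucket, path).

    Hypothetical helper: everything before the first '/' is treated as the
    bucket, and the remainder (possibly empty) as the path inside it.
    """
    bucket, _, path = value.strip().strip("/").partition("/")
    return bucket, path


print(split_bucket_name("output/customers"))  # ('output', 'customers')
print(split_bucket_name("output"))            # ('output', '')
```

An empty path means the connection covers the whole bucket.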
Connecting with an AWS IAM Role
- In Polytomic, go to Connections → Add Connection → S3.
- For Authentication method, select IAM role.
- Enter values for the following fields:
  - IAM Role ARN.
  - S3 bucket region (e.g. us-west-1).
  - S3 bucket name.
    The S3 bucket name may contain an optional path which will limit access to a subset of the bucket. For example, the bucket name output/customers will limit Polytomic to the customers directory in the output bucket.
- Click Save.
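Before pasting a role ARN into the form, it can help to confirm it has the expected shape. The Python sketch below checks the general IAM role ARN layout (arn:partition:iam::account-id:role/name). The helper name parse_role_arn, the example ARN, and the restriction to three common partitions are assumptions made for illustration.

```python
import re

# General IAM role ARN layout: arn:<partition>:iam::<12-digit account>:role/<path/name>
# Assumption: only the three most common partitions are accepted here.
_ROLE_ARN = re.compile(
    r"^arn:(aws|aws-cn|aws-us-gov):iam::(\d{12}):role/([A-Za-z0-9_+=,.@/-]+)$"
)


def parse_role_arn(arn: str) -> dict:
    """Return the parts of an IAM role ARN, or raise ValueError if malformed."""
    match = _ROLE_ARN.match(arn.strip())
    if match is None:
        raise ValueError(f"not an IAM role ARN: {arn!r}")
    partition, account_id, role = match.groups()
    return {"partition": partition, "account_id": account_id, "role": role}


# Hypothetical ARN, for illustration only.
print(parse_role_arn("arn:aws:iam::123456789012:role/polytomic-s3"))
```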
Getting Around IAM Conditions that Restrict IP Addresses
If you use explicit IAM conditions based on IP addresses, you must also add a condition to allow our VPC endpoint vpce-09e3bfdd1f91f0f84. For example:
{
  "Version": "2012-10-17",
  "Statement": {
    "Effect": "Deny",
    "Action": "*",
    "Resource": "*",
    "Condition": {
      "NotIpAddress": {
        "aws:SourceIp": [
          "192.0.2.0/24",
          "203.0.113.0/24"
        ]
      },
      "StringNotLikeIfExists": {
        "aws:SourceVPCe": [
          "vpce-09e3bfdd1f91f0f84"
        ]
      }
    }
  }
}
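If you maintain your IP allowlist in code, a policy of the same shape can be generated rather than edited by hand. The Python sketch below is one way to do that; the function name build_ip_deny_policy is hypothetical, and the CIDR ranges are the documentation placeholders from the example above. Because both condition operators sit in the same statement, they are ANDed: the Deny applies only when the request comes from outside the listed ranges and did not arrive through the listed VPC endpoint.

```python
import ipaddress
import json

POLYTOMIC_VPC_ENDPOINT = "vpce-09e3bfdd1f91f0f84"  # endpoint ID given in this guide


def build_ip_deny_policy(allowed_cidrs, vpc_endpoint_id=POLYTOMIC_VPC_ENDPOINT):
    """Build a Deny policy that exempts the given CIDR ranges and VPC endpoint."""
    for cidr in allowed_cidrs:
        # Raises ValueError if a range is malformed (e.g. host bits set).
        ipaddress.ip_network(cidr)
    return {
        "Version": "2012-10-17",
        "Statement": {
            "Effect": "Deny",
            "Action": "*",
            "Resource": "*",
            "Condition": {
                "NotIpAddress": {"aws:SourceIp": list(allowed_cidrs)},
                "StringNotLikeIfExists": {"aws:SourceVPCe": [vpc_endpoint_id]},
            },
        },
    }


# Placeholder ranges; substitute your own allowlist.
policy = build_ip_deny_policy(["192.0.2.0/24", "203.0.113.0/24"])
print(json.dumps(policy, indent=2))
```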
S3 Permissions
Polytomic requires the following permissions on S3 buckets and their contents:
- s3:ReplicateObject
- s3:PutObject
- s3:GetObject
- s3:ListBucket
- s3:DeleteObject
For example, a valid IAM policy for a bucket named syncoutput would be as follows:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "PolytomicBucket",
      "Effect": "Allow",
      "Action": [
        "s3:ReplicateObject",
        "s3:PutObject",
        "s3:GetObject",
        "s3:ListBucket",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::syncoutput/*",
        "arn:aws:s3:::syncoutput"
      ]
    }
  ]
}
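To adapt this policy to a different bucket, only the two Resource ARNs need to change. The following Python sketch generates the same policy for any bucket name; the function name build_bucket_policy is hypothetical. Both ARNs are kept because s3:ListBucket applies to the bucket itself, while the object-level actions apply to the keys under it.

```python
import json

# Actions listed in this guide as required by Polytomic.
POLYTOMIC_ACTIONS = [
    "s3:ReplicateObject",
    "s3:PutObject",
    "s3:GetObject",
    "s3:ListBucket",
    "s3:DeleteObject",
]


def build_bucket_policy(bucket: str) -> dict:
    """Return an Allow policy covering the bucket and every object in it."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PolytomicBucket",
                "Effect": "Allow",
                "Action": list(POLYTOMIC_ACTIONS),
                "Resource": [
                    f"arn:aws:s3:::{bucket}/*",  # objects in the bucket
                    f"arn:aws:s3:::{bucket}",    # the bucket itself
                ],
            }
        ],
    }


print(json.dumps(build_bucket_policy("syncoutput"), indent=2))
```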
Concatenating multiple CSV or JSON files into one SQL table
When using Polytomic's Bulk Sync functionality to sync from S3, you have the option of having Polytomic concatenate all CSV or JSON files in your bucket into one table in your data warehouse. You can do this by turning on the Files are time-based snapshots setting in your connection configuration.
Once you turn on this setting, you will also need to specify these settings:
- Collection name: This will be the name of the resulting SQL table in your data warehouse.
- File format: Instructs Polytomic to either concatenate all CSV files in the bucket or all JSON files.
- Skip first lines: If your CSVs have lines at the top that need to be skipped before getting to the headers for your data, you can specify the number of lines Polytomic should skip in this field.
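To show what these settings mean in practice, the Python sketch below concatenates several CSV files into one list of rows, dropping a fixed number of leading lines from each file before reading the header. It is only an illustration of the idea, not Polytomic's implementation: the function name concatenate_csvs, the in-memory files mapping, and the sample file names are all hypothetical.

```python
import csv
import io


def concatenate_csvs(files: dict, skip_first_lines: int = 0) -> list:
    """Combine CSV texts into one list of row dicts.

    files maps a file name to its CSV text. For each file, the first
    skip_first_lines lines are discarded, the next line is read as the
    header, and the remaining lines become rows.
    """
    rows = []
    for name in sorted(files):  # process files in name order
        buffer = io.StringIO(files[name])
        for _ in range(skip_first_lines):
            buffer.readline()  # drop preamble lines above the header
        rows.extend(csv.DictReader(buffer))
    return rows


# Hypothetical snapshot files, each with one preamble line to skip.
snapshots = {
    "2024-01-01.csv": "exported by tool\nid,name\n1,Ada\n",
    "2024-01-02.csv": "exported by tool\nid,name\n2,Grace\n",
}
for row in concatenate_csvs(snapshots, skip_first_lines=1):
    print(row)
```

With Skip first lines left at 0, the preamble line in this sample would be read as the header, which is the situation the setting exists to avoid.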