Basic Examples

Upload to S3 and run COPY command with YAML

  • no compression or splitting

  • using the ~/.aws/credentials file

import pg8000  # any DB-API 2.0 compatible driver works, e.g. psycopg2
import locopy

# DDL for the target table
create_sql = "CREATE TABLE schema.table (variable VARCHAR(20)) DISTKEY(variable)"

with locopy.Redshift(
    dbapi=pg8000,
    config_yaml="example.yaml",
    profile="aws_profile") as redshift:

    redshift.execute(create_sql)
    # upload the local CSV to S3, then COPY it into the target table
    redshift.load_and_copy(
        local_file="example_data.csv",
        s3_bucket="my_s3_bucket",
        table_name="schema.table",
        delim=",",
        compress=False)
    # verify the load
    redshift.execute("SELECT * FROM schema.table")
    res = redshift.cursor.fetchall()

print(res)
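
For reference, the example.yaml file used above holds the connection details. A minimal sketch, assuming the YAML keys mirror the constructor parameters shown in the next section (all values are placeholders):

# example.yaml
host: my.redshift.cluster.com
port: 5439
database: db
user: userid
password: password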

Upload to S3 and run COPY command without YAML

Identical to the code above, except that the connection details are passed explicitly to the Redshift constructor instead of being read from the YAML file:

with locopy.Redshift(
    dbapi=pg8000,
    host="my.redshift.cluster.com",
    port=5439,
    database="db",
    user="userid",
    password="password",
    profile="aws_profile") as redshift:
    ...

AWS tokens as environment variables

If you would rather provide your AWS tokens via environment variables instead of a profile in ~/.aws/credentials, set them as follows:

export AWS_ACCESS_KEY_ID=MY_ACCESS_KEY_ID
export AWS_SECRET_ACCESS_KEY=MY_SECRET_ACCESS_KEY
export AWS_SESSION_TOKEN=MY_SESSION_TOKEN
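
With these variables set, the profile argument can simply be omitted from the constructor. A minimal sketch, assuming boto3's standard credential chain then picks the tokens up from the environment:

import pg8000
import locopy

# no profile argument: boto3 falls back to the
# AWS_* environment variables exported above
with locopy.Redshift(
    dbapi=pg8000,
    config_yaml="example.yaml") as redshift:
    redshift.execute("SELECT * FROM schema.table")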

Extra parameters on the COPY job

As described in the AWS COPY documentation, there may be times when you need to tweak the options used by the COPY job to handle oddities in your data. By default, locopy assigns a few options (DATEFORMAT 'auto', COMPUPDATE ON, and TRUNCATECOLUMNS). If you want to specify other options, or override these three, you can pass in a list of strings to tweak your load:

redshift.load_and_copy(
    local_file="example_data.csv",
    s3_bucket="my_s3_bucket",
    table_name="schema.table",
    delim=",",
    copy_options=["NULL AS 'NULL'", "ESCAPE"])
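
The same list can also carry replacements for the three defaults. A sketch using standard Redshift COPY parameters, here changing the date format and disabling compression analysis:

redshift.load_and_copy(
    local_file="example_data.csv",
    s3_bucket="my_s3_bucket",
    table_name="schema.table",
    delim=",",
    # overrides the DATEFORMAT 'auto' and COMPUPDATE ON defaults
    copy_options=["DATEFORMAT 'MM-DD-YYYY'", "COMPUPDATE OFF"])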