Basic Examples

Upload to S3 and run COPY command with YAML

No compression or splitting, using the ~/.aws/credentials file:
import pg8000  # can also be psycopg2
import locopy

create_sql = "CREATE TABLE schema.table (variable VARCHAR(20)) DISTKEY(variable)"

with locopy.Redshift(
        dbapi=pg8000,
        config_yaml="example.yaml",
        profile="aws_profile") as redshift:
    # create the target table
    redshift.execute(create_sql)
    # upload the local CSV to S3, then COPY it into the table
    redshift.load_and_copy(
        local_file="example_data.csv",
        s3_bucket="my_s3_bucket",
        table_name="schema.table",
        delim=",",
        compress=False)
    # verify the load
    redshift.execute("SELECT * FROM schema.table")
    res = redshift.cursor.fetchall()
    print(res)
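The example.yaml referenced above holds the connection details. A minimal sketch of what it might contain, assuming its keys mirror the arguments of the Redshift constructor shown in the next example (host, port, database, user, password):

# example.yaml -- connection details read by config_yaml
host: my.redshift.cluster.com
port: 5439
database: db
user: userid
password: password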
Upload to S3 and run COPY command without YAML

Identical to the code above, but the connection details are passed explicitly to the Redshift constructor instead of coming from the YAML file:
with locopy.Redshift(
        dbapi=pg8000,
        host="my.redshift.cluster.com",
        port=5439,
        database="db",
        user="userid",
        password="password",
        profile="aws_profile") as redshift:
    ...
AWS tokens as environment variables

If you would rather provide your AWS tokens via environment variables instead of using a profile from ~/.aws/credentials, you can do something like the following:
export AWS_ACCESS_KEY_ID=MY_ACCESS_KEY_ID
export AWS_SECRET_ACCESS_KEY=MY_SECRET_ACCESS_KEY
export AWS_SESSION_TOKEN=MY_SESSION_TOKEN
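The same tokens can also be set from within Python via os.environ before the connection is opened. A minimal sketch, assuming that boto3 (which locopy uses for the S3 upload) falls back to these environment variables when no profile argument is supplied:

import os

import pg8000
import locopy

# Assumption: with no profile argument, boto3 (used by locopy for the
# S3 upload) picks the credentials up from these environment variables.
os.environ["AWS_ACCESS_KEY_ID"] = "MY_ACCESS_KEY_ID"
os.environ["AWS_SECRET_ACCESS_KEY"] = "MY_SECRET_ACCESS_KEY"
os.environ["AWS_SESSION_TOKEN"] = "MY_SESSION_TOKEN"

with locopy.Redshift(dbapi=pg8000, config_yaml="example.yaml") as redshift:
    ...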
Extra parameters on the COPY job

As per the AWS COPY documentation, there may be times when you want to tweak the options used by the COPY job if you have some oddities in your data. Locopy assigns a few options by default (DATEFORMAT 'auto', COMPUPDATE ON, and TRUNCATECOLUMNS). If you want to specify other options, or override these three, you can pass in a list of strings which will tweak your load:
redshift.load_and_copy(
    local_file="example_data.csv",
    s3_bucket="my_s3_bucket",
    table_name="schema.table",
    delim=",",
    copy_options=["NULL AS 'NULL'", "ESCAPE"])
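Custom COPY options can also be combined with the compression and splitting behaviour mentioned at the top of this page. A minimal sketch, assuming load_and_copy accepts splits and compress parameters (splits is an assumption here; only compress appears elsewhere on this page):

redshift.load_and_copy(
    local_file="example_data.csv",
    s3_bucket="my_s3_bucket",
    table_name="schema.table",
    delim=",",
    splits=4,       # assumption: split the file into 4 parts before upload
    compress=True,  # gzip the file(s) before uploading to S3
    copy_options=["NULL AS 'NULL'", "ESCAPE"])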