API Reference

giraffez.cmd

class giraffez.cmd.TeradataCmd(host=None, username=None, password=None, log_level=0, config=None, key_file=None, dsn=None, protect=False, silent=False, panic=True)

The class for connecting to Teradata and executing commands and queries using CLIv2.

Exposed under the alias giraffez.Cmd.

For large-output queries, giraffez.BulkExport should be used.

Parameters
  • host (str) – Omit to read from ~/.girafferc configuration file.

  • username (str) – Omit to read from ~/.girafferc configuration file.

  • password (str) – Omit to read from ~/.girafferc configuration file.

  • log_level (int) – Specify the desired level of output from the job. Possible values are giraffez.SILENCE giraffez.INFO (default), giraffez.VERBOSE and giraffez.DEBUG

  • config (str) – Specify an alternate configuration file to be read from, when previous paramaters are omitted.

  • key_file (str) – Specify an alternate key file to use for configuration decryption

  • dsn (string) – Specify a connection name from the configuration file to be used, in place of the default.

  • protect (bool) – If authentication with Teradata fails and protect is True locks the connection used in the configuration file. This can be unlocked using the command giraffez config --unlock <connection> changing the connection password, or via the unlock_connection() method.

  • silent (string) – Suppress log output. Used internally only.

  • panic (bool) – If True, when an error is encountered it will be raised.

Raises

Meant to be used, where possible, with python’s with context handler to guarantee that connections will be closed gracefully when operation is complete:

with giraffez.Cmd() as cmd:
    results = cmd.execute('select * from dbc.dbcinfo')
    # continue executing statements and processing results

Using the with context ensures proper exit-handling and disconnection.

execute(command, coerce_floats=True, parse_dates=False, header=False, sanitize=True, silent=False, panic=None, multi_statement=False, prepare_only=False)

Execute commands using CLIv2.

Parameters
  • command (str) – The SQL command to be executed

  • coerce_floats (bool) – Coerce Teradata decimal types into Python floats

  • parse_dates (bool) – Parses Teradata datetime types into Python datetimes

  • header (bool) – Include row header

  • sanitize (bool) – Whether or not to call prepare_statement() on the command

  • silent (bool) – Silence console logging (within this function only)

  • panic (bool) – If True, when an error is encountered it will be raised.

  • multi_statement (bool) – Execute in multi-statement mode

  • prepare_only (bool) – Only prepare the command (no results)

Returns

a cursor over the results of each statement in the command

Return type

Cursor

Raises
exists(object_name, silent=False)

Check that object (table or view) object_name exists, by executing a show table object_name query, followed by a show view object_name query if object_name is not a table.

Parameters
  • object_name (str) – The name of the object to check for existence.

  • silent (bool) – Silence console logging (within this function only)

Returns

True if the object exists, False otherwise.

Return type

bool

fetch_columns(table_name, silent=False)

Return the column information for table_name by executing a select top 1 * from table_name query.

Parameters
  • table_name (str) – The fully-qualified name of the table to retrieve schema for

  • silent (bool) – Silence console logging (within this function only)

Returns

the columns of the table

Return type

Columns

insert(table_name, rows, fields=None, delimiter=None, null='NULL', parse_dates=False, quotechar='"')

Load a text file into the specified table_name or Insert Python list rows into the specified table_name

Parameters
  • table_name (str) – The name of the destination table

  • rows (list/str) – A list of rows or the name of an input file. Each row must be a list of field values.

  • fields (list) – The names of the target fields, in the order that the data will be presented (defaults to None for all columns in the table).

  • delimiter (str) – The delimiter used by the input file (or None to infer it from the header).

  • null (str) – The string used to indicated nulled values in the file (defaults to 'NULL').

  • quotechar (str) – The character used to quote fields containing special characters, like the delimiter.

  • parse_dates (bool) – If True, attempts to coerce date fields into a standard format (defaults to False).

Raises
Returns

A dictionary containing counts of applied rows and errors

Return type

dict

For most insertions, this will be faster and produce less strain on Teradata than using TeradataBulkLoad (giraffez.BulkLoad).

Requires that any input file be a properly delimited text file, with a header that corresponds to the target fields for insertion. Valid delimiters include ‘|’, ‘,’, and <tab> or a properly encoded JSON stream.

class giraffez.cmd.Cursor(conn, command, multi_statement=False, header=False, prepare_only=False, coerce_floats=True, parse_dates=False, panic=True)

The class returned by giraffez.Cmd.execute for iterating through Terdata CLIv2 results.

Parameters
  • conn (giraffez.Cmd) – The underlying database connection

  • command (str) – The SQL command to be executed

  • multi_statement (bool) – Execute in parallel statement mode

  • header (bool) – If True, yields a list of column names between statements

  • prepare_only (bool) – Execute in prepare mode

  • coerce_floats (bool) – Coerce Teradata Decimal types automatically into Python floats

  • parse_dates (bool) – Returns date/time types as giraffez date/time types (instead of Python strings)

readall()

Exhausts the current connection by iterating over all rows and returning the total.

with giraffez.Cmd() as cmd:
    results = cmd.execute("select * from dbc.dbcinfo")
    print(results.readall())
to_dict()

Sets the current encoder output to Python dict and returns the cursor. This makes it possible to set the output encoding and iterate over the results:

with giraffez.Cmd() as cmd:
    for row in cmd.execute(query).to_dict():
        print(row)

Or can be passed as a parameter to an object that consumes an iterator:

result = cmd.execute(query)
list(result.to_dict())
to_list()

Set the current encoder output to giraffez.Row objects and returns the cursor. This is the default value so it is not necessary to select this unless the encoder settings have been changed already.

giraffez.config

class giraffez.config.Config(conf=None, mode='r', key_file=None)

An object for reading, writing, and encrypting giraffez configuration files (YAML files). Uses an encryption key (required) to encrypt and decrypt values keyed as ‘password’, ‘pass’, ‘apikey’, or ‘key’, as well as any values nested under the key ‘secure’ (e.g. setting ‘secure.teradata.user’ will encrypt the value.)

Implements the __enter__ and __exit__ magic methods so that it can be used by Python’s with context-handler:

with Config() as conf:
    conn = conf.get_connection() # get default connection
username = conn.get('username')
password = conn.get('password')
Parameters
  • conf (str) – A path to the configuration file to open. Defaults to ~/.girafferc

  • mode (str) – Defaults to ‘r’ for read-only. If not changed, configuration will not be writeable while open.

  • key_file (str) – A path to the key file to open. Defaults to ~/.giraffepg

Raises

giraffez.errors.KeyNotFound – if default key_file is not found and a valid one is not provided.

property connections

Return a dict of connections from the configuration settings.

Raises

giraffez.errors.ConfigurationError – if connections are not present

get_connection(dsn=None)

Retrieve a connection by the given dsn, or the default connection.

Parameters

dsn (str) – The name of the connection to retrieve. Defaults to None, which retrieves the default connection.

Returns

A dict of connection settings

Raises
get_value(key, default={}, nested=True, decrypt=True)

Retrieve a value from the configuration based on its key. The key may be nested.

Parameters
  • key (str) – A path to the value, with nested levels joined by ‘.’

  • default – Value to return if the key does not exist (defaults to dict())

  • decrypt (bool) – If True, decrypt an encrypted value before returning (if encrypted). Defaults to True.

list_value(decrypt=False)

Return the contents of the configuration as a dict. Depending on the structure of the YAML settings, the return value may contain nested dict objects.

Parameters

decrypt (bool) – If True, decrypt the contents before returning.

Returns

(potentially) nested dict of keys and values, as parsed from the configuration file YAML contents.

classmethod lock_connection(conf, dsn, key=None)

A class method to lock a connection (given by dsn) in the specified configuration file. Automatically opens the file and writes to it before closing.

Parameters
  • conf (str) – The configuration file to modify

  • dsn (str) – The name of the connection to lock

Raises

giraffez.errors.ConfigurationError – if the connection does not exist

reload()

Re-open and read settings from file.

set_value(key, value)

Set a value within the configuration based on its key. The key may be nested, any nested levels that do not exist prior to the final segment of the key path will be created. Note: In order to write changes to the file, ensure that write() is called prior to exit.

Parameters
  • key (str) – A path to the value destination, with nested levels joined by ‘.’

  • value – Value to set at the given key, can be any value that is YAML serializeable.

classmethod unlock_connection(conf, dsn, key=None)

A class method to unlock a connection (given by dsn) in the specified configuration file. Automatically opens the file and writes to it before closing.

Parameters
  • conf (str) – The configuration file to modify

  • dsn (str) – The name of the connection to unlock

Raises

giraffez.errors.ConfigurationError – if the connection does not exist

unset_value(key)

Remove a value at the given key – and any nested values – from the configuration. Note: In order to write changes to the file, ensure that write() is called prior to exit.

Parameters

key (str) – A path to the value destination, with nested levels joined by ‘.’

Raises

giraffez.errors.ConfigurationError – if the key specifies an invalid path, or does not exist

write(settings=None)

Save the current configuration to its file (as given by self._config_file). Optionally, settings may be passed in to override the current settings before writing. Returns None if the file could not be written to, either due to permissions, or if the Config object has the mode ‘r’.

Parameters

settings (dict) – Defaults to None, if not None this will replace self.settings prior to writing to the file

classmethod write_default(conf=None)

A class method to write a default configuration file structure to a file. Note that the contents of the file will be overwritten if it already exists.

Parameters

conf (str) – The name of the file to write to. Defaults to None, for ~/.girafferc

Returns

The content written to the file

Return type

str

giraffez.encoders

class giraffez.encoders.CharHandler(columns)
class giraffez.encoders.DateHandler(columns)
class giraffez.encoders.DecimalHandler(columns)
class giraffez.encoders.FloatHandler(columns)
class giraffez.encoders.Handler(columns)

Base class for creating data handlers. This class can be subclassed to define a list of functions that run on data that matches the Teradata data types provided. This is useful when dealing with data that may not be in the appropriate Python type to be encoded.

class giraffez.encoders.TeradataEncoder(columns=[], encoding=None)

The class wrapping the Teradata C encoder.

Exposed under the alias giraffez.Encoder.

:param list or giraffez.Columns columns: Columns used for encoding. :param int encoding: Constant value representing the settings used by underlying C encoder.

giraffez.errors

exception giraffez.errors.ConfigNotFound

Raised when the specified configuration file does not exist.

exception giraffez.errors.ConfigReadOnly

Raised when a write is attempted on a configuration file was opened in read mode.

exception giraffez.errors.ConfigurationError

For use with configuration file handling.

exception giraffez.errors.ConnectionLock(dsn)

Raised when connection is locked by invalid attempts and the ‘protect’ feature is being used.

exception giraffez.errors.FileNotFound

Raised when file does not exist.

exception giraffez.errors.GiraffeEncodeError

Raised when unable to encode the provided object.

exception giraffez.errors.GiraffeError

Baseclass for all giraffez errors.

exception giraffez.errors.GiraffeTypeError

Baseclass for all giraffez type errors.

exception giraffez.errors.InvalidCredentialsError

Raised when connection credentials are incorrect.

exception giraffez.errors.KeyNotFound

Raised when the specified configuration file does not exist.

giraffez.encrypt

giraffez.encrypt.create_key_file(path)

Creates a new encryption key in the path provided and sets the file permissions. Setting the file permissions currently does not work on Windows platforms because of the differences in how file permissions are read and modified.

giraffez.export

class giraffez.export.TeradataBulkExport(query=None, host=None, username=None, password=None, log_level=0, config=None, key_file=None, dsn=None, protect=False, coerce_floats=True)

The class for using the Teradata Parallel Transport API to quickly export large amounts of data.

Exposed under the alias giraffez.BulkExport.

Parameters
  • query (str) – Either SQL query to execute and return results of, or the name of a table to export in its entirety

  • host (str) – Omit to read from ~/.girafferc configuration file.

  • username (str) – Omit to read from ~/.girafferc configuration file.

  • password (str) – Omit to read from ~/.girafferc configuration file.

  • log_level (int) – Specify the desired level of output from the job. Possible values are giraffez.SILENCE giraffez.INFO (default), giraffez.VERBOSE and giraffez.DEBUG

  • config (str) – Specify an alternate configuration file to be read from, when previous paramters are omitted.

  • key_file (str) – Specify an alternate key file to use for configuration decryption

  • dsn (string) – Specify a connection name from the configuration file to be used, in place of the default.

  • protect (bool) – If authentication with Teradata fails and protect is True locks the connection used in the configuration file. This can be unlocked using the command giraffez config --unlock <connection> changing the connection password, or via the unlock_connection() method.

  • coerce_floats (bool) – Coerce Teradata decimal types into Python floats

Raises

Meant to be used, where possible, with Python’s with context handler to guarantee that connections will be closed gracefully when operation is complete:

with giraffez.BulkExport('dbc.dbcinfo') as export:
    print('|'.join(export.columns.names))
    for row in export.to_list():
        print(row)
property columns

Retrieve columns if necessary, and return them.

Return type

Columns

idle_time = None

The amount of time spent in idle (waiting for server)

property query
Returns

The current query if it has been set, otherwise None

Return type

str

to_archive(writer)

Writes export archive files in the Giraffez archive format. This takes a giraffez.io.Writer and writes archive chunks to file until all rows for a given statement have been exhausted.

with giraffez.BulkExport("database.table_name") as export:
    with giraffez.Writer("database.table_name.tar.gz", 'wb', use_gzip=True) as out:
        for n in export.to_archive(out):
            print("Rows: {}".format(n))
Parameters

writer (giraffez.io.Writer) – A writer handling the archive output

Return type

iterator (yields int)

to_dict()

Sets the current encoder output to Python dict and returns a row iterator.

Return type

iterator (yields dict)

to_json()

Sets the current encoder output to json encoded strings and returns a row iterator.

Return type

iterator (yields str)

to_list()

Sets the current encoder output to Python list and returns a row iterator.

Return type

iterator (yields list)

to_str(delimiter='|', null='NULL')

Sets the current encoder output to Python str and returns a row iterator.

Parameters
  • null (str) – The string representation of null values

  • delimiter (str) – The string delimiting values in the output string

Return type

iterator (yields str)

giraffez.load

class giraffez.load.TeradataBulkLoad(table=None, host=None, username=None, password=None, log_level=0, config=None, key_file=None, dsn=None, protect=False, coerce_floats=False, cleanup=False, print_error_table=False)

The class for using the TPT API’s UPDATE (MLoad) driver to insert a large (> ~100k rows) amount of data into an existing Teradata table.

Exposed under the alias giraffez.BulkLoad.

Parameters
  • table (str) – The name of the target table for loading.

  • host (str) – Omit to read from ~/.girafferc configuration file.

  • username (str) – Omit to read from ~/.girafferc configuration file.

  • password (str) – Omit to read from ~/.girafferc configuration file.

  • log_level (int) – Specify the desired level of output from the job. Possible values are giraffez.SILENCE, giraffez.INFO (default), giraffez.VERBOSE, and giraffez.DEBUG

  • config (str) – Specify an alternate configuration file to be read from, when previous paramaters are omitted.

  • key_file (str) – Specify an alternate key file to use for configuration decryption

  • dsn (string) – Specify a connection name from the configuration file to be used, in place of the default.

  • protect (bool) – If authentication with Teradata fails and protect is True, locks the connection used in the configuration file. This can be unlocked using the command giraffez config --unlock <connection>, changing the connection password, or via the unlock_connection() method.

  • coerce_floats (bool) – Coerce Teradata decimal types into Python floats

  • cleanup (bool) – Attempt to cleanup all work tables when context exits.

  • print_error_table (bool) – Prints a user-friendly version of the mload error table to stderr.

Raises

If the target table is currently under an MLoad lock (such as if the previous operation failed), a release mload statement will be executed on the table, and the load job will be re-attempted.

Meant to be used, where possible, with python’s with context handler to guarantee that connections will be closed gracefully when operation is complete.

checkpoint()

Execute a checkpoint while loading rows. Called automatically when loading from a file. Updates the exit code of the driver to reflect errors.

cleanup()

Drops any existing work tables, as returned by tables().

Raises

giraffez.TeradataPTError – if a Teradata error ocurred

property columns

The list of columns in use.

Getter

Return the list of columns in use.

Setter

Set the columns to be loaded into, as well as their order. If loading from a file, these will be determined from the file header. Not necessary if you are loading into all columns, in the original order. The value must be a list of names in the order that the fields of data will be presented in each row.

Raises GiraffeError if field_names is not a list.

Raises GiraffeError if the target table has not been set.

Type

Columns

finish()

Finishes the load job. Called automatically when the connection closes.

Returns

The exit code returned when applying rows to the table

from_file(filename, table=None, delimiter='|', null='NULL', panic=True, quotechar='"', parse_dates=False)

Load from a file into the target table, handling each step of the load process.

Can load from text files, and properly formatted giraffez archive files. In both cases, if Gzip compression is detected the file will be decompressed while reading and handled appropriately. The encoding is determined automatically by the contents of the file.

It is not necessary to set the columns in use prior to loading from a file. In the case of a text file, the header is used to determine column names and their order. Valid delimiters include ‘|’, ‘,’, and ‘t’ (tab). When loading an archive file, the column information is decoded alongside the data.

Parameters
  • filename (str) – The location of the file to be loaded

  • table (str) – The name of the target table, if it was not specified to the constructor for the isntance

  • null (str) – The string that indicates a null value in the rows being inserted from a file. Defaults to ‘NULL’

  • delimiter (str) – When loading a file, indicates that fields are separated by this delimiter. Defaults to None, which causes the delimiter to be determined from the header of the file. In most cases, this behavior is sufficient

  • quotechar (str) – The character used to quote fields containing special characters, like the delimiter.

  • panic (bool) – If True, when an error is encountered it will be raised. Otherwise, the error will be logged and self.error_count is incremented.

Returns

The output of the call to finish()

Raises
idle_time = None

The amount of time spent in idle (waiting for server)

print_error_table = None

Prints the error table when there is an issue, good for troubleshooting jobs

put(items, panic=True)

Load a single row into the target table.

Parameters
  • items (list) – A list of values in the row corresponding to the fields specified by self.columns

  • panic (bool) – If True, when an error is encountered it will be raised. Otherwise, the error will be logged and self.error_count is incremented.

Raises
release()

Attempt release of target mload table.

Raises

giraffez.errors.GiraffeError – if table was not set by the constructor, the TeradataBulkLoad.table, or from_file().

property table

The name of the target table.

Getter

Returns the name of the target table, or None if it has not been set.

Setter

Set the name of the target table, if the table name was not given to the constructor of the TeradataBulkLoad instance or from_file(). The value given must include all qualifiers such as database name.

Raises GiraffeError if the MLoad connection has already been initiated, or the TeradataCmd connection cannot be established.

Raises TeradataPTError if the column data could not be retrieved from Teradata

Type

str

property tables

The names of the work tables used for loading.

Returns

A list of four tables, each the name of the target table with the added suffixes, “_wt”, “_log”, “_e1”, and “_e2”

Raises

giraffez.errors.GiraffeError – if table was not set by the constructor, the TeradataBulkLoad.table, or from_file().

property total_count

The number of rows applied, plus the number of rows in error.

giraffez.secret

class giraffez.secret.Secret(conf=None, mode='r', key_file=None)

An object for reading and modifying encrypted values stored in the giraffez configuration files. This provides and easier-to-use abstraction for handling sensitive information than using the giraffez.config.Config class directly.

Parameters
  • conf (str) – A path to the configuration file to open. Defaults to ~/.girafferc

  • mode (str) – Defaults to ‘r’ for read-only. If not changed, configuration will not be writeable while open.

  • key_file (str) – A path to the key file to open. Defaults to ~/.giraffepg

get(key)

Retrieve the decrypted value of a key in a giraffez configuration file.

Parameters

key (str) – The key used to lookup the encrypted value

set(key, value)

Set a decrypted value by key in a giraffez configuration file.

Parameters
  • key (str) – The key used to lookup the encrypted value

  • value – Value to set at the given key, can be any value that is YAML serializeable.

giraffez.types

class giraffez.types.Column(column, filler=False)

An object containing the column information used by Teradata for encoding, decoding, and identifying data.

Implements the __eq__ magic method for comparison with other instances (validating that each field holds the same data), and the __str__ and __repr__ magic methods for coersion to string values and more pleasant visual presentation.

Fields:

name

The original name of the column in the table it belongs to

alias

The alias of the column in the query

title

The title of the column in the query

type

Numeric value used by Teradata to indicate column type

length

Length in bytes that the field’s values occupy

precision

Precision of floating-point numeric columns

scale

Scale of floating-point numeric columns

filler

Indicator that the column belongs to the target table, but will not be used in the current operation

class giraffez.types.Columns(items=[])

A set of Column objects, used to convey original table schema without losing information.

classmethod deserialize(data)

Deserializes giraffez Archive header. See serialize() for more information.

Parameters

data (str) – data in giraffez Archive format, to be deserialized

Returns

Columns object decoded from data

get(column_name)

Retrieve a column from the list with name value column_name

Parameters

column_name (str) – The name of the column to get

Returns

Column with the specified name, or None if it does not exist.

property names
Returns

The names of the contained Column objects

Return type

list

serialize()

Serializes the columns into the giraffez archive header binary format:

0      1      2
+------+------+------+------+------+------+------+------+
| Header      | Header Data                             |
| Length      |                                         |
+------+------+------+------+------+------+------+------+

           giraffez Archive Header Format

                       Fig. 1

Header Length: 2 bytes
    Full length of archive header 

Header Data: variable
    Binary data representing N column(s) using the format
    specified in Fig. 2


0      1      2      3      4      5      6      7      8
+------+------+------+------+------+------+------+------+
| Type        | Length      | Precision   | Scale       | 
+------+------+------+------+------+------+------+------+
| Name Length | Name                                    |
+------+------+------+------+------+------+------+------+

               Binary Column Format

                       Fig. 2

Type: 2 bytes
    Numerical representation of column type defined by
    Teradata's CLIv2

Length: 2 bytes
    Column length

Scale: 2 bytes
    Column scale

Precision: 2 bytes
    Column precision

Name Length: 2 bytes
    Length used for reading variable column name

Name: variable
    Name of column
Returns

Packed binary data, representing the serialized Columns

Return type

str

set_filter(names=None)

Set the names of columns to be used when iterating through the list, retrieving names, etc.

Parameters

names (list) – A list of names to be used, or None for all

class giraffez.types.Date(*args, **kwargs)

Ensures that datetime objects can be proper coerced into a string value of a given format.

There is a Python bug (yes, it is a bug, not a feature) where datetime objects representing any dates before 1900 raises an error when calling the strftime method. This will ensure that the date is coerced safely with str() into something Teradata can use.

class giraffez.types.Decimal
class giraffez.types.Time

Represents Teradata time data types such as TIME(n).

class giraffez.types.Timestamp(*args, **kwargs)

Represents Teradata date/time data types such as TIMESTAMP(n).

class giraffez.types.Row(columns, row)

A wrapper for a single row object.

Defines the __iter__ magic method for convenience:

for item in row:
    print(item)

Defines the __getattr__ magic method to return a particular field:

print(row.first_name) # 'alice'

Defines the __getitem__ magic method to return a particular field by name or numeric index:

print(row['first_name']) # 'alice'
print(row['id']) # 'abc123'
print(row[2]) # 'abc123'
items()

Represents the contents of the row as a dict with the column names as keys, and the row’s fields as values.

Return type

dict