locopy.utility module

Utility Module.

Module with utility functions for use within the application.

class locopy.utility.ProgressPercentage(filename)

Bases: object

The ProgressPercentage class is used as the progress callback for S3Transfer upload_file.

Please see the following url for more information: http://boto3.readthedocs.org/en/latest/reference/customizations/s3.html#ref-s3transfer-usage.
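A minimal sketch of what a callback like ProgressPercentage can look like. It is modeled on the progress-callback pattern in the boto3 S3Transfer documentation linked above and is not locopy's actual source; the public percentage attribute is an addition for illustration. boto3 calls the object with the number of bytes transferred since the previous call, so the class keeps a running total under a lock. The demo calls it directly instead of uploading to S3.

```python
import os
import sys
import tempfile
import threading


class ProgressPercentage:
    """Upload progress callback, sketched after the boto3 S3Transfer docs pattern."""

    def __init__(self, filename):
        self._filename = filename
        self._size = float(os.path.getsize(filename))
        self._seen_so_far = 0
        self.percentage = 0.0  # illustrative attribute, not part of the documented API
        # Transfers may run on several threads, so guard the shared counter
        self._lock = threading.Lock()

    def __call__(self, bytes_amount):
        # boto3 passes the number of bytes transferred since the last call
        with self._lock:
            self._seen_so_far += bytes_amount
            self.percentage = (
                self._seen_so_far / self._size * 100 if self._size else 100.0
            )
            sys.stdout.write(
                f"\r{self._filename}  {self._seen_so_far} / {self._size:.0f}"
                f"  ({self.percentage:.2f}%)"
            )
            sys.stdout.flush()


# Simulate two callback invocations on a 1000-byte file without contacting S3
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 1000)
progress = ProgressPercentage(tmp.name)
progress(250)
progress(750)
os.remove(tmp.name)
```

With a real client, the object would be passed as the Callback argument, for example s3_client.upload_file(path, bucket, key, Callback=ProgressPercentage(path)).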

locopy.utility.compress_file(input_file, output_file)

Compresses a file (gzip).

Parameters:

input_file (str) – Path to input file to compress
output_file (str) – Path to write the compressed file
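A sketch of gzip compression with the standard library that streams the input, so large files are not read into memory at once. It illustrates the documented behavior and is not necessarily how locopy implements it; the file names in the demo are hypothetical.

```python
import gzip
import os
import shutil
import tempfile


def compress_file(input_file, output_file):
    # Copy the raw bytes through a gzip writer in chunks
    with open(input_file, "rb") as f_in, gzip.open(output_file, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "data.csv")
    with open(src, "w") as f:
        f.write("id,name\n1,alice\n")
    compress_file(src, src + ".gz")
    # Round-trip check: decompress and read the text back
    with gzip.open(src + ".gz", "rt") as f:
        restored = f.read()
```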

locopy.utility.compress_file_list(file_list)

Compress a list of files (gzip) and clean up the original files.

Parameters:

file_list (list) – List of strings with the file paths of the files to compress

Returns:

List of strings with the file paths of the compressed files (the original file name with .gz appended)

Return type:

list
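A sketch of the documented behavior, assuming each output path is the input path with ".gz" appended and that each original is deleted once its compressed copy is written. locopy's own implementation may differ in details, for example whether it updates the input list in place.

```python
import gzip
import os
import shutil
import tempfile


def compress_file_list(file_list):
    compressed = []
    for path in file_list:
        gz_path = path + ".gz"  # original file name with ".gz" appended
        with open(path, "rb") as f_in, gzip.open(gz_path, "wb") as f_out:
            shutil.copyfileobj(f_in, f_out)
        os.remove(path)  # clean up the uncompressed original
        compressed.append(gz_path)
    return compressed


with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for name in ("a.txt", "b.txt"):
        path = os.path.join(tmp, name)
        with open(path, "w") as f:
            f.write(name)
        paths.append(path)
    result = compress_file_list(paths)
    remaining = sorted(os.listdir(tmp))
```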

locopy.utility.concatenate_files(input_list, output_file, remove=True)

Concatenate a list of files into one file.

Parameters:

input_list (list) – List of strings with the paths to input files to concatenate
output_file (str) – Path of the output file
remove (bool, optional) – If True, delete the input files after concatenating them. Defaults to True

Raises:

LocopyConcatError – If input_list is empty or there is an issue while concatenating the files into one
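A sketch of concatenation that copies each input into the output in binary mode and optionally deletes the inputs afterwards. LocopyConcatError is defined locally as a stand-in for locopy's exception class, and the error messages are illustrative.

```python
import os
import shutil
import tempfile


class LocopyConcatError(Exception):
    """Stand-in for locopy's LocopyConcatError."""


def concatenate_files(input_list, output_file, remove=True):
    if not input_list:
        raise LocopyConcatError("Please provide a list of files to concatenate")
    try:
        with open(output_file, "wb") as f_out:
            for path in input_list:
                with open(path, "rb") as f_in:
                    shutil.copyfileobj(f_in, f_out)
        if remove:
            for path in input_list:
                os.remove(path)
    except OSError as exc:
        raise LocopyConcatError(f"Error concatenating files: {exc}") from exc


with tempfile.TemporaryDirectory() as tmp:
    parts = []
    for i, text in enumerate(["first\n", "second\n"]):
        path = os.path.join(tmp, f"part{i}.txt")
        with open(path, "w") as f:
            f.write(text)
        parts.append(path)
    merged = os.path.join(tmp, "merged.txt")
    concatenate_files(parts, merged)
    with open(merged) as f:
        merged_text = f.read()
    leftovers = sorted(os.listdir(tmp))

# An empty input list is rejected
try:
    concatenate_files([], "unused.txt")
except LocopyConcatError as exc:
    empty_error = str(exc)
```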

locopy.utility.find_column_type(dataframe, warehouse_type: str)

Find data type of each column from the dataframe.

Accepts either a pandas DataFrame (handled by find_column_type_pandas) or a polars DataFrame (handled by find_column_type_polars).
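Because find_column_type accepts either a pandas or a polars DataFrame, one plausible structure is a functools.singledispatch function that picks an implementation from the type of its first argument. The sketch shows that pattern with two hypothetical stand-in classes, PandasFrame and PolarsFrame, so it runs without either library. Raising TypeError for unsupported types is a choice made for this sketch.

```python
from functools import singledispatch


class PandasFrame:
    """Hypothetical stand-in for pandas.DataFrame."""


class PolarsFrame:
    """Hypothetical stand-in for polars.DataFrame."""


@singledispatch
def find_column_type(dataframe, warehouse_type: str):
    # Fallback used when no implementation is registered for the argument's type
    raise TypeError(f"Unsupported dataframe type: {type(dataframe).__name__}")


@find_column_type.register
def _(dataframe: PandasFrame, warehouse_type: str):
    # A real implementation would inspect pandas dtypes here
    return f"pandas implementation for {warehouse_type}"


@find_column_type.register
def _(dataframe: PolarsFrame, warehouse_type: str):
    # A real implementation would inspect polars dtypes here
    return f"polars implementation for {warehouse_type}"
```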

locopy.utility.find_column_type_pandas(dataframe: DataFrame, warehouse_type: str)

Find data type of each column from the dataframe.

The function checks the following pandas data types and maps them to SQL types:

bool/pd.BooleanDtype -> boolean

datetime64[ns, <tz>] -> timestamp

M8[ns] -> timestamp

int/pd.Int64Dtype -> int

float/pd.Float64Dtype -> float

object (values parseable as floats) -> float

object (values parseable as datetimes) -> timestamp

object/pd.StringDtype -> varchar

For all other data types, the column will be mapped to varchar type.
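The mapping rules above can be sketched without pandas by working on dtype names (the output of str(series.dtype)) plus a few sample values for object columns. This is an illustrative approximation: map_pandas_dtype and find_column_types are hypothetical names, warehouse_type is ignored, and locopy's actual detection logic, especially for object columns, may differ.

```python
from datetime import datetime


def _all_parse(values, parser):
    """Return True if parser accepts every value."""
    try:
        for value in values:
            parser(value)
    except (TypeError, ValueError):
        return False
    return True


def map_pandas_dtype(dtype_name, values=()):
    """Map a pandas dtype name to a SQL type name (hypothetical helper)."""
    if dtype_name.startswith(("datetime64", "M8")):  # with or without a time zone
        return "timestamp"
    name = dtype_name.lower()
    if name.startswith("bool"):  # bool, boolean (pd.BooleanDtype)
        return "boolean"
    if name.startswith("int"):  # int64, Int64 (pd.Int64Dtype)
        return "int"
    if name.startswith("float"):  # float64, Float64 (pd.Float64Dtype)
        return "float"
    if name == "object":
        sample = [v for v in values if v is not None]
        if sample and _all_parse(sample, float):
            return "float"
        if sample and _all_parse(sample, lambda v: datetime.fromisoformat(str(v))):
            return "timestamp"
    return "varchar"  # remaining object/string columns and every other dtype


def find_column_types(columns):
    """columns: mapping of column name -> (dtype name, sample values)."""
    return {col: map_pandas_dtype(dtype, vals) for col, (dtype, vals) in columns.items()}
```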

Parameters:

dataframe (pandas.DataFrame) – The dataframe whose column types will be determined
warehouse_type (str) – Required to properly determine format of uploaded data, either “snowflake” or “redshift”.

Returns:

A dictionary mapping each column name to its SQL data type

Return type:

dict

locopy.utility.find_column_type_polars(dataframe: DataFrame, warehouse_type: str)

Find data type of each column from the dataframe.

The function checks the following polars data types and maps them to SQL types:

Boolean -> boolean

Date -> date

Datetime/Duration/Timestamp -> timestamp

Time -> time

int -> int

float/decimal -> float

float object -> float

others -> varchar

For all other data types, the column will be mapped to varchar type.
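A similar sketch for the polars rules, keyed on the base name of str(dtype), for example "Datetime" in "Datetime(time_unit='us', time_zone=None)". The names map_polars_dtype and find_column_types are hypothetical, the exact dtype strings depend on the polars version, and the inspection of object columns ("float object -> float") is left out.

```python
# Hypothetical lookup keyed on the base name of a polars dtype string
POLARS_TO_SQL = {
    "Boolean": "boolean",
    "Date": "date",
    "Datetime": "timestamp",
    "Duration": "timestamp",
    "Time": "time",
    "Decimal": "float",
}


def map_polars_dtype(dtype_name):
    """Map a polars dtype name to a SQL type name (hypothetical helper)."""
    base = dtype_name.split("(", 1)[0]  # drop parameters such as time_unit
    if base in POLARS_TO_SQL:
        return POLARS_TO_SQL[base]
    if base.startswith(("Int", "UInt")):  # Int8 ... Int64, UInt8 ... UInt64
        return "int"
    if base.startswith("Float"):  # Float32, Float64
        return "float"
    return "varchar"  # String, Categorical, List, Struct and any other dtype


def find_column_types(schema):
    """schema: mapping of column name -> dtype name, e.g. built from df.schema."""
    return {name: map_polars_dtype(dtype) for name, dtype in schema.items()}
```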

Parameters:

dataframe (polars.DataFrame) – The dataframe whose column types will be determined
warehouse_type (str) – Required to properly determine format of uploaded data, either “snowflake” or “redshift”.

Returns:

A dictionary mapping each column name to its SQL data type

Return type:

dict

locopy.utility.get_ignoreheader_number(options)

Return the number_rows from IGNOREHEADER [ AS ] number_rows.

This does not validate the optional AS keyword.

Parameters:

options (list of str) – Copy options that should be appended to the COPY statement

Returns:

The number_rows from IGNOREHEADER [ AS ] number_rows

Return type:

int

Raises:

LocopyIgnoreHeaderError – If more than one IGNOREHEADER is found in the options
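A sketch of the documented parsing. It takes the last whitespace-separated token of the single IGNOREHEADER option, so the optional AS keyword is accepted but not validated. Returning 0 when no IGNOREHEADER option is present is an assumption of this sketch, and LocopyIgnoreHeaderError is a local stand-in.

```python
class LocopyIgnoreHeaderError(Exception):
    """Stand-in for locopy's LocopyIgnoreHeaderError."""


def get_ignoreheader_number(options):
    matches = [opt for opt in options if opt.strip().upper().startswith("IGNOREHEADER")]
    if not matches:
        return 0  # assumption: no IGNOREHEADER means no header rows to skip
    if len(matches) > 1:
        raise LocopyIgnoreHeaderError("Found more than one IGNOREHEADER in the options")
    # "IGNOREHEADER 1" and "IGNOREHEADER AS 1" both end with the row count
    return int(matches[0].split()[-1])
```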

locopy.utility.read_config_yaml(config_yaml)

Read a configuration YAML file.

Populate the database connection attributes and validate that the required ones are present.

Example:

host: my.redshift.cluster.com
port: 5439
dbname: db
user: userid
password: password

Parameters:

config_yaml (str or file pointer) – String representing the file location of the configuration file, or a pointer to an open file object

Returns:

A dictionary of parameters for setting up a connection to the database.

Return type:

dict

Raises:

CredentialsError – If any connection items are missing from the YAML file
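A sketch of the read-and-validate flow using PyYAML's yaml.safe_load. The set of required keys is a hypothetical choice taken from the example file above, and CredentialsError is a local stand-in for locopy's exception. PyYAML is imported inside read_config_yaml so that the validation helper can be tried on its own.

```python
# Hypothetical required keys, taken from the example configuration file
REQUIRED_KEYS = ("host", "port", "dbname", "user", "password")


class CredentialsError(Exception):
    """Stand-in for locopy's CredentialsError."""


def validate_config(config):
    """Raise CredentialsError if any required connection item is missing."""
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise CredentialsError(f"Missing connection items: {', '.join(missing)}")
    return config


def read_config_yaml(config_yaml):
    """Load YAML from a path or an open file object, then validate it."""
    import yaml  # PyYAML, imported here so validate_config works without it

    if isinstance(config_yaml, str):
        with open(config_yaml) as f:
            config = yaml.safe_load(f)
    else:
        config = yaml.safe_load(config_yaml)
    return validate_config(config or {})


try:
    validate_config({"host": "my.redshift.cluster.com", "port": 5439})
except CredentialsError as exc:
    error_message = str(exc)
```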

locopy.utility.split_file(input_file, output_file, splits=1, ignore_header=0)

Split a file by lines into files of roughly equal size.

For example, myinputfile.txt will be split into myoutputfile.txt.01, myoutputfile.txt.02, etc.

Parameters:

input_file (str) – Path to input file to split
output_file (str) – Base name of the output files; a numeric suffix is appended for each split
splits (int, optional) – Number of splits to perform. Must be greater than zero. Defaults to 1
ignore_header (int, optional) – If ignore_header is > 0 then that number of rows will be removed from the beginning of the files as they are split. Defaults to 0

Returns:

List of strings with the file paths of the split files

Return type:

list

Raises:

LocopySplitError – If splits is less than 1 or an error occurs while splitting
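A sketch of line-based splitting under stated assumptions: the whole input is read into memory, ignore_header lines are skipped at the start of the input file, lines are divided into contiguous chunks whose sizes differ by at most one, and the pieces are numbered from .01 to match the example above. locopy's own implementation may assign lines and name files differently, and LocopySplitError is a local stand-in.

```python
import os
import tempfile


class LocopySplitError(Exception):
    """Stand-in for locopy's LocopySplitError."""


def split_file(input_file, output_file, splits=1, ignore_header=0):
    if not isinstance(splits, int) or splits < 1:
        raise LocopySplitError("splits must be an integer greater than zero")
    try:
        with open(input_file) as f_in:
            lines = f_in.readlines()[ignore_header:]  # drop header rows
        # Spread lines over the pieces so their sizes differ by at most one
        base, extra = divmod(len(lines), splits)
        paths, start = [], 0
        for i in range(splits):
            size = base + (1 if i < extra else 0)
            path = f"{output_file}.{i + 1:02d}"
            with open(path, "w") as f_out:
                f_out.writelines(lines[start:start + size])
            paths.append(path)
            start += size
    except OSError as exc:
        raise LocopySplitError(f"Error splitting {input_file}: {exc}") from exc
    return paths


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "myinputfile.txt")
    with open(src, "w") as f:
        f.write("header\n" + "".join(f"row{i}\n" for i in range(10)))
    parts = split_file(src, os.path.join(tmp, "myoutputfile.txt"), splits=3, ignore_header=1)
    sizes = []
    for part in parts:
        with open(part) as f:
            sizes.append(len(f.readlines()))
    names = [os.path.basename(p) for p in parts]
```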

locopy.utility.write_file(data, delimiter, filepath, mode='w')

Write data to a file.

Parameters:

data (list) – List of lists, where each inner list is written as one row
delimiter (str) – Delimiter by which columns will be separated
filepath (str) – Location of the output file
mode (str) – File writing mode. Examples include ‘w’ for write or ‘a’ for append. Defaults to write mode. See https://www.tutorialspoint.com/python/python_files_io.htm
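A sketch of the documented behavior, assuming each inner list is written as one line with values converted by str() and joined with the delimiter, without any quoting. If values may contain the delimiter or newlines, Python's csv module would be a safer choice.

```python
import os
import tempfile


def write_file(data, delimiter, filepath, mode="w"):
    # One line per inner list; values are joined as-is, with no quoting or escaping
    with open(filepath, mode) as f:
        for row in data:
            f.write(delimiter.join(str(item) for item in row) + "\n")


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out.txt")
    write_file([[1, "a"], [2, "b"]], "|", path)
    write_file([[3, "c"]], "|", path, mode="a")  # append a third row
    with open(path) as f:
        written = f.read()
```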