locopy.utility module
Utility Module.
Module which contains utility functions for use within the application.
- class locopy.utility.ProgressPercentage(filename)
Bases: object
ProgressPercentage is used as the callback for S3Transfer upload_file.
Please see the following url for more information: http://boto3.readthedocs.org/en/latest/reference/customizations/s3.html#ref-s3transfer-usage.
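A minimal sketch of such a callback, modeled on the ProgressPercentage example in the boto3 S3 transfer documentation linked above; locopy's own class may differ in detail. boto3 calls the callback with the number of bytes transferred since the previous call, so the object keeps a running total, guarded by a lock because a transfer can run in several threads.

```python
import os
import sys
import threading


class ProgressPercentage:
    """Report upload progress for one file; intended as an upload_file callback."""

    def __init__(self, filename):
        self._filename = filename
        self._size = float(os.path.getsize(filename))
        self._seen_so_far = 0
        # Transfers can be multi-threaded, so updates are serialized with a lock.
        self._lock = threading.Lock()

    def __call__(self, bytes_amount):
        # boto3 passes the number of bytes transferred since the previous call.
        with self._lock:
            self._seen_so_far += bytes_amount
            # Guard against division by zero for empty files.
            percentage = (self._seen_so_far / self._size) * 100 if self._size else 100.0
            sys.stdout.write(
                "\r%s  %s / %s  (%.2f%%)"
                % (self._filename, self._seen_so_far, int(self._size), percentage)
            )
            sys.stdout.flush()
```

With a boto3 client, an instance would be passed as the callback, for example `client.upload_file(path, bucket, key, Callback=ProgressPercentage(path))`.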
- locopy.utility.compress_file_list(file_list)
Compress a list of files (gzip) and clean up the old files.
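A minimal sketch of gzip-compressing a list of files and deleting the originals, using only the standard library. The ".gz" naming and the returned list of new paths are assumptions of this sketch, not documented locopy behavior.

```python
import gzip
import os
import shutil


def compress_file_list(file_list):
    """Gzip each file in file_list, delete the originals, return the new paths.

    Sketch only: the ".gz" suffix and the return value are assumptions.
    """
    compressed = []
    for path in file_list:
        gz_path = path + ".gz"
        # Stream the file into a gzip archive without loading it into memory.
        with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        # Clean up the uncompressed original once the archive is written.
        os.remove(path)
        compressed.append(gz_path)
    return compressed
```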
- locopy.utility.concatenate_files(input_list, output_file, remove=True)
Concatenate a list of files into one file.
- Parameters:
- Raises:
LocopyConcatError – If input_list is empty or there is an issue while concatenating the files into one
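A minimal sketch of the behavior described above. LocopyConcatError is defined locally as a stand-in so the block runs on its own; locopy's actual exception lives in its own module. Files are copied in binary mode so their bytes are preserved exactly.

```python
import os
import shutil


class LocopyConcatError(Exception):
    """Local stand-in for locopy's exception so the sketch runs on its own."""


def concatenate_files(input_list, output_file, remove=True):
    """Concatenate the files in input_list into output_file."""
    if not input_list:
        raise LocopyConcatError("Input list is empty.")
    try:
        with open(output_file, "wb") as out:
            for path in input_list:
                # Binary mode copies bytes verbatim, regardless of encoding.
                with open(path, "rb") as src:
                    shutil.copyfileobj(src, out)
        if remove:
            # Delete inputs only after the output is complete.
            for path in input_list:
                os.remove(path)
    except OSError as exc:
        raise LocopyConcatError("Error concatenating files: {}".format(exc)) from exc
```

This sketch deletes the inputs only after the output has been fully written, so a failure midway leaves them in place; whether locopy orders the deletion the same way is not stated here.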
- locopy.utility.find_column_type(dataframe, warehouse_type: str)
- locopy.utility.find_column_type(dataframe: pandas.DataFrame, warehouse_type: str)
- locopy.utility.find_column_type(dataframe: polars.DataFrame, warehouse_type: str)
Find data type of each column from the dataframe.
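Multiple signatures under one function name suggest that find_column_type dispatches on the type of dataframe, as functools.singledispatch does. The sketch below shows that pattern with two hypothetical stand-in classes in place of the pandas and polars DataFrame types, so it runs without either library; the return strings are placeholders, and raising TypeError for unregistered types is an assumption.

```python
from functools import singledispatch


class PandasLikeFrame:
    """Hypothetical stand-in for pandas.DataFrame."""


class PolarsLikeFrame:
    """Hypothetical stand-in for polars.DataFrame."""


@singledispatch
def find_column_type(dataframe, warehouse_type: str):
    # Fallback used when no implementation is registered for the argument's type.
    raise TypeError("Unsupported dataframe type: {}".format(type(dataframe).__name__))


@find_column_type.register(PandasLikeFrame)
def _find_column_type_pandas(dataframe, warehouse_type: str):
    return "pandas implementation for " + warehouse_type


@find_column_type.register(PolarsLikeFrame)
def _find_column_type_polars(dataframe, warehouse_type: str):
    return "polars implementation for " + warehouse_type
```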
- locopy.utility.find_column_type_pandas(dataframe: DataFrame, warehouse_type: str)
Find data type of each column from the dataframe.
The following is the list of pandas data types that the function checks and their mapping in SQL:
bool/pd.BooleanDtype -> boolean
datetime64[ns, <tz>] -> timestamp
M8[ns] -> timestamp
int/pd.Int64Dtype -> int
float/pd.Float64Dtype -> float
float object -> float
datetime object -> timestamp
object/pd.StringDtype -> varchar
For all other data types, the column will be mapped to varchar type.
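A simplified sketch of the mapping table above, working on the dtype name (as given by `str(series.dtype)`) so it runs without pandas. For object columns it tries to parse every non-null value as a float, then as a datetime. The helper names are hypothetical, warehouse_type is ignored, and ISO-format datetime parsing stands in for whatever date detection locopy actually uses.

```python
from datetime import datetime


def _to_datetime(value):
    # Accept datetime instances as-is; otherwise require an ISO-format string.
    return value if isinstance(value, datetime) else datetime.fromisoformat(value)


def _all_parse(values, parser):
    try:
        for value in values:
            parser(value)
    except (TypeError, ValueError):
        return False
    return True


def pandas_dtype_to_sql(dtype_name, values=()):
    """Map a pandas dtype name to a SQL type name (sketch of the table above).

    values holds the column's non-null values; it is only used for object columns.
    """
    name = dtype_name.lower()
    if name in ("bool", "boolean"):
        return "boolean"
    # datetime64[ns], datetime64[ns, <tz>] and the M8[ns] alias
    if name.startswith("datetime64[ns") or name == "m8[ns]":
        return "timestamp"
    if name.startswith("int"):  # int64, Int64, ...
        return "int"
    if name.startswith("float"):  # float64, Float64, ...
        return "float"
    if name == "object" and values:
        if _all_parse(values, float):
            return "float"
        if _all_parse(values, _to_datetime):
            return "timestamp"
    # object, string and any other dtype
    return "varchar"
```

With a real DataFrame, each column would be checked as `pandas_dtype_to_sql(str(df[col].dtype), df[col].dropna().tolist())`.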
- locopy.utility.find_column_type_polars(dataframe: DataFrame, warehouse_type: str)
Find data type of each column from the dataframe.
The following is the list of polars data types that the function checks and their mapping in SQL:
Boolean -> boolean
Date/Datetime/Duration/Time -> timestamp
int -> int
float/decimal -> float
float object -> float
datetime object -> timestamp
others -> varchar
For all other data types, the column will be mapped to varchar type.
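A similar sketch for the polars table, again keyed on the dtype name so it runs without polars. It assumes `str(series.dtype)` yields names such as `Int64` or `Datetime(time_unit='us', time_zone=None)`, and it treats unsigned integers as int, which is an assumption. Value-based inference for Object columns ("float object", "datetime object") is omitted here.

```python
def polars_dtype_to_sql(dtype_name):
    """Map a polars dtype name to a SQL type name (sketch of the table above)."""
    # Parametrized dtypes render with arguments, e.g. "Decimal(precision=10, scale=2)",
    # so only the base name before any "(" is compared.
    base = dtype_name.split("(", 1)[0]
    if base == "Boolean":
        return "boolean"
    if base in ("Date", "Datetime", "Duration", "Time"):
        return "timestamp"
    if base.startswith(("Int", "UInt")):  # Int8..Int64, UInt8..UInt64
        return "int"
    if base.startswith("Float") or base == "Decimal":
        return "float"
    # Everything else, including String and (in this sketch) Object
    return "varchar"
```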
- locopy.utility.get_ignoreheader_number(options)
Return the number_rows from IGNOREHEADER [ AS ] number_rows. This doesn't validate that the AS is valid.
- Parameters:
options (list of str) – A list of copy options that should be appended to the COPY statement.
- Returns:
The number_rows from IGNOREHEADER [ AS ] number_rows
- Return type:
- Raises:
LocopyIgnoreHeaderError – If more than one IGNOREHEADER is found in the options
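A minimal sketch of the parsing described above. LocopyIgnoreHeaderError is a local stand-in for locopy's exception, and returning 0 when no IGNOREHEADER option is present is an assumption of this sketch. As documented, the optional AS is not validated: the last whitespace-separated token is taken as the row count.

```python
class LocopyIgnoreHeaderError(Exception):
    """Local stand-in for locopy's exception so the sketch runs on its own."""


def get_ignoreheader_number(options):
    """Return number_rows from an 'IGNOREHEADER [ AS ] number_rows' COPY option."""
    matches = [opt for opt in options if opt.strip().upper().startswith("IGNOREHEADER")]
    if not matches:
        # Assumption: no IGNOREHEADER option means no header rows to skip.
        return 0
    if len(matches) > 1:
        raise LocopyIgnoreHeaderError(
            "Found more than one IGNOREHEADER in the options: {}".format(matches)
        )
    # "IGNOREHEADER 1" and "IGNOREHEADER AS 1" both end with the row count.
    return int(matches[0].split()[-1])
```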
- locopy.utility.read_config_yaml(config_yaml)
Read a configuration YAML file.
Populate the database connection attributes, and validate required ones.
Example:
host: my.redshift.cluster.com
port: 5439
dbname: db
user: userid
password: password
- Parameters:
config_yaml (str or file pointer) – String representing the file location of the configuration file, or a pointer to an open file object
- Returns:
A dictionary of parameters for setting up a connection to the database.
- Return type:
- Raises:
CredentialsError – If any connection items are missing from the YAML file
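Parsing itself would typically use PyYAML's `yaml.safe_load`: when config_yaml is a str, open that path and pass the file object to it; otherwise pass the already-open file object directly. The sketch below covers only the validation step on the parsed dictionary, so it stays dependency-free. validate_connection_config and REQUIRED_KEYS are hypothetical names, the required keys are assumed from the example above, and CredentialsError is a local stand-in.

```python
class CredentialsError(Exception):
    """Local stand-in for locopy's exception so the sketch runs on its own."""


# Hypothetical required keys, taken from the example YAML above.
REQUIRED_KEYS = ("host", "port", "dbname", "user", "password")


def validate_connection_config(config, required=REQUIRED_KEYS):
    """Return config unchanged if every required key is present, else raise."""
    if not isinstance(config, dict):
        raise CredentialsError("Configuration must be a mapping of connection parameters.")
    missing = [key for key in required if key not in config]
    if missing:
        raise CredentialsError("Missing connection items: {}".format(", ".join(missing)))
    return config
```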
- locopy.utility.split_file(input_file, output_file, splits=1, ignore_header=0)
Split a file into equal files by lines.
For example: myinputfile.txt will be split into myoutputfile.txt.01, myoutputfile.txt.02, etc.
- Parameters:
input_file (str) – Path to input file to split
output_file (str) – Name of the output file
splits (int, optional) – Number of splits to perform. Must be greater than zero. Defaults to 1
ignore_header (int, optional) – If ignore_header is > 0, then that number of rows will be removed from the beginning of the files as they are split. Defaults to 0
- Returns:
List of strings with the file paths of the split files
- Return type:
- Raises:
LocopySplitError – If splits is less than 1 or there is a processing error when splitting
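A minimal sketch of line-based splitting with the two-digit suffixes from the example above. Two choices here are assumptions rather than documented behavior: header rows are skipped once at the start of the input file, and the remaining lines are dealt out round-robin, which keeps each output's line count within one of the others. LocopySplitError is a local stand-in.

```python
import os
from contextlib import ExitStack
from itertools import cycle


class LocopySplitError(Exception):
    """Local stand-in for locopy's exception so the sketch runs on its own."""


def split_file(input_file, output_file, splits=1, ignore_header=0):
    """Split input_file into `splits` files named output_file.01, output_file.02, ..."""
    if splits < 1:
        raise LocopySplitError("splits must be greater than zero")
    paths = ["{}.{}".format(output_file, str(i + 1).zfill(2)) for i in range(splits)]
    try:
        with ExitStack() as stack, open(input_file, "rb") as src:
            outs = [stack.enter_context(open(p, "wb")) for p in paths]
            # Drop the header rows before distributing the remaining lines.
            for _ in range(ignore_header):
                next(src, None)
            # Round-robin assignment: line 1 to file .01, line 2 to .02, and so on.
            for line, out in zip(src, cycle(outs)):
                out.write(line)
    except OSError as exc:
        raise LocopySplitError("Error splitting file: {}".format(exc)) from exc
    return paths
```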
- locopy.utility.write_file(data, delimiter, filepath, mode='w')
Write data to a file.
- Parameters:
data (list) – List of lists
delimiter (str) – Delimiter by which columns will be separated
filepath (str) – Location of the output file
mode (str) – File writing mode. Examples include ‘w’ for write or ‘a’ for append. Defaults to write mode. See https://www.tutorialspoint.com/python/python_files_io.htm
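A minimal sketch of writing a list of lists as delimited lines. It converts each value with str() and applies no quoting, which is an assumption about the original; if values may contain the delimiter, csv.writer(f, delimiter=delimiter) from the standard library would be the safer choice.

```python
def write_file(data, delimiter, filepath, mode="w"):
    """Write each row of data (a list of lists) as one delimited line."""
    with open(filepath, mode) as f:
        for row in data:
            # No quoting or escaping: a delimiter inside a value would split it.
            f.write(delimiter.join(str(value) for value in row) + "\n")
```

Calling it once with mode "w" and again with mode "a" appends the second batch of rows to the same file.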