Base Validators

Build a model for a dataset by identifying each column's type along with its respective parameters.

dataprofiler.validators.base_validators.is_in_range(x: Union[float, int], config: dict) → bool

Check whether x is in the range specified by the config.

Parameters
  • x (int/float) – number

  • config (dict) – configuration

Returns

bool
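
As a rough illustration of the range check, the sketch below re-implements the idea in plain Python. It is not the library's implementation: the name `in_range_sketch`, the use of the 'start' and 'end' keys (taken from the config example in this section), and the inclusive bounds are all assumptions.

```python
from typing import Union


def in_range_sketch(x: Union[float, int], config: dict) -> bool:
    """Hypothetical stand-in for is_in_range; not the library's code."""
    # Assumption: config carries 'start' and 'end' keys and both bounds are inclusive.
    return config["start"] <= x <= config["end"]


range_config = {"start": 1, "end": 2}
print(in_range_sketch(1.5, range_config))  # True
print(in_range_sketch(3, range_config))    # False
```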

dataprofiler.validators.base_validators.is_in_list(x: str, config: dict) → bool

Check whether x is in the config list.

Parameters
  • x (string) – item

  • config (dict) – configuration

Returns

bool
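
The membership check can be pictured with the minimal sketch below. It is an illustration under assumptions, not the library's code: `in_list_sketch` is a hypothetical name, the second argument is assumed to be the collection of allowed values (as in the `[1,2,3]` entry of the config example in this section), and no type conversion is applied to `x`.

```python
from collections.abc import Collection


def in_list_sketch(x, allowed: Collection) -> bool:
    """Hypothetical stand-in for is_in_list; not the library's code."""
    # Assumption: `allowed` is the collection of accepted values for a column.
    # The value is compared as-is, so the string "2" does not match the integer 2.
    return x in allowed


print(in_list_sketch(2, [1, 2, 3]))    # True
print(in_list_sketch(5, [1, 2, 3]))    # False
print(in_list_sketch("2", [1, 2, 3]))  # False
```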

class dataprofiler.validators.base_validators.Validator

Bases: object

For validating a data set.

Initialize Validator object.

validate(data: Union[pd.DataFrame, dd.DataFrame], config: dict) → None

Validate a data set.

There is no option to validate only part of a data set.

The configuration is set on each run rather than when the class is instantiated, so the same instance can be run multiple times with different configurations without being reinstantiated.

Parameters
  • data (DataFrame Dask/Pandas) – The data to be processed by the validator. Processing occurs in a column-wise fashion.

  • config (dict) – Configuration for how the validator should run across the given data. The validator runs only over the columns specified in the configuration.

Example

This is an example of the config:

config = {
    '<column_name>': {  # replace with the name of the column to validate
        'range': {
            'start': 1,
            'end': 2
        },
        'list': [1, 2, 3]
    }
}

get() → dict

Get the results of the validation run.
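
To show how setting the configuration per run plays out, the sketch below mimics the validate-then-get pattern with a small stand-in class. Everything in it is an assumption made for illustration: `SketchValidator` is hypothetical, a plain dict of column lists stands in for a Pandas or Dask DataFrame, and the report layout (row indices that failed each check) is invented here rather than taken from the library.

```python
class SketchValidator:
    """Hypothetical illustration of the validate/get pattern; not the library's class."""

    def __init__(self) -> None:
        self._report: dict = {}

    def validate(self, data: dict, config: dict) -> None:
        # Assumption: `data` maps column names to lists of values, standing in
        # for a DataFrame so the sketch needs no third-party packages.
        report: dict = {}
        # Only columns named in the config are checked.
        for column, checks in config.items():
            values = data[column]
            report[column] = {}
            if "range" in checks:
                bounds = checks["range"]
                # Assumption: record row indices outside the inclusive range.
                report[column]["range"] = [
                    i for i, v in enumerate(values)
                    if not bounds["start"] <= v <= bounds["end"]
                ]
            if "list" in checks:
                allowed = checks["list"]
                # Assumption: record row indices whose value is not allowed.
                report[column]["list"] = [
                    i for i, v in enumerate(values) if v not in allowed
                ]
        # Each run replaces the results of the previous run.
        self._report = report

    def get(self) -> dict:
        return self._report


data = {"a": [1, 2, 5], "b": [0, 1, 2]}
validator = SketchValidator()

# First run: range check on column "a" only.
validator.validate(data, {"a": {"range": {"start": 1, "end": 2}}})
first = validator.get()
print(first)   # {'a': {'range': [2]}}

# Second run: same instance, different configuration.
validator.validate(data, {"b": {"list": [1, 2, 3]}})
second = validator.get()
print(second)  # {'b': {'list': [0]}}
```

Because the configuration is an argument to `validate` and not to the constructor, the second run reuses the same object without any reset step.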