dataprofiler.profilers.profiler_options module

Specify the options when running the data profiler.

class dataprofiler.profilers.profiler_options.BaseOption

Bases: object

property properties

Returns a copy of the option properties.

Returns

dictionary of the option’s properties attr: value

Return type

dict
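A minimal sketch of what "returns a copy" means in practice: changing the returned dictionary should not change the option object. SketchOption is a hypothetical stand-in, not the dataprofiler class, and using copy.copy over the instance attributes is an assumption; this page does not say whether the real copy is shallow or deep.

```python
import copy


class SketchOption:
    """Hypothetical stand-in for an option class (not the library code)."""

    def __init__(self):
        self.is_enabled = True

    @property
    def properties(self):
        # Assumption: the properties are the instance attributes.
        # copy.copy returns a new dict, so callers cannot mutate the
        # option object through the returned value.
        return copy.copy(self.__dict__)


option = SketchOption()
snapshot = option.properties
snapshot["is_enabled"] = False  # only the copy changes
```

After this runs, the option itself still reports is_enabled as True, because only the returned copy was edited.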

set(options)

Sets the option values from a dict that contains all or a subset of the appropriate options. Raises an error if the formatting is improper.

Parameters

options (dict) – dict containing the options you want to set.

Returns

None
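The set() contract above (accept a full or partial dict, reject improper input) could be sketched as follows. Everything here is illustrative: SketchOption is a hypothetical class, the flat attribute-name keys are an assumed dict layout, and the exception types are assumptions, since this page does not state them.

```python
class SketchOption:
    """Hypothetical stand-in for an option class (not the library code)."""

    def __init__(self):
        self.is_enabled = True
        self.max_sample_size = None

    def set(self, options):
        # Assumption: improper formatting means "not a dict" or
        # "contains a key that is not a known option".
        if not isinstance(options, dict):
            raise ValueError("The options must be a dictionary.")
        unknown = [key for key in options if key not in vars(self)]
        if unknown:
            raise AttributeError(f"Unknown option(s): {unknown}")
        # Only the supplied keys change; everything else keeps its value.
        for key, value in options.items():
            setattr(self, key, value)


option = SketchOption()
option.set({"max_sample_size": 500})  # a subset of the options
```

Checking every key before assigning any of them keeps the object unchanged when the input is rejected, which is one reasonable design choice for this kind of setter.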

validate(raise_error=True)

Validates that the options do not conflict or cause errors. Raises an error or warning if they do.

Parameters

raise_error (bool) – If true, raises the errors; if false, returns them.

Returns

list of errors (if raise_error is false)

Return type

list(str)
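The two modes of validate() can be illustrated with a small stand-in. This is a sketch under assumptions: SketchOption and its _collect_errors helper are hypothetical names, the single type check is only an example rule, and raising ValueError is an assumed choice of exception.

```python
class SketchOption:
    """Hypothetical stand-in for an option class (not the library code)."""

    def __init__(self, is_enabled=True):
        self.is_enabled = is_enabled

    def _collect_errors(self):
        # Gather every problem as a message instead of stopping at the first.
        errors = []
        if not isinstance(self.is_enabled, bool):
            errors.append("is_enabled must be a bool.")
        return errors

    def validate(self, raise_error=True):
        errors = self._collect_errors()
        if raise_error and errors:
            # raise_error=True: fail loudly with all messages at once.
            raise ValueError("\n".join(errors))
        # raise_error=False (or nothing wrong): hand the list back.
        return errors


good_errors = SketchOption(is_enabled=True).validate(raise_error=False)
bad_errors = SketchOption(is_enabled="yes").validate(raise_error=False)
```

With raise_error=False the caller receives a list(str) and can decide how to report it; with the default, the same messages arrive as an exception.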

class dataprofiler.profilers.profiler_options.BooleanOption(is_enabled=True)

Bases: dataprofiler.profilers.profiler_options.BaseOption

Boolean option

Variables

is_enabled (bool) – boolean option to enable/disable the option.

property properties

Returns a copy of the option properties.

Returns

dictionary of the option’s properties attr: value

Return type

dict

set(options)

Sets the option values from a dict that contains all or a subset of the appropriate options. Raises an error if the formatting is improper.

Parameters

options (dict) – dict containing the options you want to set.

Returns

None

validate(raise_error=True)

Validates that the options do not conflict or cause errors. Raises an error or warning if they do.

Parameters

raise_error (bool) – If true, raises the errors; if false, returns them.

Returns

list of errors (if raise_error is false)

Return type

list(str)

class dataprofiler.profilers.profiler_options.BaseColumnOptions

Bases: dataprofiler.profilers.profiler_options.BooleanOption

Base options for all the columns.

Variables

is_enabled (bool) – boolean option to enable/disable the column.

is_prop_enabled(prop)

Checks whether a property is enabled.

Parameters

prop (str) – name of the option to check.

Returns

whether the property is enabled

Return type

bool
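One plausible reading of is_prop_enabled is sketched below: a plain bool attribute is returned as-is, while a nested boolean option reports its own is_enabled flag. Toggle, SketchColumnOptions, and the example_stat attribute are hypothetical names, and raising AttributeError for an unknown property is an assumption.

```python
class Toggle:
    """Hypothetical stand-in for a boolean option."""

    def __init__(self, is_enabled=True):
        self.is_enabled = is_enabled


class SketchColumnOptions:
    """Hypothetical stand-in for a column option class."""

    def __init__(self):
        self.is_enabled = True              # plain bool attribute
        self.example_stat = Toggle(False)   # nested boolean option

    def is_prop_enabled(self, prop):
        if prop not in vars(self):
            raise AttributeError(f"Unknown property: {prop}")
        value = getattr(self, prop)
        # A nested toggle reports its own flag; a plain bool is used directly.
        if isinstance(value, Toggle):
            return value.is_enabled
        return bool(value)


column = SketchColumnOptions()
column_on = column.is_prop_enabled("is_enabled")
stat_on = column.is_prop_enabled("example_stat")
```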

property properties

Returns a copy of the option properties.

Returns

dictionary of the option’s properties attr: value

Return type

dict

set(options)

Sets the option values from a dict that contains all or a subset of the appropriate options. Raises an error if the formatting is improper.

Parameters

options (dict) – dict containing the options you want to set.

Returns

None

validate(raise_error=True)

Validates that the options do not conflict or cause errors. Raises an error or warning if they do.

Parameters

raise_error (bool) – If true, raises the errors; if false, returns them.

Returns

list of errors (if raise_error is false)

Return type

list(str)

class dataprofiler.profilers.profiler_options.NumericalOptions

Bases: dataprofiler.profilers.profiler_options.BaseColumnOptions

Options for the Numerical Stats Mixin

Variables
  • is_enabled (bool) – boolean option to enable/disable the column.

  • min (BooleanOption) – boolean option to enable/disable min

  • max (BooleanOption) – boolean option to enable/disable max

  • sum (BooleanOption) – boolean option to enable/disable sum

  • variance (BooleanOption) – boolean option to enable/disable variance

  • histogram_and_quantiles (BooleanOption) – boolean option to enable/disable histogram_and_quantiles

  • is_numeric_stats_enabled (bool) – boolean to enable/disable all numeric stats

property is_numeric_stats_enabled

Returns the state of numeric stats being enabled / disabled. If any numeric stats property is enabled it will return True, otherwise it will return False.

Returns

true if any numeric stats property is enabled, otherwise false

Return type

bool
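The "any stat enabled" rule, together with the Variables entry saying the same name enables or disables all numeric stats, can be sketched with a property getter and setter. Toggle and SketchNumericalOptions are hypothetical stand-ins; only the stat names come from the Variables list above.

```python
class Toggle:
    """Hypothetical stand-in for a boolean option."""

    def __init__(self, is_enabled=True):
        self.is_enabled = is_enabled


class SketchNumericalOptions:
    """Hypothetical stand-in for the numerical options class."""

    # Stat names taken from the Variables list in this section.
    _STATS = ("min", "max", "sum", "variance", "histogram_and_quantiles")

    def __init__(self):
        for name in self._STATS:
            setattr(self, name, Toggle())

    @property
    def is_numeric_stats_enabled(self):
        # True as soon as at least one numeric stat is switched on.
        return any(getattr(self, name).is_enabled for name in self._STATS)

    @is_numeric_stats_enabled.setter
    def is_numeric_stats_enabled(self, value):
        # Assigning the flag switches every numeric stat at once.
        for name in self._STATS:
            getattr(self, name).is_enabled = value


opts = SketchNumericalOptions()
opts.is_numeric_stats_enabled = False
all_off = opts.is_numeric_stats_enabled
opts.variance.is_enabled = True
one_on = opts.is_numeric_stats_enabled
```

Note the asymmetry: writing the flag affects every stat, while reading it only asks whether at least one is still on.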

property properties

Includes at least is_enabled, which turns the column on or off.

is_prop_enabled(prop)

Checks whether a property is enabled.

Parameters

prop (str) – name of the option to check.

Returns

whether the property is enabled

Return type

bool

set(options)

Sets the option values from a dict that contains all or a subset of the appropriate options. Raises an error if the formatting is improper.

Parameters

options (dict) – dict containing the options you want to set.

Returns

None

validate(raise_error=True)

Validates that the options do not conflict or cause errors. Raises an error or warning if they do.

Parameters

raise_error (bool) – If true, raises the errors; if false, returns them.

Returns

list of errors (if raise_error is false)

Return type

list(str)

class dataprofiler.profilers.profiler_options.IntOptions

Bases: dataprofiler.profilers.profiler_options.NumericalOptions

Options for the Int Column

Variables
  • is_enabled (bool) – boolean option to enable/disable the column.

  • min (BooleanOption) – boolean option to enable/disable min

  • max (BooleanOption) – boolean option to enable/disable max

  • sum (BooleanOption) – boolean option to enable/disable sum

  • variance (BooleanOption) – boolean option to enable/disable variance

  • histogram_and_quantiles (BooleanOption) – boolean option to enable/disable histogram_and_quantiles

  • is_numeric_stats_enabled (bool) – boolean to enable/disable all numeric stats

property is_numeric_stats_enabled

Returns the state of numeric stats being enabled / disabled. If any numeric stats property is enabled it will return True, otherwise it will return False.

Returns

true if any numeric stats property is enabled, otherwise false

Return type

bool

is_prop_enabled(prop)

Checks whether a property is enabled.

Parameters

prop (str) – name of the option to check.

Returns

whether the property is enabled

Return type

bool

property properties

Includes at least is_enabled, which turns the column on or off.

set(options)

Sets the option values from a dict that contains all or a subset of the appropriate options. Raises an error if the formatting is improper.

Parameters

options (dict) – dict containing the options you want to set.

Returns

None

validate(raise_error=True)

Validates that the options do not conflict or cause errors. Raises an error or warning if they do.

Parameters

raise_error (bool) – If true, raises the errors; if false, returns them.

Returns

list of errors (if raise_error is false)

Return type

list(str)

class dataprofiler.profilers.profiler_options.FloatOptions

Bases: dataprofiler.profilers.profiler_options.NumericalOptions

Options for the Float Column.

Variables
  • is_enabled (bool) – boolean option to enable/disable the column.

  • precision (BooleanOption) – boolean option to enable/disable precision

  • min (BooleanOption) – boolean option to enable/disable min

  • max (BooleanOption) – boolean option to enable/disable max

  • sum (BooleanOption) – boolean option to enable/disable sum

  • variance (BooleanOption) – boolean option to enable/disable variance

  • histogram_and_quantiles (BooleanOption) – boolean option to enable/disable histogram_and_quantiles

  • is_numeric_stats_enabled (bool) – boolean to enable/disable all numeric stats

property is_numeric_stats_enabled

Returns the state of numeric stats being enabled / disabled. If any numeric stats property is enabled it will return True, otherwise it will return False.

Returns

true if any numeric stats property is enabled, otherwise false

Return type

bool

is_prop_enabled(prop)

Checks whether a property is enabled.

Parameters

prop (str) – name of the option to check.

Returns

whether the property is enabled

Return type

bool

property properties

Includes at least is_enabled, which turns the column on or off.

set(options)

Sets the option values from a dict that contains all or a subset of the appropriate options. Raises an error if the formatting is improper.

Parameters

options (dict) – dict containing the options you want to set.

Returns

None

validate(raise_error=True)

Validates that the options do not conflict or cause errors. Raises an error or warning if they do.

Parameters

raise_error (bool) – If true, raises the errors; if false, returns them.

Returns

list of errors (if raise_error is false)

Return type

list(str)

class dataprofiler.profilers.profiler_options.TextOptions

Bases: dataprofiler.profilers.profiler_options.NumericalOptions

Options for the Text Column.

Variables
  • is_enabled (bool) – boolean option to enable/disable the column.

  • vocab (BooleanOption) – boolean option to enable/disable vocab

  • min (BooleanOption) – boolean option to enable/disable min

  • max (BooleanOption) – boolean option to enable/disable max

  • sum (BooleanOption) – boolean option to enable/disable sum

  • variance (BooleanOption) – boolean option to enable/disable variance

  • histogram_and_quantiles (BooleanOption) – boolean option to enable/disable histogram_and_quantiles

  • is_numeric_stats_enabled (bool) – boolean to enable/disable all numeric stats

property is_numeric_stats_enabled

Returns the state of numeric stats being enabled / disabled. If any numeric stats property is enabled it will return True, otherwise it will return False.

Returns

true if any numeric stats property is enabled, otherwise false

Return type

bool

is_prop_enabled(prop)

Checks whether a property is enabled.

Parameters

prop (str) – name of the option to check.

Returns

whether the property is enabled

Return type

bool

property properties

Includes at least is_enabled, which turns the column on or off.

set(options)

Sets the option values from a dict that contains all or a subset of the appropriate options. Raises an error if the formatting is improper.

Parameters

options (dict) – dict containing the options you want to set.

Returns

None

validate(raise_error=True)

Validates that the options do not conflict or cause errors. Raises an error or warning if they do.

Parameters

raise_error (bool) – If true, raises the errors; if false, returns them.

Returns

list of errors (if raise_error is false)

Return type

list(str)

class dataprofiler.profilers.profiler_options.DateTimeOptions

Bases: dataprofiler.profilers.profiler_options.BaseColumnOptions

Options for the Datetime Column

Variables

is_enabled (bool) – boolean option to enable/disable the column.

is_prop_enabled(prop)

Checks whether a property is enabled.

Parameters

prop (str) – name of the option to check.

Returns

whether the property is enabled

Return type

bool

property properties

Returns a copy of the option properties.

Returns

dictionary of the option’s properties attr: value

Return type

dict

set(options)

Sets the option values from a dict that contains all or a subset of the appropriate options. Raises an error if the formatting is improper.

Parameters

options (dict) – dict containing the options you want to set.

Returns

None

validate(raise_error=True)

Validates that the options do not conflict or cause errors. Raises an error or warning if they do.

Parameters

raise_error (bool) – If true, raises the errors; if false, returns them.

Returns

list of errors (if raise_error is false)

Return type

list(str)

class dataprofiler.profilers.profiler_options.OrderOptions

Bases: dataprofiler.profilers.profiler_options.BaseColumnOptions

Options for the Order Column

Variables

is_enabled (bool) – boolean option to enable/disable the column.

is_prop_enabled(prop)

Checks whether a property is enabled.

Parameters

prop (str) – name of the option to check.

Returns

whether the property is enabled

Return type

bool

property properties

Returns a copy of the option properties.

Returns

dictionary of the option’s properties attr: value

Return type

dict

set(options)

Sets the option values from a dict that contains all or a subset of the appropriate options. Raises an error if the formatting is improper.

Parameters

options (dict) – dict containing the options you want to set.

Returns

None

validate(raise_error=True)

Validates that the options do not conflict or cause errors. Raises an error or warning if they do.

Parameters

raise_error (bool) – If true, raises the errors; if false, returns them.

Returns

list of errors (if raise_error is false)

Return type

list(str)

class dataprofiler.profilers.profiler_options.CategoricalOptions

Bases: dataprofiler.profilers.profiler_options.BaseColumnOptions

Options for the Categorical Column

Variables

is_enabled (bool) – boolean option to enable/disable the column.

is_prop_enabled(prop)

Checks whether a property is enabled.

Parameters

prop (str) – name of the option to check.

Returns

whether the property is enabled

Return type

bool

property properties

Returns a copy of the option properties.

Returns

dictionary of the option’s properties attr: value

Return type

dict

set(options)

Sets the option values from a dict that contains all or a subset of the appropriate options. Raises an error if the formatting is improper.

Parameters

options (dict) – dict containing the options you want to set.

Returns

None

validate(raise_error=True)

Validates that the options do not conflict or cause errors. Raises an error or warning if they do.

Parameters

raise_error (bool) – If true, raises the errors; if false, returns them.

Returns

list of errors (if raise_error is false)

Return type

list(str)

class dataprofiler.profilers.profiler_options.DataLabelerOptions

Bases: dataprofiler.profilers.profiler_options.BaseColumnOptions

Options for the Data Labeler Column.

Variables
  • is_enabled (bool) – boolean option to enable/disable the column.

  • data_labeler_dirpath (str) – String to load data labeler from

  • max_sample_size (int) – Int to decide sample size

is_prop_enabled(prop)

Checks whether a property is enabled.

Parameters

prop (str) – name of the option to check.

Returns

whether the property is enabled

Return type

bool

property properties

Returns a copy of the option properties.

Returns

dictionary of the option’s properties attr: value

Return type

dict

set(options)

Sets the option values from a dict that contains all or a subset of the appropriate options. Raises an error if the formatting is improper.

Parameters

options (dict) – dict containing the options you want to set.

Returns

None

validate(raise_error=True)

Validates that the options do not conflict or cause errors. Raises an error or warning if they do.

Parameters

raise_error (bool) – If true, raises the errors; if false, returns them.

Returns

list of errors (if raise_error is false)

Return type

list(str)

class dataprofiler.profilers.profiler_options.StructuredOptions

Bases: dataprofiler.profilers.profiler_options.BaseOption

Constructs the StructuredOptions object with default values.

Variables

property enabled_columns

Returns a list of the enabled profiler columns.

property properties

Returns a copy of the option properties.

Returns

dictionary of the option’s properties attr: value

Return type

dict

set(options)

Sets the option values from a dict that contains all or a subset of the appropriate options. Raises an error if the formatting is improper.

Parameters

options (dict) – dict containing the options you want to set.

Returns

None

validate(raise_error=True)

Validates that the options do not conflict or cause errors. Raises an error or warning if they do.

Parameters

raise_error (bool) – If true, raises the errors; if false, returns them.

Returns

list of errors (if raise_error is false)

Return type

list(str)

class dataprofiler.profilers.profiler_options.ProfilerOptions

Bases: dataprofiler.profilers.profiler_options.BaseOption

Initializes the ProfilerOptions object.

Variables

structured_options (StructuredOptions) – option set for structured dataset profiling.

property properties

Returns a copy of the option properties.

Returns

dictionary of the option’s properties attr: value

Return type

dict

set(options)

Sets the option values from a dict that contains all or a subset of the appropriate options. Raises an error if the formatting is improper.

Parameters

options (dict) – dict containing the options you want to set.

Returns

None

validate(raise_error=True)

Validates that the options do not conflict or cause errors. Raises an error or warning if they do.

Parameters

raise_error (bool) – If true, raises the errors; if false, returns them.

Returns

list of errors (if raise_error is false)

Return type

list(str)