Profiler Options

Specify the options when running the data profiler.

class dataprofiler.profilers.profiler_options.BaseOption

Bases: object

property properties

Returns a copy of the option properties.

Returns

dictionary of the option’s properties attr: value

Return type

dict

set(options)

Sets the given options. Pass a dict that contains all, or a subset, of the appropriate options. Raises an error if the formatting is improper.

Parameters

options (dict) – dict containing the options you want to set.

Returns

None

validate(raise_error=True)

Validates the options do not conflict and cause errors. Raises error/warning if so.

Parameters

raise_error (bool) – Flag that raises errors if true. Returns errors if false.

Returns

list of errors (if raise_error is false)

Return type

list(str)
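The contract described above (a copied properties dict, a set that accepts a partial dict, and a validate that either raises or returns messages) can be sketched with a small stand-in class. SketchOption and its error messages are hypothetical and are not dataprofiler's implementation; returning an empty list when nothing is wrong is also an assumption.

```python
import copy


class SketchOption:
    """Hypothetical stand-in mimicking the documented BaseOption contract."""

    def __init__(self, is_enabled=True):
        self.is_enabled = is_enabled

    @property
    def properties(self):
        # Hand back a copy so edits to the dict do not change the option.
        return copy.deepcopy(self.__dict__)

    def set(self, options):
        # Accept all of, or only a subset of, the known attributes.
        if not isinstance(options, dict):
            raise ValueError("options must be a dict")
        for name, value in options.items():
            if name not in self.__dict__:
                raise AttributeError(f"unknown option: {name}")
            setattr(self, name, value)

    def validate(self, raise_error=True):
        errors = []
        if not isinstance(self.is_enabled, bool):
            errors.append("is_enabled must be a bool.")
        if raise_error and errors:
            raise ValueError("\n".join(errors))
        return errors


opt = SketchOption()
opt.set({"is_enabled": "yes"})              # wrong type, caught by validate
problems = opt.validate(raise_error=False)  # collect messages instead of raising
snapshot = opt.properties
snapshot["is_enabled"] = False              # only the copy changes
print(problems)
```

Calling validate() with the default raise_error=True on the same object would raise instead of returning the messages.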

class dataprofiler.profilers.profiler_options.BooleanOption(is_enabled=True)

Bases: dataprofiler.profilers.profiler_options.BaseOption

Boolean option

Variables

is_enabled (bool) – boolean option to enable/disable the option.

property properties

Returns a copy of the option properties.

Returns

dictionary of the option’s properties attr: value

Return type

dict

set(options)

Sets the given options. Pass a dict that contains all, or a subset, of the appropriate options. Raises an error if the formatting is improper.

Parameters

options (dict) – dict containing the options you want to set.

Returns

None

validate(raise_error=True)

Validates the options do not conflict and cause errors. Raises error/warning if so.

Parameters

raise_error (bool) – Flag that raises errors if true. Returns errors if false.

Returns

list of errors (if raise_error is false)

Return type

list(str)

class dataprofiler.profilers.profiler_options.HistogramOption(is_enabled=True, bin_count_or_method='auto')

Bases: dataprofiler.profilers.profiler_options.BooleanOption

Options for histograms

Variables
  • is_enabled (bool) – boolean option to enable/disable the option.

  • bin_count_or_method (Union[str, int, list(str)]) – bin count or the method with which to calculate histograms
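The three accepted shapes of bin_count_or_method (a method name, a bin count, or a list of method names) can be illustrated with a small checker. This is a sketch only: check_bin_setting is a hypothetical helper, and the set of method names is borrowed from NumPy's histogram bin estimators on the assumption that the option accepts similar names, which this page does not state.

```python
# Estimator names accepted by numpy.histogram_bin_edges. Assuming the option
# accepts the same names is a guess, not something this page states.
ASSUMED_METHODS = {"auto", "fd", "doane", "scott", "stone", "rice", "sturges", "sqrt"}


def check_bin_setting(value):
    """Return a list of problems with a bin_count_or_method value (hypothetical)."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python, so reject it explicitly.
        return ["a bool is not a bin count"]
    if isinstance(value, int):
        return [] if value >= 1 else ["bin count must be at least 1"]
    if isinstance(value, str):
        return [] if value in ASSUMED_METHODS else [f"unknown method: {value}"]
    if isinstance(value, list) and value:
        bad = [v for v in value if not isinstance(v, str) or v not in ASSUMED_METHODS]
        return [f"unknown method: {v}" for v in bad]
    return ["expected an int, a str, or a non-empty list of str"]


for candidate in ("auto", 20, ["fd", "rice"], 0, "bogus"):
    print(repr(candidate), check_bin_setting(candidate))
```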

property properties

Returns a copy of the option properties.

Returns

dictionary of the option’s properties attr: value

Return type

dict

set(options)

Sets the given options. Pass a dict that contains all, or a subset, of the appropriate options. Raises an error if the formatting is improper.

Parameters

options (dict) – dict containing the options you want to set.

Returns

None

validate(raise_error=True)

Validates the options do not conflict and cause errors. Raises error/warning if so.

Parameters

raise_error (bool) – Flag that raises errors if true. Returns errors if false.

Returns

list of errors (if raise_error is false)

Return type

list(str)

class dataprofiler.profilers.profiler_options.ModeOption(is_enabled=True, max_k_modes=5)

Bases: dataprofiler.profilers.profiler_options.BooleanOption

Options for mode estimation

Variables
  • is_enabled (bool) – boolean option to enable/disable the option.

  • top_k_modes (int) – the max number of modes to return, if applicable
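To make the cap on returned modes concrete, the sketch below counts exact values and keeps at most top_k_modes of the values tied for the highest count. The helper top_modes is hypothetical; how the profiler itself estimates modes is not described on this page, so this only illustrates what "max number of modes to return" means.

```python
from collections import Counter


def top_modes(values, top_k_modes=5):
    """Return the most frequent values, capped at top_k_modes (hypothetical helper)."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    # Every value tied for the highest count is a mode; sort for a stable order.
    modes = sorted(v for v, c in counts.items() if c == highest)
    return modes[:top_k_modes]


print(top_modes([1, 1, 2, 2, 3, 3, 4], top_k_modes=2))  # three values tie, two are kept
```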

property properties

Returns a copy of the option properties.

Returns

dictionary of the option’s properties attr: value

Return type

dict

set(options)

Sets the given options. Pass a dict that contains all, or a subset, of the appropriate options. Raises an error if the formatting is improper.

Parameters

options (dict) – dict containing the options you want to set.

Returns

None

validate(raise_error=True)

Validates the options do not conflict and cause errors. Raises error/warning if so.

Parameters

raise_error (bool) – Flag that raises errors if true. Returns errors if false.

Returns

list of errors (if raise_error is false)

Return type

list(str)

class dataprofiler.profilers.profiler_options.BaseInspectorOptions(is_enabled=True)

Bases: dataprofiler.profilers.profiler_options.BooleanOption

Base options for all the columns.

Variables

is_enabled (bool) – boolean option to enable/disable the column.

is_prop_enabled(prop)

Checks to see if a property is enabled or not and returns boolean.

Parameters

prop (String) – The option to check if it is enabled

Returns

Whether or not the property is enabled

Return type

Boolean
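A minimal sketch of the lookup that is_prop_enabled describes, assuming a property is either a plain bool or an object carrying its own is_enabled flag. Toggle and ColumnSketch are hypothetical stand-ins, and raising AttributeError for an unknown name is an assumption rather than documented behaviour.

```python
class Toggle:
    """Hypothetical stand-in for a BooleanOption-like object."""

    def __init__(self, is_enabled=True):
        self.is_enabled = is_enabled


class ColumnSketch:
    """Hypothetical stand-in for a column option set."""

    def __init__(self):
        self.is_enabled = True      # plain bool property
        self.min = Toggle(True)     # nested toggle, enabled
        self.max = Toggle(False)    # nested toggle, disabled

    def is_prop_enabled(self, prop):
        if prop not in self.__dict__:
            raise AttributeError(f"no such property: {prop}")
        value = getattr(self, prop)
        # A nested toggle reports its own flag; a plain bool is used directly.
        return value.is_enabled if isinstance(value, Toggle) else bool(value)


column = ColumnSketch()
print(column.is_prop_enabled("min"), column.is_prop_enabled("max"))
```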

property properties

Returns a copy of the option properties.

Returns

dictionary of the option’s properties attr: value

Return type

dict

set(options)

Sets the given options. Pass a dict that contains all, or a subset, of the appropriate options. Raises an error if the formatting is improper.

Parameters

options (dict) – dict containing the options you want to set.

Returns

None

validate(raise_error=True)

Validates the options do not conflict and cause errors. Raises error/warning if so.

Parameters

raise_error (bool) – Flag that raises errors if true. Returns errors if false.

Returns

list of errors (if raise_error is false)

Return type

list(str)

class dataprofiler.profilers.profiler_options.NumericalOptions

Bases: dataprofiler.profilers.profiler_options.BaseInspectorOptions

Options for the Numerical Stats Mixin

Variables
  • is_enabled (bool) – boolean option to enable/disable the column.

  • min (BooleanOption) – boolean option to enable/disable min

  • max (BooleanOption) – boolean option to enable/disable max

  • mode (ModeOption) – option to enable/disable mode and set return count

  • median (BooleanOption) – option to enable/disable median

  • sum (BooleanOption) – boolean option to enable/disable sum

  • variance (BooleanOption) – boolean option to enable/disable variance

  • skewness (BooleanOption) – boolean option to enable/disable skewness

  • kurtosis (BooleanOption) – boolean option to enable/disable kurtosis

  • histogram_and_quantiles (BooleanOption) – boolean option to enable/disable histogram_and_quantiles

  • bias_correction (BooleanOption) – boolean option to enable/disable existence of bias

  • num_zeros (BooleanOption) – boolean option to enable/disable num_zeros

  • num_negatives (BooleanOption) – boolean option to enable/disable num_negatives

  • is_numeric_stats_enabled (bool) – boolean to enable/disable all numeric stats

property is_numeric_stats_enabled

Returns the state of numeric stats being enabled / disabled. If any numeric stats property is enabled it will return True, otherwise it will return False.

Returns

true if any numeric stats property is enabled, otherwise false

Return type

bool
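The "any enabled" rule for is_numeric_stats_enabled, together with the matching setter that switches all numeric stats at once, can be sketched as follows. Toggle and NumericSketch are hypothetical stand-ins that model only four of the listed stats; this is not the library's code.

```python
class Toggle:
    """Hypothetical stand-in for a BooleanOption-like object."""

    def __init__(self, is_enabled=True):
        self.is_enabled = is_enabled


class NumericSketch:
    """Hypothetical stand-in; only a few of the listed stats are modelled."""

    _stats = ("min", "max", "sum", "variance")

    def __init__(self):
        for name in self._stats:
            setattr(self, name, Toggle())

    @property
    def is_numeric_stats_enabled(self):
        # True as soon as any one numeric stat is still enabled.
        return any(getattr(self, name).is_enabled for name in self._stats)

    @is_numeric_stats_enabled.setter
    def is_numeric_stats_enabled(self, value):
        # Writing the flag switches every numeric stat at once.
        for name in self._stats:
            getattr(self, name).is_enabled = value


opts = NumericSketch()
opts.min.is_enabled = False
still_on = opts.is_numeric_stats_enabled   # other stats remain enabled
opts.is_numeric_stats_enabled = False
all_off = opts.is_numeric_stats_enabled    # nothing is enabled any more
print(still_on, all_off)
```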

property properties

Includes at least: is_enabled: Turns on or off the column.

is_prop_enabled(prop)

Checks to see if a property is enabled or not and returns boolean.

Parameters

prop (String) – The option to check if it is enabled

Returns

Whether or not the property is enabled

Return type

Boolean

set(options)

Sets the given options. Pass a dict that contains all, or a subset, of the appropriate options. Raises an error if the formatting is improper.

Parameters

options (dict) – dict containing the options you want to set.

Returns

None

validate(raise_error=True)

Validates the options do not conflict and cause errors. Raises error/warning if so.

Parameters

raise_error (bool) – Flag that raises errors if true. Returns errors if false.

Returns

list of errors (if raise_error is false)

Return type

list(str)

class dataprofiler.profilers.profiler_options.IntOptions

Bases: dataprofiler.profilers.profiler_options.NumericalOptions

Options for the Int Column

Variables
  • is_enabled (bool) – boolean option to enable/disable the column.

  • min (BooleanOption) – boolean option to enable/disable min

  • max (BooleanOption) – boolean option to enable/disable max

  • mode (ModeOption) – option to enable/disable mode and set return count

  • median (BooleanOption) – option to enable/disable median

  • sum (BooleanOption) – boolean option to enable/disable sum

  • variance (BooleanOption) – boolean option to enable/disable variance

  • skewness (BooleanOption) – boolean option to enable/disable skewness

  • kurtosis (BooleanOption) – boolean option to enable/disable kurtosis

  • histogram_and_quantiles (BooleanOption) – boolean option to enable/disable histogram_and_quantiles

  • bias_correction (BooleanOption) – boolean option to enable/disable existence of bias

  • num_zeros (BooleanOption) – boolean option to enable/disable num_zeros

  • num_negatives (BooleanOption) – boolean option to enable/disable num_negatives

  • is_numeric_stats_enabled (bool) – boolean to enable/disable all numeric stats

property is_numeric_stats_enabled

Returns the state of numeric stats being enabled / disabled. If any numeric stats property is enabled it will return True, otherwise it will return False.

Returns

true if any numeric stats property is enabled, otherwise false

Return type

bool

is_prop_enabled(prop)

Checks to see if a property is enabled or not and returns boolean.

Parameters

prop (String) – The option to check if it is enabled

Returns

Whether or not the property is enabled

Return type

Boolean

property properties

Includes at least: is_enabled: Turns on or off the column.

set(options)

Sets the given options. Pass a dict that contains all, or a subset, of the appropriate options. Raises an error if the formatting is improper.

Parameters

options (dict) – dict containing the options you want to set.

Returns

None

validate(raise_error=True)

Validates the options do not conflict and cause errors. Raises error/warning if so.

Parameters

raise_error (bool) – Flag that raises errors if true. Returns errors if false.

Returns

list of errors (if raise_error is false)

Return type

list(str)

class dataprofiler.profilers.profiler_options.PrecisionOptions(is_enabled=True, sample_ratio=None)

Bases: dataprofiler.profilers.profiler_options.BooleanOption

Options for precision

Variables
  • is_enabled (bool) – boolean option to enable/disable the column.

  • sample_ratio (float) – float option that sets the ratio of valid float samples used to determine precision. This ratio overrides any defaults.

property properties

Returns a copy of the option properties.

Returns

dictionary of the option’s properties attr: value

Return type

dict

set(options)

Sets the given options. Pass a dict that contains all, or a subset, of the appropriate options. Raises an error if the formatting is improper.

Parameters

options (dict) – dict containing the options you want to set.

Returns

None

validate(raise_error=True)

Validates the options do not conflict and cause errors. Raises error/warning if so.

Parameters

raise_error (bool) – Flag that raises errors if true. Returns errors if false.

Returns

list of errors (if raise_error is false)

Return type

list(str)

class dataprofiler.profilers.profiler_options.FloatOptions

Bases: dataprofiler.profilers.profiler_options.NumericalOptions

Options for the Float Column.

Variables
  • is_enabled (bool) – boolean option to enable/disable the column.

  • min (BooleanOption) – boolean option to enable/disable min

  • max (BooleanOption) – boolean option to enable/disable max

  • mode (ModeOption) – option to enable/disable mode and set return count

  • median (BooleanOption) – option to enable/disable median

  • sum (BooleanOption) – boolean option to enable/disable sum

  • variance (BooleanOption) – boolean option to enable/disable variance

  • skewness (BooleanOption) – boolean option to enable/disable skewness

  • kurtosis (BooleanOption) – boolean option to enable/disable kurtosis

  • histogram_and_quantiles (BooleanOption) – boolean option to enable/disable histogram_and_quantiles

  • bias_correction (BooleanOption) – boolean option to enable/disable existence of bias

  • num_zeros (BooleanOption) – boolean option to enable/disable num_zeros

  • num_negatives (BooleanOption) – boolean option to enable/disable num_negatives

  • is_numeric_stats_enabled (bool) – boolean to enable/disable all numeric stats

property is_numeric_stats_enabled

Returns the state of numeric stats being enabled / disabled. If any numeric stats property is enabled it will return True, otherwise it will return False.

Returns

true if any numeric stats property is enabled, otherwise false

Return type

bool

is_prop_enabled(prop)

Checks to see if a property is enabled or not and returns boolean.

Parameters

prop (String) – The option to check if it is enabled

Returns

Whether or not the property is enabled

Return type

Boolean

property properties

Includes at least: is_enabled: Turns on or off the column.

set(options)

Sets the given options. Pass a dict that contains all, or a subset, of the appropriate options. Raises an error if the formatting is improper.

Parameters

options (dict) – dict containing the options you want to set.

Returns

None

validate(raise_error=True)

Validates the options do not conflict and cause errors. Raises error/warning if so.

Parameters

raise_error (bool) – Flag that raises errors if true. Returns errors if false.

Returns

list of errors (if raise_error is false)

Return type

list(str)

class dataprofiler.profilers.profiler_options.TextOptions

Bases: dataprofiler.profilers.profiler_options.NumericalOptions

Options for the Text Column

Variables
  • is_enabled (bool) – boolean option to enable/disable the column.

  • vocab (BooleanOption) – boolean option to enable/disable vocab

  • min (BooleanOption) – boolean option to enable/disable min

  • max (BooleanOption) – boolean option to enable/disable max

  • mode (ModeOption) – option to enable/disable mode and set return count

  • median (BooleanOption) – option to enable/disable median

  • sum (BooleanOption) – boolean option to enable/disable sum

  • variance (BooleanOption) – boolean option to enable/disable variance

  • skewness (BooleanOption) – boolean option to enable/disable skewness

  • kurtosis (BooleanOption) – boolean option to enable/disable kurtosis

  • bias_correction (BooleanOption) – boolean option to enable/disable existence of bias

  • histogram_and_quantiles (BooleanOption) – boolean option to enable/disable histogram_and_quantiles

  • num_zeros (BooleanOption) – boolean option to enable/disable num_zeros

  • num_negatives (BooleanOption) – boolean option to enable/disable num_negatives

  • is_numeric_stats_enabled (bool) – boolean to enable/disable all numeric stats

property is_numeric_stats_enabled

Returns the state of numeric stats being enabled / disabled. If any numeric stats property is enabled it will return True, otherwise it will return False. Although it seems redundant, this getter is needed for the setter of the same name, is_numeric_stats_enabled, to work properly.

Returns

true if any numeric stats property is enabled, otherwise false

Return type

bool

is_prop_enabled(prop)

Checks to see if a property is enabled or not and returns boolean.

Parameters

prop (String) – The option to check if it is enabled

Returns

Whether or not the property is enabled

Return type

Boolean

property properties

Includes at least: is_enabled: Turns on or off the column.

set(options)

Sets the given options. Pass a dict that contains all, or a subset, of the appropriate options. Raises an error if the formatting is improper.

Parameters

options (dict) – dict containing the options you want to set.

Returns

None

validate(raise_error=True)

Validates the options do not conflict and cause errors. Raises error/warning if so.

Parameters

raise_error (bool) – Flag that raises errors if true. Returns errors if false.

Returns

list of errors (if raise_error is false)

Return type

list(str)

class dataprofiler.profilers.profiler_options.DateTimeOptions

Bases: dataprofiler.profilers.profiler_options.BaseInspectorOptions

Options for the Datetime Column

Variables

is_enabled (bool) – boolean option to enable/disable the column.

is_prop_enabled(prop)

Checks to see if a property is enabled or not and returns boolean.

Parameters

prop (String) – The option to check if it is enabled

Returns

Whether or not the property is enabled

Return type

Boolean

property properties

Returns a copy of the option properties.

Returns

dictionary of the option’s properties attr: value

Return type

dict

set(options)

Sets the given options. Pass a dict that contains all, or a subset, of the appropriate options. Raises an error if the formatting is improper.

Parameters

options (dict) – dict containing the options you want to set.

Returns

None

validate(raise_error=True)

Validates the options do not conflict and cause errors. Raises error/warning if so.

Parameters

raise_error (bool) – Flag that raises errors if true. Returns errors if false.

Returns

list of errors (if raise_error is false)

Return type

list(str)

class dataprofiler.profilers.profiler_options.OrderOptions

Bases: dataprofiler.profilers.profiler_options.BaseInspectorOptions

Options for the Order Column

Variables

is_enabled (bool) – boolean option to enable/disable the column.

is_prop_enabled(prop)

Checks to see if a property is enabled or not and returns boolean.

Parameters

prop (String) – The option to check if it is enabled

Returns

Whether or not the property is enabled

Return type

Boolean

property properties

Returns a copy of the option properties.

Returns

dictionary of the option’s properties attr: value

Return type

dict

set(options)

Sets the given options. Pass a dict that contains all, or a subset, of the appropriate options. Raises an error if the formatting is improper.

Parameters

options (dict) – dict containing the options you want to set.

Returns

None

validate(raise_error=True)

Validates the options do not conflict and cause errors. Raises error/warning if so.

Parameters

raise_error (bool) – Flag that raises errors if true. Returns errors if false.

Returns

list of errors (if raise_error is false)

Return type

list(str)

class dataprofiler.profilers.profiler_options.CategoricalOptions(is_enabled=True, top_k_categories=None)

Bases: dataprofiler.profilers.profiler_options.BaseInspectorOptions

Options for the Categorical Column

Variables
  • is_enabled (bool) – boolean option to enable/disable the column.

  • top_k_categories ([None, int]) – number of categories to be displayed when called
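What a top_k_categories cap means in practice can be shown with a short sketch: None keeps every category, while an int keeps only the most frequent ones. The helper top_categories is hypothetical and only illustrates the idea; it is not how the profiler computes or reports categories.

```python
from collections import Counter


def top_categories(values, top_k_categories=None):
    """Return (category, count) pairs, most frequent first (hypothetical helper)."""
    # Counter.most_common(None) returns every entry, so None means "no cap".
    return Counter(values).most_common(top_k_categories)


colors = ["red", "blue", "red", "green", "red", "blue"]
print(top_categories(colors, top_k_categories=2))  # the two most frequent
print(len(top_categories(colors)))                 # all categories when uncapped
```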

is_prop_enabled(prop)

Checks to see if a property is enabled or not and returns boolean.

Parameters

prop (String) – The option to check if it is enabled

Returns

Whether or not the property is enabled

Return type

Boolean

property properties

Returns a copy of the option properties.

Returns

dictionary of the option’s properties attr: value

Return type

dict

set(options)

Sets the given options. Pass a dict that contains all, or a subset, of the appropriate options. Raises an error if the formatting is improper.

Parameters

options (dict) – dict containing the options you want to set.

Returns

None

validate(raise_error=True)

Validates the options do not conflict and cause errors. Raises error/warning if so.

Parameters

raise_error (bool) – Flag that raises errors if true. Returns errors if false.

Returns

list of errors (if raise_error is false)

Return type

list(str)

class dataprofiler.profilers.profiler_options.CorrelationOptions(is_enabled=False, columns=None)

Bases: dataprofiler.profilers.profiler_options.BaseInspectorOptions

Options for the Correlation between Columns

Variables
  • is_enabled (bool) – boolean option to enable/disable.

  • columns (list()) – Columns considered to calculate correlation

is_prop_enabled(prop)

Checks to see if a property is enabled or not and returns boolean.

Parameters

prop (String) – The option to check if it is enabled

Returns

Whether or not the property is enabled

Return type

Boolean

property properties

Returns a copy of the option properties.

Returns

dictionary of the option’s properties attr: value

Return type

dict

set(options)

Sets the given options. Pass a dict that contains all, or a subset, of the appropriate options. Raises an error if the formatting is improper.

Parameters

options (dict) – dict containing the options you want to set.

Returns

None

validate(raise_error=True)

Validates the options do not conflict and cause errors. Raises error/warning if so.

Parameters

raise_error (bool) – Flag that raises errors if true. Returns errors if false.

Returns

list of errors (if raise_error is false)

Return type

list(str)

class dataprofiler.profilers.profiler_options.DataLabelerOptions

Bases: dataprofiler.profilers.profiler_options.BaseInspectorOptions

Options for the Data Labeler Column.

Variables
  • is_enabled (bool) – boolean option to enable/disable the column.

  • data_labeler_dirpath (str) – String to load data labeler from

  • max_sample_size (int) – Int to decide sample size

  • data_labeler_object (BaseDataLabeler) – DataLabeler object used in profiler

property properties

Returns a copy of the option properties.

Returns

dictionary of the option’s properties attr: value

Return type

dict

is_prop_enabled(prop)

Checks to see if a property is enabled or not and returns boolean.

Parameters

prop (String) – The option to check if it is enabled

Returns

Whether or not the property is enabled

Return type

Boolean

set(options)

Sets the given options. Pass a dict that contains all, or a subset, of the appropriate options. Raises an error if the formatting is improper.

Parameters

options (dict) – dict containing the options you want to set.

Returns

None

validate(raise_error=True)

Validates the options do not conflict and cause errors. Raises error/warning if so.

Parameters

raise_error (bool) – Flag that raises errors if true. Returns errors if false.

Returns

list of errors (if raise_error is false)

Return type

list(str)

class dataprofiler.profilers.profiler_options.TextProfilerOptions(is_enabled=True, is_case_sensitive=True, stop_words=None, top_k_chars=None, top_k_words=None)

Bases: dataprofiler.profilers.profiler_options.BaseInspectorOptions

Constructs the TextProfilerOptions object with default values.

Variables
  • is_enabled (bool) – boolean option to enable/disable the option.

  • is_case_sensitive (bool) – option set for case sensitivity.

  • stop_words (Union[None, list(str)]) – option set for stop words.

  • top_k_chars (Union[None, int]) – option set for number of top common characters.

  • top_k_words (Union[None, int]) – option set for number of top common words.

  • words (BooleanOption) – option set for word update.

  • vocab (BooleanOption) – option set for vocab update.
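The way is_case_sensitive, stop_words, and top_k_words interact can be sketched with a simple word counter. The helper count_words is hypothetical, and splitting on whitespace is an assumption; the profiler's own tokenization and default stop words are not described on this page.

```python
from collections import Counter


def count_words(text, is_case_sensitive=True, stop_words=None, top_k_words=None):
    """Count words under the given settings (hypothetical helper)."""
    words = text.split()
    if not is_case_sensitive:
        words = [w.lower() for w in words]
    # Compare stop words in the same case as the words being counted.
    skip = set(stop_words or [])
    if not is_case_sensitive:
        skip = {w.lower() for w in skip}
    kept = [w for w in words if w not in skip]
    # most_common(None) returns every word, so None means "no cap".
    return Counter(kept).most_common(top_k_words)


sample = "The cat and the hat and the bat"
print(count_words(sample, is_case_sensitive=False, stop_words=["and"], top_k_words=2))
```

With is_case_sensitive=True, "The" and "the" are counted as different words.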

is_prop_enabled(prop)

Checks to see if a property is enabled or not and returns boolean.

Parameters

prop (String) – The option to check if it is enabled

Returns

Whether or not the property is enabled

Return type

Boolean

property properties

Returns a copy of the option properties.

Returns

dictionary of the option’s properties attr: value

Return type

dict

set(options)

Sets the given options. Pass a dict that contains all, or a subset, of the appropriate options. Raises an error if the formatting is improper.

Parameters

options (dict) – dict containing the options you want to set.

Returns

None

validate(raise_error=True)

Validates the options do not conflict and cause errors. Raises error/warning if so.

Parameters

raise_error (bool) – Flag that raises errors if true. Returns errors if false.

Returns

list of errors (if raise_error is false)

Return type

list(str)

class dataprofiler.profilers.profiler_options.StructuredOptions(null_values=None)

Bases: dataprofiler.profilers.profiler_options.BaseOption

Constructs the StructuredOptions object with default values.

Parameters

null_values (Union[None, dict]) – user-defined null values.

Variables
  • int (IntOptions) – option set for int profiling.

  • float (FloatOptions) – option set for float profiling.

  • datetime (DateTimeOptions) – option set for datetime profiling.

  • text (TextOptions) – option set for text profiling.

  • order (OrderOptions) – option set for order profiling.

  • category (CategoricalOptions) – option set for category profiling.

  • data_labeler (DataLabelerOptions) – option set for data_labeler profiling.

  • correlation (CorrelationOptions) – option set for correlation profiling.

  • chi2_homogeneity (BooleanOption) – option set for chi2_homogeneity matrix

  • null_values (Union[None, dict]) – option set for defined null values

property enabled_profiles

Returns a list of the enabled profilers for columns.

property properties

Returns a copy of the option properties.

Returns

dictionary of the option’s properties attr: value

Return type

dict

set(options)

Sets the given options. Pass a dict that contains all, or a subset, of the appropriate options. Raises an error if the formatting is improper.

Parameters

options (dict) – dict containing the options you want to set.

Returns

None

validate(raise_error=True)

Validates the options do not conflict and cause errors. Raises error/warning if so.

Parameters

raise_error (bool) – Flag that raises errors if true. Returns errors if false.

Returns

list of errors (if raise_error is false)

Return type

list(str)

class dataprofiler.profilers.profiler_options.UnstructuredOptions

Bases: dataprofiler.profilers.profiler_options.BaseOption

Constructs the UnstructuredOptions object with default values.

property enabled_profiles

Returns a list of the enabled profilers.

property properties

Returns a copy of the option properties.

Returns

dictionary of the option’s properties attr: value

Return type

dict

set(options)

Sets the given options. Pass a dict that contains all, or a subset, of the appropriate options. Raises an error if the formatting is improper.

Parameters

options (dict) – dict containing the options you want to set.

Returns

None

validate(raise_error=True)

Validates the options do not conflict and cause errors. Raises error/warning if so.

Parameters

raise_error (bool) – Flag that raises errors if true. Returns errors if false.

Returns

list of errors (if raise_error is false)

Return type

list(str)

class dataprofiler.profilers.profiler_options.ProfilerOptions

Bases: dataprofiler.profilers.profiler_options.BaseOption

Initializes the ProfilerOptions object.

Variables
  • structured_options (StructuredOptions) – option set for structured dataset profiling.

  • unstructured_options (UnstructuredOptions) – option set for unstructured dataset profiling.

property properties

Returns a copy of the option properties.

Returns

dictionary of the option’s properties attr: value

Return type

dict

validate(raise_error=True)

Validates the options do not conflict and cause errors. Raises error/warning if so.

Parameters

raise_error (bool) – Flag that raises errors if true. Returns errors if false.

Returns

list of errors (if raise_error is false)

Return type

list(str)

set(options)

Overrides BaseOption.set, since the type (unstructured/structured) may need to be specified if the same options exist within both self.structured_options and self.unstructured_options.

Parameters

options (dict) – Dictionary of options to set

Returns

None
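One plausible reading of why the type may need to be specified is sketched below with nested dicts. Everything here is an assumption made for illustration: the helper set_options is hypothetical, the dotted-key format is not stated on this page, and applying an unprefixed key to every section that has it is only one possible behaviour.

```python
def set_options(sections, options):
    """Apply dotted-key options to nested dict sections (hypothetical helper)."""
    for dotted_key, value in options.items():
        first, _, rest = dotted_key.partition(".")
        if first in sections:
            # The key names its section explicitly, so only that one changes.
            targets, path = [sections[first]], rest.split(".")
        else:
            # No section given: try the key against every section.
            targets, path = list(sections.values()), dotted_key.split(".")
        applied = False
        for target in targets:
            node = target
            for part in path[:-1]:
                node = node.get(part) if isinstance(node, dict) else None
            if isinstance(node, dict) and path[-1] in node:
                node[path[-1]] = value
                applied = True
        if not applied:
            raise AttributeError(f"unknown option: {dotted_key}")


sections = {
    "structured_options": {"text": {"is_enabled": True}, "int": {"is_enabled": True}},
    "unstructured_options": {"text": {"is_enabled": True}},
}
# "text" exists in both sections, so the prefix picks exactly one of them.
set_options(sections, {"structured_options.text.is_enabled": False})
# "int" exists in only one section, so no prefix is needed.
set_options(sections, {"int.is_enabled": False})
print(sections)
```

In this sketch, an unprefixed "text.is_enabled" would change both sections, which is the ambiguity a section prefix avoids.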