Profiler Options

Specify the options when running the data profiler.

class dataprofiler.profilers.profiler_options.BaseOption

Bases: Generic[dataprofiler.profilers.profiler_options.BaseOptionT]

For configuring options.

property properties: dict

Return a copy of the option properties.

Returns

dictionary of the option’s properties attr: value

Return type

dict

set(options: dict) None

Set all the options.

Send in a dict that contains all of or a subset of the appropriate options. Set the values of the options. Will raise error if the formatting is improper.

Parameters

options (dict) – dict containing the options you want to set.

Returns

None

validate(raise_error: bool = True) list[str] | None

Validate the options do not conflict and cause errors.

Raises error/warning if so.

Parameters

raise_error (bool) – Flag that raises errors if true. Returns errors if false.

Returns

list of errors (if raise_error is false)

Return type

list(str)
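The set and validate contract described above can be sketched with a small stand-in class. SketchOption is hypothetical and is not DataProfiler's implementation. It only assumes that set accepts all or a subset of the known option names, and that validate either raises or returns the collected error messages depending on raise_error.

```python
from typing import List, Optional


class SketchOption:
    """Hypothetical stand-in for an option class; not the library's code."""

    def __init__(self, is_enabled: bool = True) -> None:
        self.is_enabled = is_enabled

    @property
    def properties(self) -> dict:
        # Return a shallow copy so callers cannot mutate the option in place.
        return dict(self.__dict__)

    def set(self, options: dict) -> None:
        # Accept all of, or a subset of, the known option names.
        for name, value in options.items():
            if name not in self.__dict__:
                raise AttributeError(f"unknown option: {name}")
            setattr(self, name, value)

    def validate(self, raise_error: bool = True) -> Optional[List[str]]:
        errors = []
        if not isinstance(self.is_enabled, bool):
            errors.append("is_enabled must be a bool.")
        if raise_error and errors:
            raise ValueError("\n".join(errors))
        # With raise_error=False the caller inspects the returned list.
        return errors if errors else None


option = SketchOption()
option.set({"is_enabled": "yes"})  # wrong type on purpose
problems = option.validate(raise_error=False)
```

Calling validate(raise_error=False) is useful when all problems should be reported together instead of stopping at the first exception.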

classmethod load_from_dict(data, config: dict | None = None) BaseOption

Parse attribute from json dictionary into self.

Parameters
  • data (dict[string, Any]) – dictionary with attributes and values.

  • config (Dict | None) – config to override loading options params from dictionary

Returns

Options with attributes populated.

Return type

BaseOption

class dataprofiler.profilers.profiler_options.BooleanOption(is_enabled: bool = True)

Bases: dataprofiler.profilers.profiler_options.BaseOption[dataprofiler.profilers.profiler_options.BooleanOptionT]

For setting Boolean options.

Initialize Boolean option.

Variables

is_enabled (bool) – boolean option to enable/disable the option.

classmethod load_from_dict(data, config: dict | None = None) BaseOption

Parse attribute from json dictionary into self.

Parameters
  • data (dict[string, Any]) – dictionary with attributes and values.

  • config (Dict | None) – config to override loading options params from dictionary

Returns

Options with attributes populated.

Return type

BaseOption

property properties: dict

Return a copy of the option properties.

Returns

dictionary of the option’s properties attr: value

Return type

dict

set(options: dict) None

Set all the options.

Send in a dict that contains all of or a subset of the appropriate options. Set the values of the options. Will raise error if the formatting is improper.

Parameters

options (dict) – dict containing the options you want to set.

Returns

None

validate(raise_error: bool = True) list[str] | None

Validate the options do not conflict and cause errors.

Raises error/warning if so.

Parameters

raise_error (bool) – Flag that raises errors if true. Returns errors if false.

Returns

list of errors (if raise_error is false)

Return type

list(str)

class dataprofiler.profilers.profiler_options.HistogramAndQuantilesOption(is_enabled: bool = True, bin_count_or_method: str | int | list[str] = 'auto', num_quantiles: int = 1000)

Bases: dataprofiler.profilers.profiler_options.BooleanOption[HistogramAndQuantilesOption]

For setting histogram options.

Initialize Options for histograms.

Variables
  • is_enabled (bool) – boolean option to enable/disable the option.

  • bin_count_or_method (Union[str, int, list(str)]) – bin count or the method with which to calculate histograms

  • num_quantiles (int) – number of quantiles

classmethod load_from_dict(data, config: dict | None = None) BaseOption

Parse attribute from json dictionary into self.

Parameters
  • data (dict[string, Any]) – dictionary with attributes and values.

  • config (Dict | None) – config to override loading options params from dictionary

Returns

Options with attributes populated.

Return type

BaseOption

property properties: dict

Return a copy of the option properties.

Returns

dictionary of the option’s properties attr: value

Return type

dict

set(options: dict) None

Set all the options.

Send in a dict that contains all of or a subset of the appropriate options. Set the values of the options. Will raise error if the formatting is improper.

Parameters

options (dict) – dict containing the options you want to set.

Returns

None

validate(raise_error: bool = True) list[str] | None

Validate the options do not conflict and cause errors.

Raises error/warning if so.

Parameters

raise_error (bool) – Flag that raises errors if true. Returns errors if false.

Returns

list of errors (if raise_error is false)

Return type

list(str)

class dataprofiler.profilers.profiler_options.ModeOption(is_enabled: bool = True, max_k_modes: int = 5)

Bases: dataprofiler.profilers.profiler_options.BooleanOption[ModeOption]

For setting mode estimation options.

Initialize Options for mode estimation.

Variables
  • is_enabled (bool) – boolean option to enable/disable the option.

  • max_k_modes (int) – the max number of modes to return, if applicable
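As a rough illustration of what a cap such as max_k_modes means, the sketch below returns the values tied for the highest count, limited to max_k_modes entries. The helper top_k_modes is hypothetical, and the way the profiler actually estimates modes may differ.

```python
from collections import Counter


def top_k_modes(values, max_k_modes=5):
    """Return at most max_k_modes values that share the highest count."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    # Sort the tied values so the result is deterministic.
    modes = sorted(v for v, c in counts.items() if c == highest)
    return modes[:max_k_modes]


modes = top_k_modes([1, 2, 2, 3, 3, 4], max_k_modes=5)
```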

classmethod load_from_dict(data, config: dict | None = None) BaseOption

Parse attribute from json dictionary into self.

Parameters
  • data (dict[string, Any]) – dictionary with attributes and values.

  • config (Dict | None) – config to override loading options params from dictionary

Returns

Options with attributes populated.

Return type

BaseOption

property properties: dict

Return a copy of the option properties.

Returns

dictionary of the option’s properties attr: value

Return type

dict

set(options: dict) None

Set all the options.

Send in a dict that contains all of or a subset of the appropriate options. Set the values of the options. Will raise error if the formatting is improper.

Parameters

options (dict) – dict containing the options you want to set.

Returns

None

validate(raise_error: bool = True) list[str] | None

Validate the options do not conflict and cause errors.

Raises error/warning if so.

Parameters

raise_error (bool) – Flag that raises errors if true. Returns errors if false.

Returns

list of errors (if raise_error is false)

Return type

list(str)

class dataprofiler.profilers.profiler_options.BaseInspectorOptions(is_enabled: bool = True)

Bases: dataprofiler.profilers.profiler_options.BooleanOption[dataprofiler.profilers.profiler_options.BaseInspectorOptionsT]

For setting Base options.

Initialize Base options for all the columns.

Variables

is_enabled (bool) – boolean option to enable/disable the column.

is_prop_enabled(prop: str) bool

Check whether a property is enabled and return a boolean.

Parameters

prop (String) – The option to check if it is enabled

Returns

Whether or not the property is enabled

Return type

Boolean
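A minimal sketch of the lookup that is_prop_enabled describes, assuming a property is either a plain bool or a nested option object with its own is_enabled flag. Toggle and SketchInspectorOptions are hypothetical stand-ins, not the library's classes.

```python
class Toggle:
    """Hypothetical stand-in for a BooleanOption-like object."""

    def __init__(self, is_enabled=True):
        self.is_enabled = is_enabled


class SketchInspectorOptions:
    def __init__(self):
        self.is_enabled = True
        self.min = Toggle(True)
        self.max = Toggle(False)

    def is_prop_enabled(self, prop: str) -> bool:
        if prop not in vars(self):
            raise AttributeError(f"unknown property: {prop}")
        value = getattr(self, prop)
        # A plain bool is its own answer; a nested option carries a flag.
        if isinstance(value, bool):
            return value
        return bool(getattr(value, "is_enabled", True))


opts = SketchInspectorOptions()
checks = {name: opts.is_prop_enabled(name) for name in ("is_enabled", "min", "max")}
```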

classmethod load_from_dict(data, config: dict | None = None) BaseOption

Parse attribute from json dictionary into self.

Parameters
  • data (dict[string, Any]) – dictionary with attributes and values.

  • config (Dict | None) – config to override loading options params from dictionary

Returns

Options with attributes populated.

Return type

BaseOption

property properties: dict

Return a copy of the option properties.

Returns

dictionary of the option’s properties attr: value

Return type

dict

set(options: dict) None

Set all the options.

Send in a dict that contains all of or a subset of the appropriate options. Set the values of the options. Will raise error if the formatting is improper.

Parameters

options (dict) – dict containing the options you want to set.

Returns

None

validate(raise_error: bool = True) list[str] | None

Validate the options do not conflict and cause errors.

Raises error/warning if so.

Parameters

raise_error (bool) – Flag that raises errors if true. Returns errors if false.

Returns

list of errors (if raise_error is false)

Return type

list(str)

class dataprofiler.profilers.profiler_options.NumericalOptions

Bases: dataprofiler.profilers.profiler_options.BaseInspectorOptions[dataprofiler.profilers.profiler_options.NumericalOptionsT]

For configuring options for Numerical Stats Mixin.

Initialize Options for the Numerical Stats Mixin.

Variables
  • is_enabled (bool) – boolean option to enable/disable the column.

  • min (BooleanOption) – boolean option to enable/disable min

  • max (BooleanOption) – boolean option to enable/disable max

  • mode (ModeOption) – option to enable/disable mode and set return count

  • median (BooleanOption) – option to enable/disable median

  • sum (BooleanOption) – boolean option to enable/disable sum

  • variance (BooleanOption) – boolean option to enable/disable variance

  • skewness (BooleanOption) – boolean option to enable/disable skewness

  • kurtosis (BooleanOption) – boolean option to enable/disable kurtosis

  • histogram_and_quantiles (BooleanOption) – boolean option to enable/disable histogram_and_quantiles

  • bias_correction (BooleanOption) – boolean option to enable/disable existence of bias

  • num_zeros (BooleanOption) – boolean option to enable/disable num_zeros

  • num_negatives (BooleanOption) – boolean option to enable/disable num_negatives

  • is_numeric_stats_enabled (bool) – boolean to enable/disable all numeric stats

property is_numeric_stats_enabled: bool

Return the state of numeric stats being enabled / disabled.

If any numeric stats property is enabled it will return True, otherwise it will return False.

Returns

true if any numeric stats property is enabled, otherwise false

Return type

bool
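The "any property enabled" rule can be sketched as follows. The stats dictionary and the helper function are hypothetical; only the rule itself (True if at least one numeric statistic is enabled) comes from the description above.

```python
from types import SimpleNamespace

# Hypothetical per-statistic toggles; the names mirror the documented variables.
stats = {
    "min": SimpleNamespace(is_enabled=False),
    "max": SimpleNamespace(is_enabled=False),
    "variance": SimpleNamespace(is_enabled=True),
}


def is_numeric_stats_enabled(toggles) -> bool:
    # True as soon as a single statistic is switched on.
    return any(toggle.is_enabled for toggle in toggles.values())


before = is_numeric_stats_enabled(stats)

# Disable every statistic and check again.
for toggle in stats.values():
    toggle.is_enabled = False
after = is_numeric_stats_enabled(stats)
```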

property properties: dict

Include is_enabled.

is_enabled: Turns on or off the column.

is_prop_enabled(prop: str) bool

Check whether a property is enabled and return a boolean.

Parameters

prop (String) – The option to check if it is enabled

Returns

Whether or not the property is enabled

Return type

Boolean

classmethod load_from_dict(data, config: dict | None = None) BaseOption

Parse attribute from json dictionary into self.

Parameters
  • data (dict[string, Any]) – dictionary with attributes and values.

  • config (Dict | None) – config to override loading options params from dictionary

Returns

Options with attributes populated.

Return type

BaseOption

set(options: dict) None

Set all the options.

Send in a dict that contains all of or a subset of the appropriate options. Set the values of the options. Will raise error if the formatting is improper.

Parameters

options (dict) – dict containing the options you want to set.

Returns

None

validate(raise_error: bool = True) list[str] | None

Validate the options do not conflict and cause errors.

Raises error/warning if so.

Parameters

raise_error (bool) – Flag that raises errors if true. Returns errors if false.

Returns

list of errors (if raise_error is false)

Return type

list(str)

class dataprofiler.profilers.profiler_options.IntOptions

Bases: dataprofiler.profilers.profiler_options.NumericalOptions[IntOptions]

For configuring options for Int Column.

Initialize Options for the Int Column.

Variables
  • is_enabled (bool) – boolean option to enable/disable the column.

  • min (BooleanOption) – boolean option to enable/disable min

  • max (BooleanOption) – boolean option to enable/disable max

  • mode (ModeOption) – option to enable/disable mode and set return count

  • median (BooleanOption) – option to enable/disable median

  • sum (BooleanOption) – boolean option to enable/disable sum

  • variance (BooleanOption) – boolean option to enable/disable variance

  • skewness (BooleanOption) – boolean option to enable/disable skewness

  • kurtosis (BooleanOption) – boolean option to enable/disable kurtosis

  • histogram_and_quantiles (BooleanOption) – boolean option to enable/disable histogram_and_quantiles

  • bias_correction (BooleanOption) – boolean option to enable/disable existence of bias

  • num_zeros (BooleanOption) – boolean option to enable/disable num_zeros

  • num_negatives (BooleanOption) – boolean option to enable/disable num_negatives

  • is_numeric_stats_enabled (bool) – boolean to enable/disable all numeric stats

property is_numeric_stats_enabled: bool

Return the state of numeric stats being enabled / disabled.

If any numeric stats property is enabled it will return True, otherwise it will return False.

Returns

true if any numeric stats property is enabled, otherwise false

Return type

bool

is_prop_enabled(prop: str) bool

Check whether a property is enabled and return a boolean.

Parameters

prop (String) – The option to check if it is enabled

Returns

Whether or not the property is enabled

Return type

Boolean

classmethod load_from_dict(data, config: dict | None = None) BaseOption

Parse attribute from json dictionary into self.

Parameters
  • data (dict[string, Any]) – dictionary with attributes and values.

  • config (Dict | None) – config to override loading options params from dictionary

Returns

Options with attributes populated.

Return type

BaseOption

property properties: dict

Include is_enabled.

is_enabled: Turns on or off the column.

set(options: dict) None

Set all the options.

Send in a dict that contains all of or a subset of the appropriate options. Set the values of the options. Will raise error if the formatting is improper.

Parameters

options (dict) – dict containing the options you want to set.

Returns

None

validate(raise_error: bool = True) list[str] | None

Validate the options do not conflict and cause errors.

Raises error/warning if so.

Parameters

raise_error (bool) – Flag that raises errors if true. Returns errors if false.

Returns

list of errors (if raise_error is false)

Return type

list(str)

class dataprofiler.profilers.profiler_options.PrecisionOptions(is_enabled: bool = True, sample_ratio: Optional[float] = None)

Bases: dataprofiler.profilers.profiler_options.BooleanOption[PrecisionOptions]

For configuring options for precision.

Initialize Options for precision.

Variables
  • is_enabled (bool) – boolean option to enable/disable the column.

  • sample_ratio (float) – float option that sets the ratio of valid float samples used to determine precision. This ratio overrides any defaults.

classmethod load_from_dict(data, config: dict | None = None) BaseOption

Parse attribute from json dictionary into self.

Parameters
  • data (dict[string, Any]) – dictionary with attributes and values.

  • config (Dict | None) – config to override loading options params from dictionary

Returns

Options with attributes populated.

Return type

BaseOption

property properties: dict

Return a copy of the option properties.

Returns

dictionary of the option’s properties attr: value

Return type

dict

set(options: dict) None

Set all the options.

Send in a dict that contains all of or a subset of the appropriate options. Set the values of the options. Will raise error if the formatting is improper.

Parameters

options (dict) – dict containing the options you want to set.

Returns

None

validate(raise_error: bool = True) list[str] | None

Validate the options do not conflict and cause errors.

Raises error/warning if so.

Parameters

raise_error (bool) – Flag that raises errors if true. Returns errors if false.

Returns

list of errors (if raise_error is false)

Return type

list(str)

class dataprofiler.profilers.profiler_options.FloatOptions

Bases: dataprofiler.profilers.profiler_options.NumericalOptions[FloatOptions]

For configuring options for Float Column.

Initialize Options for the Float Column.

Variables
  • is_enabled (bool) – boolean option to enable/disable the column.

  • min (BooleanOption) – boolean option to enable/disable min

  • max (BooleanOption) – boolean option to enable/disable max

  • mode (ModeOption) – option to enable/disable mode and set return count

  • median (BooleanOption) – option to enable/disable median

  • sum (BooleanOption) – boolean option to enable/disable sum

  • variance (BooleanOption) – boolean option to enable/disable variance

  • skewness (BooleanOption) – boolean option to enable/disable skewness

  • kurtosis (BooleanOption) – boolean option to enable/disable kurtosis

  • histogram_and_quantiles (BooleanOption) – boolean option to enable/disable histogram_and_quantiles

  • bias_correction (BooleanOption) – boolean option to enable/disable existence of bias

  • num_zeros (BooleanOption) – boolean option to enable/disable num_zeros

  • num_negatives (BooleanOption) – boolean option to enable/disable num_negatives

  • is_numeric_stats_enabled (bool) – boolean to enable/disable all numeric stats

property is_numeric_stats_enabled: bool

Return the state of numeric stats being enabled / disabled.

If any numeric stats property is enabled it will return True, otherwise it will return False.

Returns

true if any numeric stats property is enabled, otherwise false

Return type

bool

is_prop_enabled(prop: str) bool

Check whether a property is enabled and return a boolean.

Parameters

prop (String) – The option to check if it is enabled

Returns

Whether or not the property is enabled

Return type

Boolean

classmethod load_from_dict(data, config: dict | None = None) BaseOption

Parse attribute from json dictionary into self.

Parameters
  • data (dict[string, Any]) – dictionary with attributes and values.

  • config (Dict | None) – config to override loading options params from dictionary

Returns

Options with attributes populated.

Return type

BaseOption

property properties: dict

Include is_enabled.

is_enabled: Turns on or off the column.

set(options: dict) None

Set all the options.

Send in a dict that contains all of or a subset of the appropriate options. Set the values of the options. Will raise error if the formatting is improper.

Parameters

options (dict) – dict containing the options you want to set.

Returns

None

validate(raise_error: bool = True) list[str] | None

Validate the options do not conflict and cause errors.

Raises error/warning if so.

Parameters

raise_error (bool) – Flag that raises errors if true. Returns errors if false.

Returns

list of errors (if raise_error is false)

Return type

list(str)

class dataprofiler.profilers.profiler_options.TextOptions

Bases: dataprofiler.profilers.profiler_options.NumericalOptions[TextOptions]

For configuring options for Text Column.

Initialize Options for the Text Column.

Variables
  • is_enabled (bool) – boolean option to enable/disable the column.

  • vocab (BooleanOption) – boolean option to enable/disable vocab

  • min (BooleanOption) – boolean option to enable/disable min

  • max (BooleanOption) – boolean option to enable/disable max

  • mode (ModeOption) – option to enable/disable mode and set return count

  • median (BooleanOption) – option to enable/disable median

  • sum (BooleanOption) – boolean option to enable/disable sum

  • variance (BooleanOption) – boolean option to enable/disable variance

  • skewness (BooleanOption) – boolean option to enable/disable skewness

  • kurtosis (BooleanOption) – boolean option to enable/disable kurtosis

  • bias_correction (BooleanOption) – boolean option to enable/disable existence of bias

  • histogram_and_quantiles (BooleanOption) – boolean option to enable/disable histogram_and_quantiles

  • num_zeros (BooleanOption) – boolean option to enable/disable num_zeros

  • num_negatives (BooleanOption) – boolean option to enable/disable num_negatives

  • is_numeric_stats_enabled (bool) – boolean to enable/disable all numeric stats

property is_numeric_stats_enabled: bool

Return the state of numeric stats being enabled / disabled.

If any numeric stats property is enabled it will return True, otherwise it will return False. Although it seems redundant, this getter is needed so that the setter of the same name, is_numeric_stats_enabled, works properly.

Returns

true if any numeric stats property is enabled, otherwise false

Return type

bool

is_prop_enabled(prop: str) bool

Check whether a property is enabled and return a boolean.

Parameters

prop (String) – The option to check if it is enabled

Returns

Whether or not the property is enabled

Return type

Boolean

classmethod load_from_dict(data, config: dict | None = None) BaseOption

Parse attribute from json dictionary into self.

Parameters
  • data (dict[string, Any]) – dictionary with attributes and values.

  • config (Dict | None) – config to override loading options params from dictionary

Returns

Options with attributes populated.

Return type

BaseOption

property properties: dict

Include is_enabled.

is_enabled: Turns on or off the column.

set(options: dict) None

Set all the options.

Send in a dict that contains all of or a subset of the appropriate options. Set the values of the options. Will raise error if the formatting is improper.

Parameters

options (dict) – dict containing the options you want to set.

Returns

None

validate(raise_error: bool = True) list[str] | None

Validate the options do not conflict and cause errors.

Raises error/warning if so.

Parameters

raise_error (bool) – Flag that raises errors if true. Returns errors if false.

Returns

list of errors (if raise_error is false)

Return type

list(str)

class dataprofiler.profilers.profiler_options.DateTimeOptions

Bases: dataprofiler.profilers.profiler_options.BaseInspectorOptions[DateTimeOptions]

For configuring options for Datetime Column.

Initialize Options for the Datetime Column.

Variables

is_enabled (bool) – boolean option to enable/disable the column.

is_prop_enabled(prop: str) bool

Check whether a property is enabled and return a boolean.

Parameters

prop (String) – The option to check if it is enabled

Returns

Whether or not the property is enabled

Return type

Boolean

classmethod load_from_dict(data, config: dict | None = None) BaseOption

Parse attribute from json dictionary into self.

Parameters
  • data (dict[string, Any]) – dictionary with attributes and values.

  • config (Dict | None) – config to override loading options params from dictionary

Returns

Options with attributes populated.

Return type

BaseOption

property properties: dict

Return a copy of the option properties.

Returns

dictionary of the option’s properties attr: value

Return type

dict

set(options: dict) None

Set all the options.

Send in a dict that contains all of or a subset of the appropriate options. Set the values of the options. Will raise error if the formatting is improper.

Parameters

options (dict) – dict containing the options you want to set.

Returns

None

validate(raise_error: bool = True) list[str] | None

Validate the options do not conflict and cause errors.

Raises error/warning if so.

Parameters

raise_error (bool) – Flag that raises errors if true. Returns errors if false.

Returns

list of errors (if raise_error is false)

Return type

list(str)

class dataprofiler.profilers.profiler_options.OrderOptions

Bases: dataprofiler.profilers.profiler_options.BaseInspectorOptions[OrderOptions]

For configuring options for Order Column.

Initialize options for the Order Column.

Variables

is_enabled (bool) – boolean option to enable/disable the column.

is_prop_enabled(prop: str) bool

Check whether a property is enabled and return a boolean.

Parameters

prop (String) – The option to check if it is enabled

Returns

Whether or not the property is enabled

Return type

Boolean

classmethod load_from_dict(data, config: dict | None = None) BaseOption

Parse attribute from json dictionary into self.

Parameters
  • data (dict[string, Any]) – dictionary with attributes and values.

  • config (Dict | None) – config to override loading options params from dictionary

Returns

Options with attributes populated.

Return type

BaseOption

property properties: dict

Return a copy of the option properties.

Returns

dictionary of the option’s properties attr: value

Return type

dict

set(options: dict) None

Set all the options.

Send in a dict that contains all of or a subset of the appropriate options. Set the values of the options. Will raise error if the formatting is improper.

Parameters

options (dict) – dict containing the options you want to set.

Returns

None

validate(raise_error: bool = True) list[str] | None

Validate the options do not conflict and cause errors.

Raises error/warning if so.

Parameters

raise_error (bool) – Flag that raises errors if true. Returns errors if false.

Returns

list of errors (if raise_error is false)

Return type

list(str)

class dataprofiler.profilers.profiler_options.CategoricalOptions(is_enabled: bool = True, top_k_categories: int | None = None, max_sample_size_to_check_stop_condition: int | None = None, stop_condition_unique_value_ratio: float | None = None, cms: bool = False, cms_confidence: float | None = 0.95, cms_relative_error: float | None = 0.01, cms_max_num_heavy_hitters: int | None = 5000)

Bases: dataprofiler.profilers.profiler_options.BaseInspectorOptions[CategoricalOptions]

For configuring options for Categorical Column.

Initialize options for the Categorical Column.

Variables
  • is_enabled (bool) – boolean option to enable/disable the column.

  • top_k_categories ([None, int]) – number of categories to be displayed when called

  • max_sample_size_to_check_stop_condition ([None, int]) – The maximum sample size before categorical stop conditions are checked

  • stop_condition_unique_value_ratio ([None, float]) – The highest ratio of unique values to dataset size that is to be considered a categorical type

  • cms (bool) – boolean option for using count-min sketch (CMS)

  • cms_confidence ([None, float]) – defines the number of hashes used in CMS, where confidence = 1 - failure probability; default 0.95

  • cms_relative_error ([None, float]) – defines the number of buckets used in CMS, default 0.01

  • cms_max_num_heavy_hitters ([None, int]) – value used to define the threshold for minimum frequency required by a category to be counted
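To see how cms_confidence and cms_relative_error could translate into sketch dimensions, the snippet below applies the textbook count-min sketch sizing: ceil(ln(1 / failure probability)) hash functions and ceil(e / relative error) buckets. Using these formulas here is an assumption; the helper cms_dimensions is hypothetical and the library may size its sketch differently.

```python
import math


def cms_dimensions(confidence: float = 0.95, relative_error: float = 0.01):
    """Textbook count-min sketch sizing (assumed, not read from the library)."""
    failure_probability = 1.0 - confidence
    # Number of hash functions (rows) grows with the required confidence.
    num_hashes = math.ceil(math.log(1.0 / failure_probability))
    # Number of buckets (columns) grows as the tolerated error shrinks.
    num_buckets = math.ceil(math.e / relative_error)
    return num_hashes, num_buckets


num_hashes, num_buckets = cms_dimensions()
```

Under these formulas, the documented defaults (0.95 and 0.01) give 3 hash functions and 272 buckets.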

is_prop_enabled(prop: str) bool

Check whether a property is enabled and return a boolean.

Parameters

prop (String) – The option to check if it is enabled

Returns

Whether or not the property is enabled

Return type

Boolean

classmethod load_from_dict(data, config: dict | None = None) BaseOption

Parse attribute from json dictionary into self.

Parameters
  • data (dict[string, Any]) – dictionary with attributes and values.

  • config (Dict | None) – config to override loading options params from dictionary

Returns

Options with attributes populated.

Return type

BaseOption

property properties: dict

Return a copy of the option properties.

Returns

dictionary of the option’s properties attr: value

Return type

dict

set(options: dict) None

Set all the options.

Send in a dict that contains all of or a subset of the appropriate options. Set the values of the options. Will raise error if the formatting is improper.

Parameters

options (dict) – dict containing the options you want to set.

Returns

None

validate(raise_error: bool = True) list[str] | None

Validate the options do not conflict and cause errors.

Raises error/warning if so.

Parameters

raise_error (bool) – Flag that raises errors if true. Returns errors if false.

Returns

list of errors (if raise_error is false)

Return type

list(str)

class dataprofiler.profilers.profiler_options.CorrelationOptions(is_enabled: bool = False, columns: Optional[list] = None)

Bases: dataprofiler.profilers.profiler_options.BaseInspectorOptions[CorrelationOptions]

For configuring options for Correlation between Columns.

Initialize options for the Correlation between Columns.

Variables
  • is_enabled (bool) – boolean option to enable/disable.

  • columns (list()) – Columns considered to calculate correlation

is_prop_enabled(prop: str) bool

Check whether a property is enabled and return a boolean.

Parameters

prop (String) – The option to check if it is enabled

Returns

Whether or not the property is enabled

Return type

Boolean

classmethod load_from_dict(data, config: dict | None = None) BaseOption

Parse attribute from json dictionary into self.

Parameters
  • data (dict[string, Any]) – dictionary with attributes and values.

  • config (Dict | None) – config to override loading options params from dictionary

Returns

Options with attributes populated.

Return type

BaseOption

property properties: dict

Return a copy of the option properties.

Returns

dictionary of the option’s properties attr: value

Return type

dict

set(options: dict) None

Set all the options.

Send in a dict that contains all of or a subset of the appropriate options. Set the values of the options. Will raise error if the formatting is improper.

Parameters

options (dict) – dict containing the options you want to set.

Returns

None

validate(raise_error: bool = True) list[str] | None

Validate the options do not conflict and cause errors.

Raises error/warning if so.

Parameters

raise_error (bool) – Flag that raises errors if true. Returns errors if false.

Returns

list of errors (if raise_error is false)

Return type

list(str)

class dataprofiler.profilers.profiler_options.HyperLogLogOptions(seed: int = 0, register_count: int = 15)

Bases: dataprofiler.profilers.profiler_options.BaseOption[HyperLogLogOptions]

Options for alternative method of gathering unique row count.

Initialize options for the hyperloglog method of gathering unique row count.

Variables
  • is_enabled (bool) – boolean option to enable/disable.

  • seed (int) – seed used to set HLL hashing function

  • register_count (int) – number of registers is equal to 2^register_count
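The relationship between register_count and the number of registers can be worked through directly. The error estimate uses the standard HyperLogLog approximation of about 1.04 / sqrt(m) for m registers, which is a general property of the algorithm and not a figure taken from this library. The helper hll_registers is hypothetical.

```python
import math


def hll_registers(register_count: int = 15):
    """Return (number of registers, approximate relative standard error)."""
    num_registers = 2 ** register_count
    # Standard HyperLogLog relative standard error, about 1.04 / sqrt(m).
    std_error = 1.04 / math.sqrt(num_registers)
    return num_registers, std_error


num_registers, std_error = hll_registers()
```

With the default register_count of 15 this yields 32768 registers and a relative standard error of roughly 0.57%.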

classmethod load_from_dict(data, config: dict | None = None) BaseOption

Parse attribute from json dictionary into self.

Parameters
  • data (dict[string, Any]) – dictionary with attributes and values.

  • config (Dict | None) – config to override loading options params from dictionary

Returns

Options with attributes populated.

Return type

BaseOption

property properties: dict

Return a copy of the option properties.

Returns

dictionary of the option’s properties attr: value

Return type

dict

set(options: dict) None

Set all the options.

Send in a dict that contains all of or a subset of the appropriate options. Set the values of the options. Will raise error if the formatting is improper.

Parameters

options (dict) – dict containing the options you want to set.

Returns

None

validate(raise_error: bool = True) list[str] | None

Validate the options do not conflict and cause errors.

Raises error/warning if so.

Parameters

raise_error (bool) – Flag that raises errors if true. Returns errors if false.

Returns

list of errors (if raise_error is false)

Return type

list(str)

class dataprofiler.profilers.profiler_options.UniqueCountOptions(is_enabled: bool = True, hashing_method: str = 'full')

Bases: dataprofiler.profilers.profiler_options.BooleanOption[UniqueCountOptions]

For configuring options for unique row count.

Initialize options for unique row counts.

Variables
  • is_enabled (bool) – boolean option to enable/disable.

  • hashing_method (str) – property to specify row hashing method (“full” | “hll”)
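As a sketch of the trade-off behind hashing_method, the function below counts unique rows exactly by remembering every distinct row. This illustrates the idea behind "full" under the assumption that it tracks all distinct rows, while "hll" would swap this for a fixed-size HyperLogLog estimate. The helper unique_row_count_full is hypothetical.

```python
def unique_row_count_full(rows):
    """Exact unique-row count; memory grows with the number of distinct rows."""
    # Tuples are hashable, so each row can be stored in a set.
    seen = {tuple(row) for row in rows}
    return len(seen)


rows = [(1, "a"), (1, "a"), (2, "b")]
unique_rows = unique_row_count_full(rows)
```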

classmethod load_from_dict(data, config: dict | None = None) BaseOption

Parse attribute from json dictionary into self.

Parameters
  • data (dict[string, Any]) – dictionary with attributes and values.

  • config (Dict | None) – config to override loading options params from dictionary

Returns

Options with attributes populated.

Return type

BaseOption

property properties: dict

Return a copy of the option properties.

Returns

dictionary of the option’s properties attr: value

Return type

dict

set(options: dict) None

Set all the options.

Send in a dict that contains all of or a subset of the appropriate options. Set the values of the options. Will raise error if the formatting is improper.

Parameters

options (dict) – dict containing the options you want to set.

Returns

None

validate(raise_error: bool = True) list[str] | None

Validate that the options do not conflict or cause errors.

Raises an error or warning if they do.

Parameters

raise_error (bool) – Flag that raises errors if true. Returns errors if false.

Returns

list of errors (if raise_error is false)

Return type

list(str)
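The effect of the `raise_error` flag can be sketched with a stand-in class. `HashingChoice` is hypothetical and is not a dataprofiler class; the specific checks, the use of `ValueError`, and the return value when nothing is wrong are assumptions chosen to match the description above.

```python
class HashingChoice:
    """Hypothetical stand-in for an option object; not a dataprofiler class."""

    ALLOWED = ("full", "hll")

    def __init__(self, is_enabled=True, hashing_method="full"):
        self.is_enabled = is_enabled
        self.hashing_method = hashing_method

    def validate(self, raise_error=True):
        errors = []
        if not isinstance(self.is_enabled, bool):
            errors.append("is_enabled must be a bool.")
        if self.hashing_method not in self.ALLOWED:
            errors.append(f"hashing_method must be one of {self.ALLOWED}.")
        if raise_error and errors:
            raise ValueError("\n".join(errors))
        # Assumption: the list is only handed back when raising is switched off.
        return None if raise_error else errors


bad = HashingChoice(hashing_method="md5")
collected = bad.validate(raise_error=False)  # errors are returned, not raised

try:
    bad.validate()  # default raise_error=True
    raised = False
except ValueError:
    raised = True
```

Collecting errors instead of raising is convenient when several option groups are checked together and all problems should be reported at once.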

class dataprofiler.profilers.profiler_options.RowStatisticsOptions(is_enabled: bool = True, unique_count: bool = True, null_count: bool = True)

Bases: dataprofiler.profilers.profiler_options.BooleanOption[RowStatisticsOptions]

For configuring options for row statistics.

Initialize options for row statistics.

Variables
  • is_enabled (bool) – boolean option to enable/disable.

  • unique_count (bool) – boolean option to enable/disable unique_count

  • null_count (bool) – boolean option to enable/disable null_count
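To make the three flags concrete, the sketch below computes row-level statistics only for the flags that are switched on. It is a toy model: the function `row_statistics`, the output keys, and the exact meaning given to each statistic (distinct rows, rows containing at least one null) are assumptions for illustration, not the profiler's actual report format.

```python
def row_statistics(rows, is_enabled=True, unique_count=True, null_count=True):
    """Compute only the row-level statistics whose flag is switched on."""
    stats = {}
    if not is_enabled:
        # The master switch disables every row statistic.
        return stats
    if unique_count:
        stats["unique_rows"] = len({tuple(row) for row in rows})
    if null_count:
        stats["rows_with_null"] = sum(
            any(value is None for value in row) for row in rows
        )
    return stats


rows = [(1, "a"), (1, "a"), (2, None), (None, None)]
all_stats = row_statistics(rows)
without_nulls = row_statistics(rows, null_count=False)
disabled = row_statistics(rows, is_enabled=False)
```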

classmethod load_from_dict(data, config: dict | None = None) BaseOption

Parse attribute from json dictionary into self.

Parameters
  • data (dict[string, Any]) – dictionary with attributes and values.

  • config (Dict | None) – config to override loading options params from dictionary

Returns

Options with attributes populated.

Return type

BaseOption

property properties: dict

Return a copy of the option properties.

Returns

dictionary of the option’s properties attr: value

Return type

dict

set(options: dict) None

Set all the options.

Pass a dict containing all or a subset of the applicable options; each listed option is set to the given value. Raises an error if the dict is improperly formatted.

Parameters

options (dict) – dict containing the options you want to set.

Returns

None

validate(raise_error: bool = True) list[str] | None

Validate that the options do not conflict or cause errors.

Raises an error or warning if they do.

Parameters

raise_error (bool) – Flag that raises errors if true. Returns errors if false.

Returns

list of errors (if raise_error is false)

Return type

list(str)

class dataprofiler.profilers.profiler_options.DataLabelerOptions

Bases: dataprofiler.profilers.profiler_options.BaseInspectorOptions[DataLabelerOptions]

For configuring options for Data Labeler Column.

Initialize options for the Data Labeler Column.

Variables
  • is_enabled (bool) – boolean option to enable/disable the column.

  • data_labeler_dirpath (str) – String to load data labeler

  • max_sample_size (int) – Int to decide sample size

  • data_labeler_object (BaseDataLabeler) – DataLabeler object used in profiler

property properties: dict

Return a copy of the option properties.

Returns

dictionary of the option’s properties attr: value

Return type

dict

classmethod load_from_dict(data, config: dict | None = None) DataLabelerOptions

Parse attribute from json dictionary into self.

Parameters
  • data (dict[string, Any]) – dictionary with attributes and values.

  • config (Dict | None) – config to override loading options params from dictionary

Returns

Options with attributes populated.

Return type

DataLabelerOptions

is_prop_enabled(prop: str) bool

Check whether a property is enabled and return a boolean.

Parameters

prop (str) – the option to check

Returns

Whether or not the property is enabled

Return type

bool

set(options: dict) None

Set all the options.

Pass a dict containing all or a subset of the applicable options; each listed option is set to the given value. Raises an error if the dict is improperly formatted.

Parameters

options (dict) – dict containing the options you want to set.

Returns

None

validate(raise_error: bool = True) list[str] | None

Validate that the options do not conflict or cause errors.

Raises an error or warning if they do.

Parameters

raise_error (bool) – Flag that raises errors if true. Returns errors if false.

Returns

list of errors (if raise_error is false)

Return type

list(str)

class dataprofiler.profilers.profiler_options.TextProfilerOptions(is_enabled: bool = True, is_case_sensitive: bool = True, stop_words: Optional[set] = None, top_k_chars: Optional[int] = None, top_k_words: Optional[int] = None)

Bases: dataprofiler.profilers.profiler_options.BaseInspectorOptions[TextProfilerOptions]

For configuring options for text profiler.

Construct the TextProfilerOption object with default values.

Variables
  • is_enabled (bool) – boolean option to enable/disable the option.

  • is_case_sensitive (bool) – option set for case sensitivity.

  • stop_words (Union[None, list(str)]) – option set for stop words.

  • top_k_chars (Union[None, int]) – option set for number of top common characters.

  • top_k_words (Union[None, int]) – option set for number of top common words.

  • words (BooleanOption) – option set for word update.

  • vocab (BooleanOption) – option set for vocab update.
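The interplay of `is_case_sensitive`, `stop_words`, and `top_k_words` can be sketched with `collections.Counter`. This is an illustrative stand-in rather than the text profiler itself: the function `top_words`, the whitespace tokenization, and the choice to lowercase stop words when case is ignored are all assumptions.

```python
from collections import Counter


def top_words(texts, is_case_sensitive=True, stop_words=None, top_k_words=None):
    """Count words, skipping stop words, and return the most common ones."""
    stop_words = set(stop_words or ())
    if not is_case_sensitive:
        # Assumption: stop words are compared in lowercase when case is ignored.
        stop_words = {word.lower() for word in stop_words}
    counts = Counter()
    for text in texts:
        for word in text.split():
            if not is_case_sensitive:
                word = word.lower()
            if word not in stop_words:
                counts[word] += 1
    # most_common(None) returns every entry, so None means "no limit".
    return counts.most_common(top_k_words)


texts = ["The cat saw the dog", "the dog ran"]
case_sensitive = top_words(texts, stop_words={"the"})
case_insensitive = top_words(
    texts, is_case_sensitive=False, stop_words={"the"}, top_k_words=1
)
```

With case sensitivity on, "The" and "the" are different tokens, so only the lowercase form is filtered by the stop word.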

is_prop_enabled(prop: str) bool

Check whether a property is enabled and return a boolean.

Parameters

prop (str) – the option to check

Returns

Whether or not the property is enabled

Return type

bool

classmethod load_from_dict(data, config: dict | None = None) BaseOption

Parse attribute from json dictionary into self.

Parameters
  • data (dict[string, Any]) – dictionary with attributes and values.

  • config (Dict | None) – config to override loading options params from dictionary

Returns

Options with attributes populated.

Return type

BaseOption

property properties: dict

Return a copy of the option properties.

Returns

dictionary of the option’s properties attr: value

Return type

dict

set(options: dict) None

Set all the options.

Pass a dict containing all or a subset of the applicable options; each listed option is set to the given value. Raises an error if the dict is improperly formatted.

Parameters

options (dict) – dict containing the options you want to set.

Returns

None

validate(raise_error: bool = True) list[str] | None

Validate that the options do not conflict or cause errors.

Raises an error or warning if they do.

Parameters

raise_error (bool) – Flag that raises errors if true. Returns errors if false.

Returns

list of errors (if raise_error is false)

Return type

list(str)

class dataprofiler.profilers.profiler_options.StructuredOptions(null_values: dict[str, re.RegexFlag | int] = None, column_null_values: dict[int, dict[str, re.RegexFlag | int]] = None, sampling_ratio: float = 0.2)

Bases: dataprofiler.profilers.profiler_options.BaseOption[StructuredOptions]

For configuring options for structured profiler.

Construct the StructuredOptions object with default values.

Parameters
  • null_values – null values we input.

  • column_null_values – column level null values we input.
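The type hints in the signature suggest that `null_values` maps a pattern string to `re` flags, and that `column_null_values` maps a column index to such a dict. The sketch below shows one plausible reading of that shape using only the `re` module. Treating the keys as regular expressions that must match the whole cell, and merging column-level patterns with the global ones, are assumptions of this example; the helper names are hypothetical.

```python
import re


def is_null(value, null_values):
    """Return True if value fully matches any null pattern under its flags."""
    return any(
        re.fullmatch(pattern, value, flags) is not None
        for pattern, flags in null_values.items()
    )


# Keys are patterns; values are re flags (0 means no flags).
null_values = {"": 0, "nan": re.IGNORECASE, "none": re.IGNORECASE}
# Column index -> extra null patterns that apply to that column only.
column_null_values = {1: {"-1": 0}}


def is_null_in_column(value, column_index):
    """Check a cell against the global patterns plus its column's patterns."""
    patterns = dict(null_values)
    patterns.update(column_null_values.get(column_index, {}))
    return is_null(value, patterns)


nan_is_null = is_null("NaN", null_values)
minus_one_in_col_1 = is_null_in_column("-1", 1)
minus_one_in_col_0 = is_null_in_column("-1", 0)
```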

Variables
  • int (IntOptions) – option set for int profiling.

  • float (FloatOptions) – option set for float profiling.

  • datetime (DateTimeOptions) – option set for datetime profiling.

  • text (TextOptions) – option set for text profiling.

  • order (OrderOptions) – option set for order profiling.

  • category (CategoricalOptions) – option set for category profiling.

  • data_labeler (DataLabelerOptions) – option set for data_labeler profiling.

  • correlation (CorrelationOptions) – option set for correlation profiling.

  • chi2_homogeneity (BooleanOption) – option set for chi2_homogeneity matrix

  • row_statistics (BooleanOption) – option set for row statistics calculations

  • null_replication_metrics (BooleanOption) – option set for calculating the metrics used to replicate NaN values

  • null_values (Union[None, dict]) – option set for defined null values

  • sampling_ratio (Union[None, float]) – What ratio of the input data to sample. Float value > 0 and <= 1
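A minimal sketch of what a `sampling_ratio` in the range (0, 1] means for a dataset, assuming simple random sampling without replacement. The function `sample_rows`, the seed parameter, and the rounding rule are hypothetical; the profiler's own sampling strategy may differ, for instance in how it treats small inputs.

```python
import random


def sample_rows(rows, sampling_ratio=0.2, seed=0):
    """Return a random subset holding roughly sampling_ratio of the rows."""
    if not 0 < sampling_ratio <= 1:
        raise ValueError("sampling_ratio must be > 0 and <= 1")
    # Assumption: keep at least one row whenever the input is non-empty.
    sample_size = max(1, int(len(rows) * sampling_ratio)) if rows else 0
    rng = random.Random(seed)  # seeded so the example is repeatable
    return rng.sample(rows, sample_size)


rows = list(range(100))
sample = sample_rows(rows)  # default ratio of 0.2
everything = sample_rows(rows, sampling_ratio=1)
```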

property enabled_profiles: list

Return a list of the enabled profilers for columns.
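Conceptually, this property filters the per-type option sets by their `is_enabled` flag. The sketch below uses `types.SimpleNamespace` objects as hypothetical stand-ins for the option sets; the names in the dict and the ordering of the result are assumptions.

```python
from types import SimpleNamespace

# Hypothetical per-type options; only the is_enabled flag matters here.
column_options = {
    "int": SimpleNamespace(is_enabled=True),
    "float": SimpleNamespace(is_enabled=False),
    "text": SimpleNamespace(is_enabled=True),
}


def enabled_profiles(options):
    """List the names of the option sets that are switched on."""
    return [name for name, option in options.items() if option.is_enabled]


enabled = enabled_profiles(column_options)
```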

classmethod load_from_dict(data, config: dict | None = None) BaseOption

Parse attribute from json dictionary into self.

Parameters
  • data (dict[string, Any]) – dictionary with attributes and values.

  • config (Dict | None) – config to override loading options params from dictionary

Returns

Options with attributes populated.

Return type

BaseOption

property properties: dict

Return a copy of the option properties.

Returns

dictionary of the option’s properties attr: value

Return type

dict

set(options: dict) None

Set all the options.

Pass a dict containing all or a subset of the applicable options; each listed option is set to the given value. Raises an error if the dict is improperly formatted.

Parameters

options (dict) – dict containing the options you want to set.

Returns

None

validate(raise_error: bool = True) list[str] | None

Validate that the options do not conflict or cause errors.

Raises an error or warning if they do.

Parameters

raise_error (bool) – Flag that raises errors if true. Returns errors if false.

Returns

list of errors (if raise_error is false)

Return type

list(str)

class dataprofiler.profilers.profiler_options.UnstructuredOptions

Bases: dataprofiler.profilers.profiler_options.BaseOption[UnstructuredOptions]

For configuring options for unstructured profiler.

Construct the UnstructuredOptions object with default values.

property enabled_profiles: list

Return a list of the enabled profilers.

classmethod load_from_dict(data, config: dict | None = None) BaseOption

Parse attribute from json dictionary into self.

Parameters
  • data (dict[string, Any]) – dictionary with attributes and values.

  • config (Dict | None) – config to override loading options params from dictionary

Returns

Options with attributes populated.

Return type

BaseOption

property properties: dict

Return a copy of the option properties.

Returns

dictionary of the option’s properties attr: value

Return type

dict

set(options: dict) None

Set all the options.

Pass a dict containing all or a subset of the applicable options; each listed option is set to the given value. Raises an error if the dict is improperly formatted.

Parameters

options (dict) – dict containing the options you want to set.

Returns

None

validate(raise_error: bool = True) list[str] | None

Validate that the options do not conflict or cause errors.

Raises an error or warning if they do.

Parameters

raise_error (bool) – Flag that raises errors if true. Returns errors if false.

Returns

list of errors (if raise_error is false)

Return type

list(str)

class dataprofiler.profilers.profiler_options.ProfilerOptions(presets: Optional[str] = None)

Bases: dataprofiler.profilers.profiler_options.BaseOption[ProfilerOptions]

For configuring options for profiler.

Initialize the ProfilerOptions object.

Variables
  • structured_options (StructuredOptions) – option set for structured dataset profiling.

  • unstructured_options (UnstructuredOptions) – option set for unstructured dataset profiling.

  • presets (Optional[str]) – A pre-configured mapping of a string name to group of options: “complete”, “data_types”, “numeric_stats_disabled”, and “lower_memory_sketching”. Default: None

classmethod load_from_dict(data, config: dict | None = None) BaseOption

Parse attribute from json dictionary into self.

Parameters
  • data (dict[string, Any]) – dictionary with attributes and values.

  • config (Dict | None) – config to override loading options params from dictionary

Returns

Options with attributes populated.

Return type

BaseOption

property properties: dict

Return a copy of the option properties.

Returns

dictionary of the option’s properties attr: value

Return type

dict

validate(raise_error: bool = True) list[str] | None

Validate that the options do not conflict or cause errors.

Raises an error or warning if they do.

Parameters

raise_error (bool) – Flag that raises errors if true. Returns errors if false.

Returns

list of errors (if raise_error is false)

Return type

list(str)

set(options: dict) None

Override BaseOption.set.

The override is needed because the type (structured or unstructured) may have to be specified when the same option exists in both self.structured_options and self.unstructured_options.

Parameters

options (dict) – Dictionary of options to set

Returns

None
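The ambiguity described for this method can be sketched with two plain dicts standing in for the structured and unstructured option groups. Everything here is hypothetical: the function `set_profiler_options`, the flat dotted keys, and the decision to raise `ValueError` for a key found in both groups are assumptions that illustrate the idea rather than the library's actual behavior.

```python
def set_profiler_options(structured, unstructured, options):
    """Route each key to one option group, refusing ambiguous keys."""
    groups = {
        "structured_options": structured,
        "unstructured_options": unstructured,
    }
    for key, value in options.items():
        prefix, _, rest = key.partition(".")
        if prefix in groups:
            # Explicit prefix: the caller chose the group.
            if rest not in groups[prefix]:
                raise AttributeError(f"Unknown option: {key}")
            groups[prefix][rest] = value
            continue
        owners = [group for group in groups.values() if key in group]
        if len(owners) > 1:
            raise ValueError(f"'{key}' exists in both groups; add a group prefix.")
        if not owners:
            raise AttributeError(f"Unknown option: {key}")
        owners[0][key] = value


structured = {"int.is_enabled": True, "data_labeler.is_enabled": True}
unstructured = {"text.is_enabled": True, "data_labeler.is_enabled": True}

set_profiler_options(structured, unstructured, {
    "int.is_enabled": False,  # unambiguous: only the structured group has it
    "structured_options.data_labeler.is_enabled": False,  # explicit group
})
```

A key that exists in only one group needs no prefix, while a shared key such as `data_labeler.is_enabled` is rejected in this sketch until the caller names the group.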