Profiler Options
Specify the options when running the data profiler.
- class dataprofiler.profilers.profiler_options.BaseOption(*args, **kwds)¶
Bases: Generic[dataprofiler.profilers.profiler_options.BaseOptionT]
For configuring options.
- property properties: dict[str, BooleanOption]¶
Return a copy of the option properties.
- Returns
dictionary of the option’s properties attr: value
- Return type
dict
- set(options: dict[str, bool]) None ¶
Set all the options.
Send in a dict that contains all of or a subset of the appropriate options. Set the values of the options. Will raise error if the formatting is improper.
- Parameters
options (dict) – dict containing the options you want to set.
- Returns
None
- validate(raise_error: bool = True) list[str] | None ¶
Validate the options do not conflict and cause errors.
Raises error/warning if so.
- Parameters
raise_error (bool) – Flag that raises errors if true. Returns errors if false.
- Returns
list of errors (if raise_error is false)
- Return type
list(str)
- classmethod load_from_dict(data, config: dict | None = None) BaseOption ¶
Parse attribute from json dictionary into self.
- Parameters
data (dict[string, Any]) – dictionary with attributes and values.
config (Dict | None) – config to override loading options params from dictionary
- Returns
Options with attributes populated.
- Return type
BaseOption
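The set and validate contract described above can be sketched roughly as follows. SketchOption is a hypothetical stand-in written only for this illustration; it is not DataProfiler's BaseOption, and the single type check it performs is an assumed example of a validation rule.

```python
class SketchOption:
    """Hypothetical stand-in for illustration; not DataProfiler's BaseOption."""

    def __init__(self, is_enabled=True):
        self.is_enabled = is_enabled

    @property
    def properties(self):
        # Return a copy so callers cannot mutate the option's own state.
        return dict(vars(self))

    def set(self, options):
        # Accept all of, or a subset of, the known option names.
        unknown = [name for name in options if name not in vars(self)]
        if unknown:
            raise AttributeError(f"unknown option(s): {unknown}")
        for name, value in options.items():
            setattr(self, name, value)

    def validate(self, raise_error=True):
        # Collect every problem first, then either raise or return them.
        errors = []
        if not isinstance(self.is_enabled, bool):
            errors.append("is_enabled must be a Boolean.")
        if errors and raise_error:
            raise ValueError("\n".join(errors))
        return errors if errors else None


opt = SketchOption()
opt.set({"is_enabled": "yes"})              # wrong type, only caught by validate()
problems = opt.validate(raise_error=False)  # returned as a list instead of raised
```

Passing raise_error=False is convenient when several problems should be reported together rather than stopping at the first one.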
- class dataprofiler.profilers.profiler_options.BooleanOption(is_enabled: bool = True)¶
Bases: dataprofiler.profilers.profiler_options.BaseOption[dataprofiler.profilers.profiler_options.BooleanOptionT]
For setting Boolean options.
Initialize Boolean option.
- Variables
is_enabled (bool) – boolean option to enable/disable the option.
- classmethod load_from_dict(data, config: dict | None = None) BaseOption ¶
Parse attribute from json dictionary into self.
- Parameters
data (dict[string, Any]) – dictionary with attributes and values.
config (Dict | None) – config to override loading options params from dictionary
- Returns
Options with attributes populated.
- Return type
BaseOption
- property properties: dict[str, BooleanOption]¶
Return a copy of the option properties.
- Returns
dictionary of the option’s properties attr: value
- Return type
dict
- set(options: dict[str, bool]) None ¶
Set all the options.
Send in a dict that contains all of or a subset of the appropriate options. Set the values of the options. Will raise error if the formatting is improper.
- Parameters
options (dict) – dict containing the options you want to set.
- Returns
None
- validate(raise_error: bool = True) list[str] | None ¶
Validate the options do not conflict and cause errors.
Raises error/warning if so.
- Parameters
raise_error (bool) – Flag that raises errors if true. Returns errors if false.
- Returns
list of errors (if raise_error is false)
- Return type
list(str)
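As a rough illustration of the load_from_dict pattern, the sketch below rebuilds an option object from a plain dictionary. BooleanSketch is a hypothetical stand-in, not DataProfiler's BooleanOption, and the config override parameter is deliberately left out.

```python
class BooleanSketch:
    """Hypothetical stand-in for illustration; not DataProfiler's BooleanOption."""

    def __init__(self, is_enabled=True):
        self.is_enabled = is_enabled

    @classmethod
    def load_from_dict(cls, data):
        # Start from the defaults, then copy each known attribute across.
        option = cls()
        for name, value in data.items():
            if not hasattr(option, name):
                raise AttributeError(f"unexpected attribute: {name}")
            setattr(option, name, value)
        return option


restored = BooleanSketch.load_from_dict({"is_enabled": False})
```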
- class dataprofiler.profilers.profiler_options.HistogramAndQuantilesOption(is_enabled: bool = True, bin_count_or_method: str | int | list[str] = 'auto', num_quantiles: int = 1000)¶
Bases: dataprofiler.profilers.profiler_options.BooleanOption[HistogramAndQuantilesOption]
For setting histogram options.
Initialize Options for histograms.
- Variables
is_enabled (bool) – boolean option to enable/disable the option.
bin_count_or_method (Union[str, int, list(str)]) – bin count or the method with which to calculate histograms
num_quantiles (int) – number of quantiles
- classmethod load_from_dict(data, config: dict | None = None) BaseOption ¶
Parse attribute from json dictionary into self.
- Parameters
data (dict[string, Any]) – dictionary with attributes and values.
config (Dict | None) – config to override loading options params from dictionary
- Returns
Options with attributes populated.
- Return type
BaseOption
- property properties: dict[str, BooleanOption]¶
Return a copy of the option properties.
- Returns
dictionary of the option’s properties attr: value
- Return type
dict
- set(options: dict[str, bool]) None ¶
Set all the options.
Send in a dict that contains all of or a subset of the appropriate options. Set the values of the options. Will raise error if the formatting is improper.
- Parameters
options (dict) – dict containing the options you want to set.
- Returns
None
- validate(raise_error: bool = True) list[str] | None ¶
Validate the options do not conflict and cause errors.
Raises error/warning if so.
- Parameters
raise_error (bool) – Flag that raises errors if true. Returns errors if false.
- Returns
list of errors (if raise_error is false)
- Return type
list(str)
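To make the idea of a "number of quantiles" concrete, here is a small sketch using Python's standard statistics module. It only shows that dividing data into n quantile groups yields n - 1 cut points; cut_points is a hypothetical helper, and whether DataProfiler counts or computes its quantiles the same way is not stated in this reference.

```python
import statistics


def cut_points(values, num_quantiles):
    # statistics.quantiles returns num_quantiles - 1 cut points
    # that split the data into num_quantiles groups.
    return statistics.quantiles(values, n=num_quantiles, method="inclusive")


data = list(range(1, 100))
quartiles = cut_points(data, 4)
deciles = cut_points(data, 10)
```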
- class dataprofiler.profilers.profiler_options.ModeOption(is_enabled: bool = True, max_k_modes: int = 5)¶
Bases: dataprofiler.profilers.profiler_options.BooleanOption[ModeOption]
For setting mode estimation options.
Initialize Options for mode estimation.
- Variables
is_enabled (bool) – boolean option to enable/disable the option.
max_k_modes (int) – the max number of modes to return, if applicable
- classmethod load_from_dict(data, config: dict | None = None) BaseOption ¶
Parse attribute from json dictionary into self.
- Parameters
data (dict[string, Any]) – dictionary with attributes and values.
config (Dict | None) – config to override loading options params from dictionary
- Returns
Options with attributes populated.
- Return type
BaseOption
- property properties: dict[str, BooleanOption]¶
Return a copy of the option properties.
- Returns
dictionary of the option’s properties attr: value
- Return type
dict
- set(options: dict[str, bool]) None ¶
Set all the options.
Send in a dict that contains all of or a subset of the appropriate options. Set the values of the options. Will raise error if the formatting is improper.
- Parameters
options (dict) – dict containing the options you want to set.
- Returns
None
- validate(raise_error: bool = True) list[str] | None ¶
Validate the options do not conflict and cause errors.
Raises error/warning if so.
- Parameters
raise_error (bool) – Flag that raises errors if true. Returns errors if false.
- Returns
list of errors (if raise_error is false)
- Return type
list(str)
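The effect of a cap such as max_k_modes can be illustrated with exact counting. This is only a sketch: top_modes is a hypothetical helper, and DataProfiler's own mode estimation method is not described here and may differ.

```python
from collections import Counter


def top_modes(values, max_k_modes=5):
    # Find the values that share the highest frequency, then cap the result.
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    modes = sorted(value for value, count in counts.items() if count == highest)
    return modes[:max_k_modes]


modes = top_modes([1, 1, 2, 2, 3, 3, 4], max_k_modes=2)
```

Three values tie for the highest count in this input, so the cap of 2 decides how many of them are returned.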
- class dataprofiler.profilers.profiler_options.BaseInspectorOptions(is_enabled: bool = True)¶
Bases: dataprofiler.profilers.profiler_options.BooleanOption[dataprofiler.profilers.profiler_options.BaseInspectorOptionsT]
For setting Base options.
Initialize Base options for all the columns.
- Variables
is_enabled (bool) – boolean option to enable/disable the column.
- is_prop_enabled(prop: str) bool ¶
Check to see if a property is enabled or not and returns boolean.
- Parameters
prop (String) – The option to check if it is enabled
- Returns
Whether or not the property is enabled
- Return type
Boolean
- classmethod load_from_dict(data, config: dict | None = None) BaseOption ¶
Parse attribute from json dictionary into self.
- Parameters
data (dict[string, Any]) – dictionary with attributes and values.
config (Dict | None) – config to override loading options params from dictionary
- Returns
Options with attributes populated.
- Return type
BaseOption
- property properties: dict[str, BooleanOption]¶
Return a copy of the option properties.
- Returns
dictionary of the option’s properties attr: value
- Return type
dict
- set(options: dict[str, bool]) None ¶
Set all the options.
Send in a dict that contains all of or a subset of the appropriate options. Set the values of the options. Will raise error if the formatting is improper.
- Parameters
options (dict) – dict containing the options you want to set.
- Returns
None
- validate(raise_error: bool = True) list[str] | None ¶
Validate the options do not conflict and cause errors.
Raises error/warning if so.
- Parameters
raise_error (bool) – Flag that raises errors if true. Returns errors if false.
- Returns
list of errors (if raise_error is false)
- Return type
list(str)
- class dataprofiler.profilers.profiler_options.NumericalOptions¶
Bases: dataprofiler.profilers.profiler_options.BaseInspectorOptions[dataprofiler.profilers.profiler_options.NumericalOptionsT]
For configuring options for Numerical Stats Mixin.
Initialize Options for the Numerical Stats Mixin.
- Variables
is_enabled (bool) – boolean option to enable/disable the column.
min (BooleanOption) – boolean option to enable/disable min
max (BooleanOption) – boolean option to enable/disable max
mode (ModeOption) – option to enable/disable mode and set return count
median (BooleanOption) – option to enable/disable median
sum (BooleanOption) – boolean option to enable/disable sum
variance (BooleanOption) – boolean option to enable/disable variance
skewness (BooleanOption) – boolean option to enable/disable skewness
kurtosis (BooleanOption) – boolean option to enable/disable kurtosis
histogram_and_quantiles (BooleanOption) – boolean option to enable/disable histogram_and_quantiles
bias_correction (BooleanOption) – boolean option to enable/disable existence of bias
num_zeros (BooleanOption) – boolean option to enable/disable num_zeros
num_negatives (BooleanOption) – boolean option to enable/disable num_negatives
is_numeric_stats_enabled (bool) – boolean to enable/disable all numeric stats
- property is_numeric_stats_enabled: bool¶
Return the state of numeric stats being enabled / disabled.
If any numeric stats property is enabled it will return True, otherwise it will return False.
- Returns
true if any numeric stats property is enabled, otherwise false
- Return type
bool
- property properties: dict[str, BooleanOption]¶
Include is_enabled.
is_enabled: Turns on or off the column.
- is_prop_enabled(prop: str) bool ¶
Check to see if a property is enabled or not and returns boolean.
- Parameters
prop (String) – The option to check if it is enabled
- Returns
Whether or not the property is enabled
- Return type
Boolean
- classmethod load_from_dict(data, config: dict | None = None) BaseOption ¶
Parse attribute from json dictionary into self.
- Parameters
data (dict[string, Any]) – dictionary with attributes and values.
config (Dict | None) – config to override loading options params from dictionary
- Returns
Options with attributes populated.
- Return type
BaseOption
- set(options: dict[str, bool]) None ¶
Set all the options.
Send in a dict that contains all of or a subset of the appropriate options. Set the values of the options. Will raise error if the formatting is improper.
- Parameters
options (dict) – dict containing the options you want to set.
- Returns
None
- validate(raise_error: bool = True) list[str] | None ¶
Validate the options do not conflict and cause errors.
Raises error/warning if so.
- Parameters
raise_error (bool) – Flag that raises errors if true. Returns errors if false.
- Returns
list of errors (if raise_error is false)
- Return type
list(str)
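The is_numeric_stats_enabled behaviour (reading returns True if any numeric stat is on, while setting it switches all numeric stats together) can be sketched with a Python property pair. NumericStatsSketch and its flags dictionary are hypothetical and cover only an assumed subset of the stats listed above; this is not DataProfiler's implementation.

```python
class NumericStatsSketch:
    """Hypothetical stand-in for illustration; not DataProfiler's NumericalOptions."""

    def __init__(self):
        # Assumed subset of the numeric stats listed above.
        self.flags = {"min": True, "max": True, "sum": True, "variance": True}

    @property
    def is_numeric_stats_enabled(self):
        # True if any numeric stat is enabled, otherwise False.
        return any(self.flags.values())

    @is_numeric_stats_enabled.setter
    def is_numeric_stats_enabled(self, value):
        # Enable or disable every numeric stat at once.
        for name in self.flags:
            self.flags[name] = value


stats = NumericStatsSketch()
stats.is_numeric_stats_enabled = False
all_off = stats.is_numeric_stats_enabled   # every flag is now off
stats.flags["max"] = True
one_on = stats.is_numeric_stats_enabled    # a single enabled stat is enough
```

In Python the getter has to be defined before the `@is_numeric_stats_enabled.setter` decorator can be applied, which is one reason a getter and setter of the same name appear together.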
- class dataprofiler.profilers.profiler_options.IntOptions¶
Bases: dataprofiler.profilers.profiler_options.NumericalOptions[IntOptions]
For configuring options for Int Column.
Initialize Options for the Int Column.
- Variables
is_enabled (bool) – boolean option to enable/disable the column.
min (BooleanOption) – boolean option to enable/disable min
max (BooleanOption) – boolean option to enable/disable max
mode (ModeOption) – option to enable/disable mode and set return count
median (BooleanOption) – option to enable/disable median
sum (BooleanOption) – boolean option to enable/disable sum
variance (BooleanOption) – boolean option to enable/disable variance
skewness (BooleanOption) – boolean option to enable/disable skewness
kurtosis (BooleanOption) – boolean option to enable/disable kurtosis
histogram_and_quantiles (BooleanOption) – boolean option to enable/disable histogram_and_quantiles
bias_correction (BooleanOption) – boolean option to enable/disable existence of bias
num_zeros (BooleanOption) – boolean option to enable/disable num_zeros
num_negatives (BooleanOption) – boolean option to enable/disable num_negatives
is_numeric_stats_enabled (bool) – boolean to enable/disable all numeric stats
- property is_numeric_stats_enabled: bool¶
Return the state of numeric stats being enabled / disabled.
If any numeric stats property is enabled it will return True, otherwise it will return False.
- Returns
true if any numeric stats property is enabled, otherwise false
- Return type
bool
- is_prop_enabled(prop: str) bool ¶
Check to see if a property is enabled or not and returns boolean.
- Parameters
prop (String) – The option to check if it is enabled
- Returns
Whether or not the property is enabled
- Return type
Boolean
- classmethod load_from_dict(data, config: dict | None = None) BaseOption ¶
Parse attribute from json dictionary into self.
- Parameters
data (dict[string, Any]) – dictionary with attributes and values.
config (Dict | None) – config to override loading options params from dictionary
- Returns
Options with attributes populated.
- Return type
BaseOption
- property properties: dict[str, BooleanOption]¶
Include is_enabled.
is_enabled: Turns on or off the column.
- set(options: dict[str, bool]) None ¶
Set all the options.
Send in a dict that contains all of or a subset of the appropriate options. Set the values of the options. Will raise error if the formatting is improper.
- Parameters
options (dict) – dict containing the options you want to set.
- Returns
None
- validate(raise_error: bool = True) list[str] | None ¶
Validate the options do not conflict and cause errors.
Raises error/warning if so.
- Parameters
raise_error (bool) – Flag that raises errors if true. Returns errors if false.
- Returns
list of errors (if raise_error is false)
- Return type
list(str)
- class dataprofiler.profilers.profiler_options.PrecisionOptions(is_enabled: bool = True, sample_ratio: Optional[float] = None)¶
Bases: dataprofiler.profilers.profiler_options.BooleanOption[PrecisionOptions]
For configuring options for precision.
Initialize Options for precision.
- Variables
is_enabled (bool) – boolean option to enable/disable the column.
sample_ratio (float) – ratio of valid float samples used to determine precision. This ratio overrides any defaults.
- classmethod load_from_dict(data, config: dict | None = None) BaseOption ¶
Parse attribute from json dictionary into self.
- Parameters
data (dict[string, Any]) – dictionary with attributes and values.
config (Dict | None) – config to override loading options params from dictionary
- Returns
Options with attributes populated.
- Return type
BaseOption
- property properties: dict[str, BooleanOption]¶
Return a copy of the option properties.
- Returns
dictionary of the option’s properties attr: value
- Return type
dict
- set(options: dict[str, bool]) None ¶
Set all the options.
Send in a dict that contains all of or a subset of the appropriate options. Set the values of the options. Will raise error if the formatting is improper.
- Parameters
options (dict) – dict containing the options you want to set.
- Returns
None
- validate(raise_error: bool = True) list[str] | None ¶
Validate the options do not conflict and cause errors.
Raises error/warning if so.
- Parameters
raise_error (bool) – Flag that raises errors if true. Returns errors if false.
- Returns
list of errors (if raise_error is false)
- Return type
list(str)
- class dataprofiler.profilers.profiler_options.FloatOptions¶
Bases: dataprofiler.profilers.profiler_options.NumericalOptions[FloatOptions]
For configuring options for Float Column.
Initialize Options for the Float Column.
- Variables
is_enabled (bool) – boolean option to enable/disable the column.
min (BooleanOption) – boolean option to enable/disable min
max (BooleanOption) – boolean option to enable/disable max
mode (ModeOption) – option to enable/disable mode and set return count
median (BooleanOption) – option to enable/disable median
sum (BooleanOption) – boolean option to enable/disable sum
variance (BooleanOption) – boolean option to enable/disable variance
skewness (BooleanOption) – boolean option to enable/disable skewness
kurtosis (BooleanOption) – boolean option to enable/disable kurtosis
histogram_and_quantiles (BooleanOption) – boolean option to enable/disable histogram_and_quantiles
bias_correction (BooleanOption) – boolean option to enable/disable existence of bias
num_zeros (BooleanOption) – boolean option to enable/disable num_zeros
num_negatives (BooleanOption) – boolean option to enable/disable num_negatives
is_numeric_stats_enabled (bool) – boolean to enable/disable all numeric stats
- property is_numeric_stats_enabled: bool¶
Return the state of numeric stats being enabled / disabled.
If any numeric stats property is enabled it will return True, otherwise it will return False.
- Returns
true if any numeric stats property is enabled, otherwise false
- Return type
bool
- is_prop_enabled(prop: str) bool ¶
Check to see if a property is enabled or not and returns boolean.
- Parameters
prop (String) – The option to check if it is enabled
- Returns
Whether or not the property is enabled
- Return type
Boolean
- classmethod load_from_dict(data, config: dict | None = None) BaseOption ¶
Parse attribute from json dictionary into self.
- Parameters
data (dict[string, Any]) – dictionary with attributes and values.
config (Dict | None) – config to override loading options params from dictionary
- Returns
Options with attributes populated.
- Return type
BaseOption
- property properties: dict[str, BooleanOption]¶
Include is_enabled.
is_enabled: Turns on or off the column.
- set(options: dict[str, bool]) None ¶
Set all the options.
Send in a dict that contains all of or a subset of the appropriate options. Set the values of the options. Will raise error if the formatting is improper.
- Parameters
options (dict) – dict containing the options you want to set.
- Returns
None
- validate(raise_error: bool = True) list[str] | None ¶
Validate the options do not conflict and cause errors.
Raises error/warning if so.
- Parameters
raise_error (bool) – Flag that raises errors if true. Returns errors if false.
- Returns
list of errors (if raise_error is false)
- Return type
list(str)
- class dataprofiler.profilers.profiler_options.TextOptions¶
Bases: dataprofiler.profilers.profiler_options.NumericalOptions[TextOptions]
For configuring options for Text Column.
Initialize Options for the Text Column.
- Variables
is_enabled (bool) – boolean option to enable/disable the column.
vocab (BooleanOption) – boolean option to enable/disable vocab
min (BooleanOption) – boolean option to enable/disable min
max (BooleanOption) – boolean option to enable/disable max
mode (ModeOption) – option to enable/disable mode and set return count
median (BooleanOption) – option to enable/disable median
sum (BooleanOption) – boolean option to enable/disable sum
variance (BooleanOption) – boolean option to enable/disable variance
skewness (BooleanOption) – boolean option to enable/disable skewness
kurtosis (BooleanOption) – boolean option to enable/disable kurtosis
bias_correction (BooleanOption) – boolean option to enable/disable existence of bias
histogram_and_quantiles (BooleanOption) – boolean option to enable/disable histogram_and_quantiles
num_zeros (BooleanOption) – boolean option to enable/disable num_zeros
num_negatives (BooleanOption) – boolean option to enable/disable num_negatives
is_numeric_stats_enabled (bool) – boolean to enable/disable all numeric stats
- property is_numeric_stats_enabled: bool¶
Return the state of numeric stats being enabled / disabled.
If any numeric stats property is enabled it will return True, otherwise it will return False. Although it seems redundant, this method is needed in order for the function below, the setter function also called is_numeric_stats_enabled, to properly work.
- Returns
true if any numeric stats property is enabled, otherwise false
- Return type
bool
- is_prop_enabled(prop: str) bool ¶
Check to see if a property is enabled or not and returns boolean.
- Parameters
prop (String) – The option to check if it is enabled
- Returns
Whether or not the property is enabled
- Return type
Boolean
- classmethod load_from_dict(data, config: dict | None = None) BaseOption ¶
Parse attribute from json dictionary into self.
- Parameters
data (dict[string, Any]) – dictionary with attributes and values.
config (Dict | None) – config to override loading options params from dictionary
- Returns
Options with attributes populated.
- Return type
BaseOption
- property properties: dict[str, BooleanOption]¶
Include is_enabled.
is_enabled: Turns on or off the column.
- set(options: dict[str, bool]) None ¶
Set all the options.
Send in a dict that contains all of or a subset of the appropriate options. Set the values of the options. Will raise error if the formatting is improper.
- Parameters
options (dict) – dict containing the options you want to set.
- Returns
None
- validate(raise_error: bool = True) list[str] | None ¶
Validate the options do not conflict and cause errors.
Raises error/warning if so.
- Parameters
raise_error (bool) – Flag that raises errors if true. Returns errors if false.
- Returns
list of errors (if raise_error is false)
- Return type
list(str)
- class dataprofiler.profilers.profiler_options.DateTimeOptions¶
Bases: dataprofiler.profilers.profiler_options.BaseInspectorOptions[DateTimeOptions]
For configuring options for Datetime Column.
Initialize Options for the Datetime Column.
- Variables
is_enabled (bool) – boolean option to enable/disable the column.
- is_prop_enabled(prop: str) bool ¶
Check to see if a property is enabled or not and returns boolean.
- Parameters
prop (String) – The option to check if it is enabled
- Returns
Whether or not the property is enabled
- Return type
Boolean
- classmethod load_from_dict(data, config: dict | None = None) BaseOption ¶
Parse attribute from json dictionary into self.
- Parameters
data (dict[string, Any]) – dictionary with attributes and values.
config (Dict | None) – config to override loading options params from dictionary
- Returns
Options with attributes populated.
- Return type
BaseOption
- property properties: dict[str, BooleanOption]¶
Return a copy of the option properties.
- Returns
dictionary of the option’s properties attr: value
- Return type
dict
- set(options: dict[str, bool]) None ¶
Set all the options.
Send in a dict that contains all of or a subset of the appropriate options. Set the values of the options. Will raise error if the formatting is improper.
- Parameters
options (dict) – dict containing the options you want to set.
- Returns
None
- validate(raise_error: bool = True) list[str] | None ¶
Validate the options do not conflict and cause errors.
Raises error/warning if so.
- Parameters
raise_error (bool) – Flag that raises errors if true. Returns errors if false.
- Returns
list of errors (if raise_error is false)
- Return type
list(str)
- class dataprofiler.profilers.profiler_options.OrderOptions¶
Bases: dataprofiler.profilers.profiler_options.BaseInspectorOptions[OrderOptions]
For configuring options for Order Column.
Initialize options for the Order Column.
- Variables
is_enabled (bool) – boolean option to enable/disable the column.
- is_prop_enabled(prop: str) bool ¶
Check to see if a property is enabled or not and returns boolean.
- Parameters
prop (String) – The option to check if it is enabled
- Returns
Whether or not the property is enabled
- Return type
Boolean
- classmethod load_from_dict(data, config: dict | None = None) BaseOption ¶
Parse attribute from json dictionary into self.
- Parameters
data (dict[string, Any]) – dictionary with attributes and values.
config (Dict | None) – config to override loading options params from dictionary
- Returns
Options with attributes populated.
- Return type
BaseOption
- property properties: dict[str, BooleanOption]¶
Return a copy of the option properties.
- Returns
dictionary of the option’s properties attr: value
- Return type
dict
- set(options: dict[str, bool]) None ¶
Set all the options.
Send in a dict that contains all of or a subset of the appropriate options. Set the values of the options. Will raise error if the formatting is improper.
- Parameters
options (dict) – dict containing the options you want to set.
- Returns
None
- validate(raise_error: bool = True) list[str] | None ¶
Validate the options do not conflict and cause errors.
Raises error/warning if so.
- Parameters
raise_error (bool) – Flag that raises errors if true. Returns errors if false.
- Returns
list of errors (if raise_error is false)
- Return type
list(str)
- class dataprofiler.profilers.profiler_options.CategoricalOptions(is_enabled: bool = True, top_k_categories: int | None = None, max_sample_size_to_check_stop_condition: int | None = None, stop_condition_unique_value_ratio: float | None = None, cms: bool = False, cms_confidence: float | None = 0.95, cms_relative_error: float | None = 0.01, cms_max_num_heavy_hitters: int | None = 5000)¶
Bases: dataprofiler.profilers.profiler_options.BaseInspectorOptions[CategoricalOptions]
For configuring options for Categorical Column.
Initialize options for the Categorical Column.
- Variables
is_enabled (bool) – boolean option to enable/disable the column.
top_k_categories ([None, int]) – number of categories to be displayed when called
max_sample_size_to_check_stop_condition ([None, int]) – The maximum sample size before categorical stop conditions are checked
stop_condition_unique_value_ratio ([None, float]) – The highest ratio of unique values to dataset size that is to be considered a categorical type
cms (bool) – boolean option for using count min sketch
cms_confidence ([None, float]) – defines the number of hashes used in CMS, where confidence = 1 - failure probability; default 0.95
cms_relative_error ([None, float]) – defines the number of buckets used in CMS, default 0.01
cms_max_num_heavy_hitters ([None, int]) – value used to define the threshold for minimum frequency required by a category to be counted
- is_prop_enabled(prop: str) bool ¶
Check to see if a property is enabled or not and returns boolean.
- Parameters
prop (String) – The option to check if it is enabled
- Returns
Whether or not the property is enabled
- Return type
Boolean
- classmethod load_from_dict(data, config: dict | None = None) BaseOption ¶
Parse attribute from json dictionary into self.
- Parameters
data (dict[string, Any]) – dictionary with attributes and values.
config (Dict | None) – config to override loading options params from dictionary
- Returns
Options with attributes populated.
- Return type
BaseOption
- property properties: dict[str, BooleanOption]¶
Return a copy of the option properties.
- Returns
dictionary of the option’s properties attr: value
- Return type
dict
- set(options: dict[str, bool]) None ¶
Set all the options.
Send in a dict that contains all of or a subset of the appropriate options. Set the values of the options. Will raise error if the formatting is improper.
- Parameters
options (dict) – dict containing the options you want to set.
- Returns
None
- validate(raise_error: bool = True) list[str] | None ¶
Validate the options do not conflict and cause errors.
Raises error/warning if so.
- Parameters
raise_error (bool) – Flag that raises errors if true. Returns errors if false.
- Returns
list of errors (if raise_error is false)
- Return type
list(str)
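The two count-min sketch (CMS) parameters above determine the shape of the sketch: confidence drives the number of hash functions and relative error drives the number of buckets. The sketch below uses the standard textbook sizing, depth = ceil(ln(1 / failure probability)) and width = ceil(e / relative error). That DataProfiler uses exactly these formulas is an assumption, and cms_dimensions is a hypothetical helper.

```python
import math


def cms_dimensions(confidence=0.95, relative_error=0.01):
    # confidence = 1 - failure probability
    failure_probability = 1.0 - confidence
    # Textbook sizing (assumed, not confirmed for DataProfiler).
    num_hashes = math.ceil(math.log(1.0 / failure_probability))
    num_buckets = math.ceil(math.e / relative_error)
    return num_hashes, num_buckets


depth, width = cms_dimensions()
```

Under this sizing, tightening the relative error grows the number of buckets linearly, while raising the confidence adds hash functions only logarithmically.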
- class dataprofiler.profilers.profiler_options.CorrelationOptions(is_enabled: bool = False, columns: list[str] = None)¶
Bases: dataprofiler.profilers.profiler_options.BaseInspectorOptions[CorrelationOptions]
For configuring options for Correlation between Columns.
Initialize options for the Correlation between Columns.
- Variables
is_enabled (bool) – boolean option to enable/disable.
columns (list()) – Columns considered to calculate correlation
- is_prop_enabled(prop: str) bool ¶
Check to see if a property is enabled or not and returns boolean.
- Parameters
prop (String) – The option to check if it is enabled
- Returns
Whether or not the property is enabled
- Return type
Boolean
- classmethod load_from_dict(data, config: dict | None = None) BaseOption ¶
Parse attribute from json dictionary into self.
- Parameters
data (dict[string, Any]) – dictionary with attributes and values.
config (Dict | None) – config to override loading options params from dictionary
- Returns
Options with attributes populated.
- Return type
BaseOption
- property properties: dict[str, BooleanOption]¶
Return a copy of the option properties.
- Returns
dictionary of the option’s properties attr: value
- Return type
dict
- set(options: dict[str, bool]) None ¶
Set all the options.
Send in a dict that contains all of or a subset of the appropriate options. Set the values of the options. Will raise error if the formatting is improper.
- Parameters
options (dict) – dict containing the options you want to set.
- Returns
None
- validate(raise_error: bool = True) list[str] | None ¶
Validate the options do not conflict and cause errors.
Raises error/warning if so.
- Parameters
raise_error (bool) – Flag that raises errors if true. Returns errors if false.
- Returns
list of errors (if raise_error is false)
- Return type
list(str)
- class dataprofiler.profilers.profiler_options.HyperLogLogOptions(seed: int = 0, register_count: int = 15)¶
Bases: dataprofiler.profilers.profiler_options.BaseOption[HyperLogLogOptions]
Options for alternative method of gathering unique row count.
Initialize options for the hyperloglog method of gathering unique row count.
- Variables
is_enabled (bool) – boolean option to enable/disable.
seed (int) – seed used to set HLL hashing function
register_count (int) – number of registers is equal to 2^register_count
- classmethod load_from_dict(data, config: dict | None = None) BaseOption ¶
Parse attribute from json dictionary into self.
- Parameters
data (dict[string, Any]) – dictionary with attributes and values.
config (Dict | None) – config to override loading options params from dictionary
- Returns
Options with attributes populated.
- Return type
BaseOption
- property properties: dict[str, BooleanOption]¶
Return a copy of the option properties.
- Returns
dictionary of the option’s properties attr: value
- Return type
dict
- set(options: dict[str, bool]) None ¶
Set all the options.
Send in a dict that contains all of or a subset of the appropriate options. Set the values of the options. Will raise error if the formatting is improper.
- Parameters
options (dict) – dict containing the options you want to set.
- Returns
None
- validate(raise_error: bool = True) list[str] | None ¶
Validate the options do not conflict and cause errors.
Raises error/warning if so.
- Parameters
raise_error (bool) – Flag that raises errors if true. Returns errors if false.
- Returns
list of errors (if raise_error is false)
- Return type
list(str)
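Since the number of registers equals 2^register_count, the default of 15 can be worked out directly. The sketch below also applies the commonly cited HyperLogLog standard error estimate of 1.04 / sqrt(m) for m registers; the helper names are hypothetical, and this reference does not say whether DataProfiler reports such an error figure.

```python
import math


def hll_registers(register_count=15):
    # Number of registers is 2 raised to register_count.
    return 2 ** register_count


def hll_relative_error(register_count=15):
    # Commonly cited HyperLogLog standard error: 1.04 / sqrt(m).
    return 1.04 / math.sqrt(hll_registers(register_count))


registers = hll_registers()
error = hll_relative_error()
```

Each increment of register_count doubles the register memory and shrinks the expected error by a factor of about sqrt(2).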
- class dataprofiler.profilers.profiler_options.UniqueCountOptions(is_enabled: bool = True, hashing_method: str = 'full')¶
Bases: dataprofiler.profilers.profiler_options.BooleanOption[UniqueCountOptions]
For configuring options for unique row count.
Initialize options for unique row counts.
- Variables
is_enabled (bool) – boolean option to enable/disable.
hashing_method (str) – property to specify row hashing method (“full” | “hll”)
- classmethod load_from_dict(data, config: dict | None = None) BaseOption ¶
Parse attribute from json dictionary into self.
- Parameters
data (dict[string, Any]) – dictionary with attributes and values.
config (Dict | None) – config to override loading options params from dictionary
- Returns
Options with attributes populated.
- Return type
BaseOption
- property properties: dict[str, BooleanOption]¶
Return a copy of the option properties.
- Returns
dictionary of the option’s properties attr: value
- Return type
dict
- set(options: dict[str, bool]) None ¶
Set all the options.
Send in a dict that contains all of or a subset of the appropriate options. Set the values of the options. Will raise error if the formatting is improper.
- Parameters
options (dict) – dict containing the options you want to set.
- Returns
None
- validate(raise_error: bool = True) list[str] | None ¶
Validate that the options do not conflict or cause errors.
Raises an error or issues a warning if they do.
- Parameters
raise_error (bool) – Flag that raises errors if true. Returns errors if false.
- Returns
list of errors (if raise_error is false)
- Return type
list(str)
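The two hashing_method values trade accuracy for memory. The sketch below illustrates that trade-off in plain Python, assuming "hll" refers to a HyperLogLog estimator. The functions row_hash, unique_rows_full and unique_rows_hll are hypothetical names for this illustration and are not the library's implementation.

```python
import hashlib
import math


def row_hash(row) -> int:
    """Hash one row (a sequence of cell values) to a 64-bit integer."""
    data = "\x1f".join(str(cell) for cell in row).encode("utf-8")
    return int.from_bytes(hashlib.sha1(data).digest()[:8], "big")


def unique_rows_full(rows) -> int:
    """Exact count: every distinct row hash is kept in memory."""
    return len({row_hash(row) for row in rows})


def unique_rows_hll(rows, p: int = 10) -> int:
    """Approximate count with a textbook HyperLogLog of 2**p registers."""
    m = 1 << p
    width = 64 - p
    registers = [0] * m
    for row in rows:
        h = row_hash(row)
        index = h >> width                    # first p bits select a register
        rest = h & ((1 << width) - 1)         # remaining bits
        rank = width - rest.bit_length() + 1  # position of the leftmost 1-bit
        registers[index] = max(registers[index], rank)
    alpha = 0.7213 / (1 + 1.079 / m)          # bias constant, valid for m >= 128
    estimate = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if estimate <= 2.5 * m and zeros:         # small-range correction
        estimate = m * math.log(m / zeros)
    return round(estimate)


rows = [(i, f"name-{i % 97}") for i in range(5000)]
rows += rows[:1000]  # append 1000 duplicate rows

exact = unique_rows_full(rows)
approx = unique_rows_hll(rows)
print(exact, approx)
```

The exact variant stores one hash per distinct row, while the HyperLogLog variant keeps a fixed number of small registers and returns an estimate that is typically within a few percent of the true count.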
- class dataprofiler.profilers.profiler_options.RowStatisticsOptions(is_enabled: bool = True, unique_count: bool = True, null_count: bool = True)¶
Bases: dataprofiler.profilers.profiler_options.BooleanOption[RowStatisticsOptions]
For configuring options for row statistics.
Initialize options for row statistics.
- Variables
is_enabled (bool) – boolean option to enable/disable.
unique_count (bool) – boolean option to enable/disable unique_count
null_count (bool) – boolean option to enable/disable null_count
- classmethod load_from_dict(data, config: dict | None = None) BaseOption ¶
Parse attribute from json dictionary into self.
- Parameters
data (dict[string, Any]) – dictionary with attributes and values.
config (Dict | None) – config to override loading options params from dictionary
- Returns
Options with attributes populated.
- Return type
BaseOption
- property properties: dict[str, BooleanOption]¶
Return a copy of the option properties.
- Returns
dictionary of the option’s properties attr: value
- Return type
dict
- set(options: dict[str, bool]) None ¶
Set all the options.
Pass a dict containing all or a subset of the available options; each listed option is set to the given value. Raises an error if the dict is improperly formatted.
- Parameters
options (dict) – dict containing the options you want to set.
- Returns
None
- validate(raise_error: bool = True) list[str] | None ¶
Validate that the options do not conflict or cause errors.
Raises an error or issues a warning if they do.
- Parameters
raise_error (bool) – Flag that raises errors if true. Returns errors if false.
- Returns
list of errors (if raise_error is false)
- Return type
list(str)
- class dataprofiler.profilers.profiler_options.DataLabelerOptions¶
Bases: dataprofiler.profilers.profiler_options.BaseInspectorOptions[DataLabelerOptions]
For configuring options for Data Labeler Column.
Initialize options for the Data Labeler Column.
- Variables
is_enabled (bool) – boolean option to enable/disable the column.
data_labeler_dirpath (str) – path of the directory from which to load the data labeler
max_sample_size (int) – maximum number of samples passed to the data labeler
data_labeler_object (BaseDataLabeler) – DataLabeler object used in profiler
- property properties: dict¶
Return a copy of the option properties.
- Returns
dictionary of the option’s properties attr: value
- Return type
dict
- classmethod load_from_dict(data, config: dict | None = None) DataLabelerOptions ¶
Parse attribute from json dictionary into self.
- Parameters
data (dict[string, Any]) – dictionary with attributes and values.
config (Dict | None) – config to override loading options params from dictionary
- Returns
Options with attributes populated.
- Return type
DataLabelerOptions
- is_prop_enabled(prop: str) bool ¶
Check whether a property is enabled and return a boolean.
- Parameters
prop (str) – The option to check.
- Returns
Whether or not the property is enabled
- Return type
Boolean
- set(options: dict[str, bool]) None ¶
Set all the options.
Pass a dict containing all or a subset of the available options; each listed option is set to the given value. Raises an error if the dict is improperly formatted.
- Parameters
options (dict) – dict containing the options you want to set.
- Returns
None
- validate(raise_error: bool = True) list[str] | None ¶
Validate that the options do not conflict or cause errors.
Raises an error or issues a warning if they do.
- Parameters
raise_error (bool) – Flag that raises errors if true. Returns errors if false.
- Returns
list of errors (if raise_error is false)
- Return type
list(str)
- class dataprofiler.profilers.profiler_options.TextProfilerOptions(is_enabled: bool = True, is_case_sensitive: bool = True, stop_words: set[str] = None, top_k_chars: int = None, top_k_words: int = None)¶
Bases: dataprofiler.profilers.profiler_options.BaseInspectorOptions[TextProfilerOptions]
For configuring options for text profiler.
Construct the TextProfilerOptions object with default values.
- Variables
is_enabled (bool) – boolean option to enable/disable the option.
is_case_sensitive (bool) – option set for case sensitivity.
stop_words (Union[None, set[str]]) – option set for stop words.
top_k_chars (Union[None, int]) – option set for number of top common characters.
top_k_words (Union[None, int]) – option set for number of top common words.
words (BooleanOption) – option set for word update.
vocab (BooleanOption) – option set for vocab update.
- is_prop_enabled(prop: str) bool ¶
Check whether a property is enabled and return a boolean.
- Parameters
prop (str) – The option to check.
- Returns
Whether or not the property is enabled
- Return type
Boolean
- classmethod load_from_dict(data, config: dict | None = None) BaseOption ¶
Parse attribute from json dictionary into self.
- Parameters
data (dict[string, Any]) – dictionary with attributes and values.
config (Dict | None) – config to override loading options params from dictionary
- Returns
Options with attributes populated.
- Return type
BaseOption
- property properties: dict[str, BooleanOption]¶
Return a copy of the option properties.
- Returns
dictionary of the option’s properties attr: value
- Return type
dict
- set(options: dict[str, bool]) None ¶
Set all the options.
Pass a dict containing all or a subset of the available options; each listed option is set to the given value. Raises an error if the dict is improperly formatted.
- Parameters
options (dict) – dict containing the options you want to set.
- Returns
None
- validate(raise_error: bool = True) list[str] | None ¶
Validate that the options do not conflict or cause errors.
Raises an error or issues a warning if they do.
- Parameters
raise_error (bool) – Flag that raises errors if true. Returns errors if false.
- Returns
list of errors (if raise_error is false)
- Return type
list(str)
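As a rough illustration of what is_case_sensitive, stop_words and top_k_words conceptually control, the sketch below counts words in plain Python. The helper top_words is hypothetical and its tokenization rule is an assumption; it does not reproduce the library's text profiler.

```python
import re
from collections import Counter


def top_words(text, is_case_sensitive=True, stop_words=None, top_k_words=None):
    """Count words, honouring case sensitivity, stop words and a top-k limit."""
    stop = set(stop_words or ())
    if not is_case_sensitive:
        # Fold both the text and the stop words so comparisons ignore case.
        text = text.lower()
        stop = {word.lower() for word in stop}
    words = re.findall(r"\w+", text)  # assumed tokenization: runs of word characters
    counts = Counter(word for word in words if word not in stop)
    # most_common(None) returns every word; an integer keeps only the top k.
    return counts.most_common(top_k_words)


sample = "The cat and the hat"
insensitive = top_words(sample, is_case_sensitive=False, stop_words={"and"}, top_k_words=1)
sensitive = dict(top_words(sample))
print(insensitive)
print(sensitive)
```

With case sensitivity on, "The" and "the" are counted as separate words; with it off, they are merged into a single entry.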
- class dataprofiler.profilers.profiler_options.StructuredOptions(null_values: dict[str, re.RegexFlag | int] = None, column_null_values: dict[int, dict[str, re.RegexFlag | int]] = None, sampling_ratio: float = 0.2)¶
Bases: dataprofiler.profilers.profiler_options.BaseOption[StructuredOptions]
For configuring options for structured profiler.
Construct the StructuredOptions object with default values.
- Parameters
null_values – null values we input.
column_null_values – column level null values we input.
- Variables
int (IntOptions) – option set for int profiling.
float (FloatOptions) – option set for float profiling.
datetime (DateTimeOptions) – option set for datetime profiling.
text (TextOptions) – option set for text profiling.
order (OrderOptions) – option set for order profiling.
category (CategoricalOptions) – option set for category profiling.
data_labeler (DataLabelerOptions) – option set for data_labeler profiling.
correlation (CorrelationOptions) – option set for correlation profiling.
chi2_homogeneity (BooleanOption) – option set for chi2_homogeneity matrix
row_statistics (BooleanOption) – option set for row statistics calculations
null_replication_metrics (BooleanOption) – option set for metrics calculation for replicating nan vals
null_values (Union[None, dict]) – option set for defined null values
sampling_ratio (Union[None, float]) – What ratio of the input data to sample. Float value > 0 and <= 1
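The signature types null_values as dict[str, re.RegexFlag | int] and column_null_values as a dict keyed by column index. The sketch below shows one way such mappings could be interpreted. It assumes the keys are regular-expression patterns that must match the whole cell, and that column-level patterns are added on top of the global ones; both points are assumptions, and is_null is a hypothetical helper rather than a library function.

```python
import re

# Global patterns: pattern -> regex flags (0 means no flags).
null_values = {"": 0, "nan": re.IGNORECASE, "none": re.IGNORECASE}

# Extra patterns for specific columns, keyed by column index.
column_null_values = {1: {"-1": 0}}


def is_null(cell: str, column_index: int) -> bool:
    """Return True if the cell fully matches any applicable null pattern."""
    patterns = dict(null_values)
    patterns.update(column_null_values.get(column_index, {}))
    return any(
        re.fullmatch(pattern, cell, flags) is not None
        for pattern, flags in patterns.items()
    )


print(is_null("NaN", 0))
print(is_null("-1", 0), is_null("-1", 1))
```

Here "-1" counts as null only in column 1, and "banana" is not null because the "nan" pattern must match the entire cell.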
- property enabled_profiles: list[str]¶
Return a list of the enabled profilers for columns.
- classmethod load_from_dict(data, config: dict | None = None) BaseOption ¶
Parse attribute from json dictionary into self.
- Parameters
data (dict[string, Any]) – dictionary with attributes and values.
config (Dict | None) – config to override loading options params from dictionary
- Returns
Options with attributes populated.
- Return type
BaseOption
- property properties: dict[str, BooleanOption]¶
Return a copy of the option properties.
- Returns
dictionary of the option’s properties attr: value
- Return type
dict
- set(options: dict[str, bool]) None ¶
Set all the options.
Pass a dict containing all or a subset of the available options; each listed option is set to the given value. Raises an error if the dict is improperly formatted.
- Parameters
options (dict) – dict containing the options you want to set.
- Returns
None
- validate(raise_error: bool = True) list[str] | None ¶
Validate that the options do not conflict or cause errors.
Raises an error or issues a warning if they do.
- Parameters
raise_error (bool) – Flag that raises errors if true. Returns errors if false.
- Returns
list of errors (if raise_error is false)
- Return type
list(str)
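The raise_error flag switches validate between raising and returning the collected messages. The sketch below illustrates that pattern with a hypothetical stand-in class, SamplingSketch, which checks only the documented sampling_ratio range (> 0 and <= 1). The class, its error messages, and the choice to return None when nothing is wrong are assumptions made for this illustration.

```python
from __future__ import annotations


class SamplingSketch:
    """Hypothetical stand-in for an options class; not part of dataprofiler."""

    def __init__(self, sampling_ratio: float = 0.2) -> None:
        self.sampling_ratio = sampling_ratio

    def validate(self, raise_error: bool = True) -> list[str] | None:
        """Collect problems, then either raise them or return them."""
        errors = []
        ratio = self.sampling_ratio
        if isinstance(ratio, bool) or not isinstance(ratio, (int, float)):
            errors.append("sampling_ratio must be a number.")
        elif not 0 < ratio <= 1:
            errors.append("sampling_ratio must be > 0 and <= 1.")
        if errors and raise_error:
            raise ValueError("\n".join(errors))
        return errors or None


ok = SamplingSketch(0.5).validate()
problems = SamplingSketch(1.5).validate(raise_error=False)
try:
    SamplingSketch(0).validate()
    raised = False
except ValueError:
    raised = True
print(ok, problems, raised)
```

Returning the messages instead of raising lets a caller gather every problem across several option groups before deciding how to report them.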
- class dataprofiler.profilers.profiler_options.UnstructuredOptions¶
Bases: dataprofiler.profilers.profiler_options.BaseOption[UnstructuredOptions]
For configuring options for unstructured profiler.
Construct the UnstructuredOptions object with default values.
- Variables
text (TextProfilerOptions) – option set for text profiling.
data_labeler (DataLabelerOptions) – option set for data_labeler profiling.
- property enabled_profiles: list[str]¶
Return a list of the enabled profilers.
- classmethod load_from_dict(data, config: dict | None = None) BaseOption ¶
Parse attribute from json dictionary into self.
- Parameters
data (dict[string, Any]) – dictionary with attributes and values.
config (Dict | None) – config to override loading options params from dictionary
- Returns
Options with attributes populated.
- Return type
BaseOption
- property properties: dict[str, BooleanOption]¶
Return a copy of the option properties.
- Returns
dictionary of the option’s properties attr: value
- Return type
dict
- set(options: dict[str, bool]) None ¶
Set all the options.
Pass a dict containing all or a subset of the available options; each listed option is set to the given value. Raises an error if the dict is improperly formatted.
- Parameters
options (dict) – dict containing the options you want to set.
- Returns
None
- validate(raise_error: bool = True) list[str] | None ¶
Validate that the options do not conflict or cause errors.
Raises an error or issues a warning if they do.
- Parameters
raise_error (bool) – Flag that raises errors if true. Returns errors if false.
- Returns
list of errors (if raise_error is false)
- Return type
list(str)
- class dataprofiler.profilers.profiler_options.ProfilerOptions(presets: Optional[str] = None)¶
Bases:
dataprofiler.profilers.profiler_options.BaseOption
[ProfilerOptions
]For configuring options for profiler.
Initialize the ProfilerOptions object.
- Variables
structured_options (StructuredOptions) – option set for structured dataset profiling.
unstructured_options (UnstructuredOptions) – option set for unstructured dataset profiling.
presets (Optional[str]) – A pre-configured mapping of a string name to group of options: “complete”, “data_types”, “numeric_stats_disabled”, and “lower_memory_sketching”. Default: None
- classmethod load_from_dict(data, config: dict | None = None) BaseOption ¶
Parse attribute from json dictionary into self.
- Parameters
data (dict[string, Any]) – dictionary with attributes and values.
config (Dict | None) – config to override loading options params from dictionary
- Returns
Options with attributes populated.
- Return type
BaseOption
- property properties: dict[str, BooleanOption]¶
Return a copy of the option properties.
- Returns
dictionary of the option’s properties attr: value
- Return type
dict
- validate(raise_error: bool = True) list[str] | None ¶
Validate that the options do not conflict or cause errors.
Raises an error or issues a warning if they do.
- Parameters
raise_error (bool) – Flag that raises errors if true. Returns errors if false.
- Returns
list of errors (if raise_error is false)
- Return type
list(str)
- set(options: dict[str, Any]) None ¶
Override BaseOption.set.
The override is needed because the type (unstructured/structured) may have to be specified when the same option exists within both self.structured_options and self.unstructured_options.
- Parameters
options (dict) – Dictionary of options to set
- Returns
None
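To see why the type may need to be specified, the sketch below models the option groups as a nested dict and looks up dotted keys in it. The layout of the dict and the helpers flatten and matching_paths are hypothetical; the lookup rule is an assumption for illustration and is not the library's resolution logic.

```python
def flatten(tree: dict, prefix: str = "") -> dict:
    """Turn a nested dict into a flat {dotted.path: value} mapping."""
    paths = {}
    for name, value in tree.items():
        path = f"{prefix}.{name}" if prefix else name
        if isinstance(value, dict):
            paths.update(flatten(value, path))
        else:
            paths[path] = value
    return paths


def matching_paths(tree: dict, key: str) -> list:
    """List every full path that equals the key or ends with it."""
    return sorted(
        path for path in flatten(tree)
        if path == key or path.endswith("." + key)
    )


options = {
    "structured_options": {
        "data_labeler": {"is_enabled": True},
        "text": {"is_enabled": True},
    },
    "unstructured_options": {
        "data_labeler": {"is_enabled": True},
    },
}

ambiguous = matching_paths(options, "data_labeler.is_enabled")
specific = matching_paths(options, "structured_options.data_labeler.is_enabled")
print(ambiguous)
print(specific)
```

The short key matches an entry in both groups, while prefixing it with the group name narrows the match to a single option.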