Profiler Options
Specify the options when running the data profiler.
- class dataprofiler.profilers.profiler_options.BaseOption
Bases:
object
- property properties
Returns a copy of the option properties.
- Returns
dictionary mapping each of the option’s attributes to its value
- Return type
dict
- set(options)
Sets the options. Pass in a dict containing all or a subset of the appropriate options, and the value of each given option is set. Raises an error if the formatting is improper.
- Parameters
options (dict) – dict containing the options to set.
- Returns
None
- validate(raise_error=True)
Validates that the options do not conflict or cause errors. Raises an error or warning if they do.
- Parameters
raise_error (bool) – if True, raise the errors; if False, return them instead.
- Returns
list of errors (if raise_error is false)
- Return type
list(str)
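The `properties`/`set` contract above can be illustrated with a small, self-contained stand-in. This is a hypothetical sketch, not the library's implementation: `SketchBaseOption`, `SketchThresholdOption`, and the `threshold` attribute are invented for illustration, and the sketch assumes `properties` returns a deep copy of the instance attributes and that `set` accepts only names that already exist.

```python
import copy


class SketchBaseOption:
    """Hypothetical stand-in for the documented BaseOption contract (not the library code)."""

    @property
    def properties(self):
        # Deep copy so that editing the returned dict cannot change the option.
        return copy.deepcopy(self.__dict__)

    def set(self, options):
        # Accept a dict containing all or a subset of the existing options.
        if not isinstance(options, dict):
            raise ValueError("The options must be a dictionary.")
        for name, value in options.items():
            if name not in self.__dict__:
                raise AttributeError(f"Option '{name}' does not exist.")
            setattr(self, name, value)


class SketchThresholdOption(SketchBaseOption):
    # Hypothetical subclass with two illustrative attributes.
    def __init__(self):
        self.is_enabled = True
        self.threshold = 0.5


opt = SketchThresholdOption()
snapshot = opt.properties
snapshot["threshold"] = 0.9          # edits the copy only
opt.set({"threshold": 0.75})         # a subset of the options
print(opt.properties)                # {'is_enabled': True, 'threshold': 0.75}
```

Returning a copy means callers can inspect or log the options freely without accidentally reconfiguring the profiler.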
- class dataprofiler.profilers.profiler_options.BooleanOption(is_enabled=True)
Bases:
dataprofiler.profilers.profiler_options.BaseOption
Boolean option.
- Variables
is_enabled (bool) – boolean option to enable/disable the option.
- property properties
Returns a copy of the option properties.
- Returns
dictionary mapping each of the option’s attributes to its value
- Return type
dict
- set(options)
Sets the options. Pass in a dict containing all or a subset of the appropriate options, and the value of each given option is set. Raises an error if the formatting is improper.
- Parameters
options (dict) – dict containing the options to set.
- Returns
None
- validate(raise_error=True)
Validates that the options do not conflict or cause errors. Raises an error or warning if they do.
- Parameters
raise_error (bool) – if True, raise the errors; if False, return them instead.
- Returns
list of errors (if raise_error is false)
- Return type
list(str)
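The `raise_error` flag switches `validate` between raising and returning the collected errors. The sketch below is a hypothetical stand-in (`SketchBooleanOption`, `_validate_helper`, and the message text are assumptions) that shows this contract for a single `is_enabled` flag.

```python
class SketchBooleanOption:
    """Hypothetical stand-in for BooleanOption's documented validate() behaviour."""

    def __init__(self, is_enabled=True):
        self.is_enabled = is_enabled

    def _validate_helper(self, variable_path="BooleanOption"):
        # Collect error messages instead of raising immediately.
        errors = []
        if not isinstance(self.is_enabled, bool):
            errors.append(f"{variable_path}.is_enabled must be a Boolean.")
        return errors

    def validate(self, raise_error=True):
        errors = self._validate_helper()
        if raise_error:
            if errors:
                raise ValueError("\n".join(errors))
            return None
        # raise_error=False: hand the errors back to the caller.
        return errors


bad = SketchBooleanOption(is_enabled="yes")
print(bad.validate(raise_error=False))  # ['BooleanOption.is_enabled must be a Boolean.']
```

Collecting messages first lets one call report every problem at once, whichever mode is chosen.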
- class dataprofiler.profilers.profiler_options.HistogramOption(is_enabled=True, bin_count_or_method='auto')
Bases:
dataprofiler.profilers.profiler_options.BooleanOption
Options for histograms.
- Variables
is_enabled (bool) – boolean option to enable/disable the option.
bin_count_or_method (Union[str, int, list(str)]) – bin count, or the method(s) used to calculate the histogram bins
- property properties
Returns a copy of the option properties.
- Returns
dictionary mapping each of the option’s attributes to its value
- Return type
dict
- set(options)
Sets the options. Pass in a dict containing all or a subset of the appropriate options, and the value of each given option is set. Raises an error if the formatting is improper.
- Parameters
options (dict) – dict containing the options to set.
- Returns
None
- validate(raise_error=True)
Validates that the options do not conflict or cause errors. Raises an error or warning if they do.
- Parameters
raise_error (bool) – if True, raise the errors; if False, return them instead.
- Returns
list of errors (if raise_error is false)
- Return type
list(str)
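Because `bin_count_or_method` accepts three shapes (int, str, or list of str), validation has to branch on type. The sketch below is hypothetical: the accepted method names are assumed to mirror NumPy's `histogram_bin_edges` estimators, and the "positive integer" rule is also an assumption, not confirmed by this page.

```python
# Assumed method names, borrowed from NumPy's histogram bin estimators.
VALID_METHODS = {"auto", "fd", "doane", "scott", "rice", "sturges", "sqrt"}


def check_bin_count_or_method(value):
    """Return a list of error messages for a candidate bin_count_or_method."""
    errors = []
    if isinstance(value, bool):
        # bool is a subclass of int in Python, so reject it explicitly.
        errors.append("bin_count_or_method must not be a bool.")
    elif isinstance(value, int):
        if value < 1:
            errors.append("An integer bin count must be at least 1.")
    elif isinstance(value, str):
        if value not in VALID_METHODS:
            errors.append(f"Unknown binning method '{value}'.")
    elif isinstance(value, list):
        unknown = [m for m in value if not isinstance(m, str) or m not in VALID_METHODS]
        if not value or unknown:
            errors.append(f"Invalid method list: {value}.")
    else:
        errors.append("bin_count_or_method must be a str, int, or list of str.")
    return errors


print(check_bin_count_or_method("median"))  # ["Unknown binning method 'median'."]
```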
- class dataprofiler.profilers.profiler_options.BaseInspectorOptions(is_enabled=True)
Bases:
dataprofiler.profilers.profiler_options.BooleanOption
Base options for all the columns.
- Variables
is_enabled (bool) – boolean option to enable/disable the column.
- is_prop_enabled(prop)
Checks whether a property is enabled and returns a boolean.
- Parameters
prop (str) – the option to check
- Returns
whether the property is enabled
- Return type
bool
- property properties
Returns a copy of the option properties.
- Returns
dictionary mapping each of the option’s attributes to its value
- Return type
dict
- set(options)
Sets the options. Pass in a dict containing all or a subset of the appropriate options, and the value of each given option is set. Raises an error if the formatting is improper.
- Parameters
options (dict) – dict containing the options to set.
- Returns
None
- validate(raise_error=True)
Validates that the options do not conflict or cause errors. Raises an error or warning if they do.
- Parameters
raise_error (bool) – if True, raise the errors; if False, return them instead.
- Returns
list of errors (if raise_error is false)
- Return type
list(str)
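`is_prop_enabled` looks a property up by name and reports its flag. A minimal sketch, assuming a property is either a plain bool or a nested option exposing `is_enabled`; `SketchInspectorOptions` and its `min`/`max` attributes are hypothetical.

```python
class SketchBooleanOption:
    def __init__(self, is_enabled=True):
        self.is_enabled = is_enabled


class SketchInspectorOptions:
    """Hypothetical stand-in for BaseInspectorOptions.is_prop_enabled."""

    def __init__(self):
        self.is_enabled = True
        self.min = SketchBooleanOption(True)
        self.max = SketchBooleanOption(False)

    def is_prop_enabled(self, prop):
        # Look the property up by name; unknown names are an error.
        if not hasattr(self, prop):
            raise AttributeError(f"Property '{prop}' does not exist.")
        value = getattr(self, prop)
        # Nested options expose their own flag; plain booleans are used as-is.
        if isinstance(value, SketchBooleanOption):
            return value.is_enabled
        if isinstance(value, bool):
            return value
        raise AttributeError(f"Property '{prop}' is not a toggle.")


opts = SketchInspectorOptions()
print(opts.is_prop_enabled("max"))  # False
```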
- class dataprofiler.profilers.profiler_options.NumericalOptions
Bases:
dataprofiler.profilers.profiler_options.BaseInspectorOptions
Options for the Numerical Stats Mixin.
- Variables
is_enabled (bool) – boolean option to enable/disable the column.
min (BooleanOption) – boolean option to enable/disable min
max (BooleanOption) – boolean option to enable/disable max
sum (BooleanOption) – boolean option to enable/disable sum
variance (BooleanOption) – boolean option to enable/disable variance
skewness (BooleanOption) – boolean option to enable/disable skewness
kurtosis (BooleanOption) – boolean option to enable/disable kurtosis
histogram_and_quantiles (BooleanOption) – boolean option to enable/disable histogram_and_quantiles
bias_correction (BooleanOption) – boolean option to enable/disable bias correction
num_zeros (BooleanOption) – boolean option to enable/disable num_zeros
num_negatives (BooleanOption) – boolean option to enable/disable num_negatives
is_numeric_stats_enabled (bool) – boolean to enable/disable all numeric stats
- property is_numeric_stats_enabled
Returns whether numeric stats are enabled: True if any numeric stats property is enabled, otherwise False.
- Returns
True if any numeric stats property is enabled, otherwise False
- Return type
bool
- property properties
Returns a copy of the option properties, which include at least is_enabled (turns the column on or off).
- is_prop_enabled(prop)
Checks whether a property is enabled and returns a boolean.
- Parameters
prop (str) – the option to check
- Returns
whether the property is enabled
- Return type
bool
- set(options)
Sets the options. Pass in a dict containing all or a subset of the appropriate options, and the value of each given option is set. Raises an error if the formatting is improper.
- Parameters
options (dict) – dict containing the options to set.
- Returns
None
- validate(raise_error=True)
Validates that the options do not conflict or cause errors. Raises an error or warning if they do.
- Parameters
raise_error (bool) – if True, raise the errors; if False, return them instead.
- Returns
list of errors (if raise_error is false)
- Return type
list(str)
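The `is_numeric_stats_enabled` getter reduces to an "any" over the individual stat toggles. The sketch models the toggles as a plain dict; the tuple of stat names is an assumption taken from the Variables list above (leaving out `bias_correction`, which modifies other statistics rather than producing one).

```python
# Assumed set of toggles that count as "numeric stats" (from the Variables list).
NUMERIC_STATS = ("min", "max", "sum", "variance", "skewness", "kurtosis",
                 "histogram_and_quantiles", "num_zeros", "num_negatives")


def is_numeric_stats_enabled(flags):
    """Return True if any numeric-stat toggle in ``flags`` is enabled."""
    return any(flags.get(name, False) for name in NUMERIC_STATS)


flags = {name: False for name in NUMERIC_STATS}
print(is_numeric_stats_enabled(flags))  # False
flags["kurtosis"] = True
print(is_numeric_stats_enabled(flags))  # True
```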
- class dataprofiler.profilers.profiler_options.IntOptions
Bases:
dataprofiler.profilers.profiler_options.NumericalOptions
Options for the Int Column.
- Variables
is_enabled (bool) – boolean option to enable/disable the column.
min (BooleanOption) – boolean option to enable/disable min
max (BooleanOption) – boolean option to enable/disable max
sum (BooleanOption) – boolean option to enable/disable sum
variance (BooleanOption) – boolean option to enable/disable variance
skewness (BooleanOption) – boolean option to enable/disable skewness
kurtosis (BooleanOption) – boolean option to enable/disable kurtosis
histogram_and_quantiles (BooleanOption) – boolean option to enable/disable histogram_and_quantiles
bias_correction (BooleanOption) – boolean option to enable/disable bias correction
num_zeros (BooleanOption) – boolean option to enable/disable num_zeros
num_negatives (BooleanOption) – boolean option to enable/disable num_negatives
is_numeric_stats_enabled (bool) – boolean to enable/disable all numeric stats
- property is_numeric_stats_enabled
Returns whether numeric stats are enabled: True if any numeric stats property is enabled, otherwise False.
- Returns
True if any numeric stats property is enabled, otherwise False
- Return type
bool
- is_prop_enabled(prop)
Checks whether a property is enabled and returns a boolean.
- Parameters
prop (str) – the option to check
- Returns
whether the property is enabled
- Return type
bool
- property properties
Returns a copy of the option properties, which include at least is_enabled (turns the column on or off).
- set(options)
Sets the options. Pass in a dict containing all or a subset of the appropriate options, and the value of each given option is set. Raises an error if the formatting is improper.
- Parameters
options (dict) – dict containing the options to set.
- Returns
None
- validate(raise_error=True)
Validates that the options do not conflict or cause errors. Raises an error or warning if they do.
- Parameters
raise_error (bool) – if True, raise the errors; if False, return them instead.
- Returns
list of errors (if raise_error is false)
- Return type
list(str)
- class dataprofiler.profilers.profiler_options.PrecisionOptions(is_enabled=True, sample_ratio=None)
Bases:
dataprofiler.profilers.profiler_options.BooleanOption
Options for precision.
- Variables
is_enabled (bool) – boolean option to enable/disable the column.
sample_ratio (float) – ratio of valid float samples used in determining precision. This ratio overrides any defaults.
- property properties
Returns a copy of the option properties.
- Returns
dictionary mapping each of the option’s attributes to its value
- Return type
dict
- set(options)
Sets the options. Pass in a dict containing all or a subset of the appropriate options, and the value of each given option is set. Raises an error if the formatting is improper.
- Parameters
options (dict) – dict containing the options to set.
- Returns
None
- validate(raise_error=True)
Validates that the options do not conflict or cause errors. Raises an error or warning if they do.
- Parameters
raise_error (bool) – if True, raise the errors; if False, return them instead.
- Returns
list of errors (if raise_error is false)
- Return type
list(str)
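To make "a ratio that overrides any defaults" concrete, here is a hypothetical helper that turns `sample_ratio` into a number of samples to inspect. The helper name, the `default_ratio` of 0.1, the [0, 1] range check, and rounding up with `math.ceil` are all assumptions for illustration, not the library's behaviour.

```python
import math


def precision_sample_size(n_valid, sample_ratio=None, default_ratio=0.1):
    """Number of valid float samples to inspect for precision (hypothetical helper)."""
    # An explicit ratio overrides the (assumed) default.
    ratio = default_ratio if sample_ratio is None else sample_ratio
    if isinstance(ratio, bool) or not isinstance(ratio, (int, float)) or not 0 <= ratio <= 1:
        raise ValueError("sample_ratio must be a float between 0 and 1.")
    # Round up so that a non-zero ratio always inspects at least one sample.
    return math.ceil(n_valid * ratio)


print(precision_sample_size(1000, sample_ratio=0.25))  # 250
```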
- class dataprofiler.profilers.profiler_options.FloatOptions
Bases:
dataprofiler.profilers.profiler_options.NumericalOptions
Options for the Float Column.
- Variables
is_enabled (bool) – boolean option to enable/disable the column.
min (BooleanOption) – boolean option to enable/disable min
max (BooleanOption) – boolean option to enable/disable max
sum (BooleanOption) – boolean option to enable/disable sum
variance (BooleanOption) – boolean option to enable/disable variance
skewness (BooleanOption) – boolean option to enable/disable skewness
kurtosis (BooleanOption) – boolean option to enable/disable kurtosis
histogram_and_quantiles (BooleanOption) – boolean option to enable/disable histogram_and_quantiles
bias_correction (BooleanOption) – boolean option to enable/disable bias correction
num_zeros (BooleanOption) – boolean option to enable/disable num_zeros
num_negatives (BooleanOption) – boolean option to enable/disable num_negatives
is_numeric_stats_enabled (bool) – boolean to enable/disable all numeric stats
- property is_numeric_stats_enabled
Returns whether numeric stats are enabled: True if any numeric stats property is enabled, otherwise False.
- Returns
True if any numeric stats property is enabled, otherwise False
- Return type
bool
- is_prop_enabled(prop)
Checks whether a property is enabled and returns a boolean.
- Parameters
prop (str) – the option to check
- Returns
whether the property is enabled
- Return type
bool
- property properties
Returns a copy of the option properties, which include at least is_enabled (turns the column on or off).
- set(options)
Sets the options. Pass in a dict containing all or a subset of the appropriate options, and the value of each given option is set. Raises an error if the formatting is improper.
- Parameters
options (dict) – dict containing the options to set.
- Returns
None
- validate(raise_error=True)
Validates that the options do not conflict or cause errors. Raises an error or warning if they do.
- Parameters
raise_error (bool) – if True, raise the errors; if False, return them instead.
- Returns
list of errors (if raise_error is false)
- Return type
list(str)
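One plausible reading of `bias_correction` for variance is Bessel's correction: divide by n - 1 instead of n. The sketch below assumes that interpretation; it is not confirmed by this page, and the library may also apply corrections to skewness and kurtosis that are not shown here.

```python
def variance(values, bias_correction=True):
    """Variance of ``values``; divide by n - 1 when bias_correction is on, else by n."""
    n = len(values)
    if n == 0:
        raise ValueError("variance requires at least one value")
    if bias_correction and n < 2:
        return float("nan")  # sample variance is undefined for a single value
    mean = sum(values) / n
    squared = sum((x - mean) ** 2 for x in values)
    return squared / (n - 1 if bias_correction else n)


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(variance(data, bias_correction=False))            # 4.0
print(round(variance(data, bias_correction=True), 6))   # 4.571429
```

Here the squared deviations from the mean (5.0) sum to 32, so the uncorrected result is 32 / 8 and the corrected one is 32 / 7.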
- class dataprofiler.profilers.profiler_options.TextOptions
Bases:
dataprofiler.profilers.profiler_options.NumericalOptions
Options for the Text Column.
- Variables
is_enabled (bool) – boolean option to enable/disable the column.
vocab (BooleanOption) – boolean option to enable/disable vocab
min (BooleanOption) – boolean option to enable/disable min
max (BooleanOption) – boolean option to enable/disable max
sum (BooleanOption) – boolean option to enable/disable sum
variance (BooleanOption) – boolean option to enable/disable variance
skewness (BooleanOption) – boolean option to enable/disable skewness
kurtosis (BooleanOption) – boolean option to enable/disable kurtosis
bias_correction (BooleanOption) – boolean option to enable/disable bias correction
histogram_and_quantiles (BooleanOption) – boolean option to enable/disable histogram_and_quantiles
num_zeros (BooleanOption) – boolean option to enable/disable num_zeros
num_negatives (BooleanOption) – boolean option to enable/disable num_negatives
is_numeric_stats_enabled (bool) – boolean to enable/disable all numeric stats
- property is_numeric_stats_enabled
Returns whether numeric stats are enabled: True if any numeric stats property is enabled, otherwise False. Although this getter may seem redundant, it is required for the corresponding is_numeric_stats_enabled setter to work properly.
- Returns
True if any numeric stats property is enabled, otherwise False
- Return type
bool
- is_prop_enabled(prop)
Checks whether a property is enabled and returns a boolean.
- Parameters
prop (str) – the option to check
- Returns
whether the property is enabled
- Return type
bool
- property properties
Returns a copy of the option properties, which include at least is_enabled (turns the column on or off).
- set(options)
Sets the options. Pass in a dict containing all or a subset of the appropriate options, and the value of each given option is set. Raises an error if the formatting is improper.
- Parameters
options (dict) – dict containing the options to set.
- Returns
None
- validate(raise_error=True)
Validates that the options do not conflict or cause errors. Raises an error or warning if they do.
- Parameters
raise_error (bool) – if True, raise the errors; if False, return them instead.
- Returns
list of errors (if raise_error is false)
- Return type
list(str)
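The note that the `is_numeric_stats_enabled` getter is needed for its setter reflects how Python's `property` works: `@name.setter` is a method of an existing property object, so the getter must be defined first. A hypothetical sketch (the class name and the shortened `STATS` tuple are assumptions):

```python
class SketchTextOptions:
    """Hypothetical stand-in showing why a setter needs its getter property."""

    STATS = ("min", "max", "sum", "variance")  # assumed subset, for brevity

    def __init__(self):
        self.flags = {name: True for name in self.STATS}

    @property
    def is_numeric_stats_enabled(self):
        # Getter: True if any numeric stat is enabled.
        return any(self.flags.values())

    # "@is_numeric_stats_enabled.setter" only works because the property
    # object above already exists; without the getter this line would fail.
    @is_numeric_stats_enabled.setter
    def is_numeric_stats_enabled(self, value):
        # Setter: switch every numeric stat on or off at once.
        for name in self.flags:
            self.flags[name] = value


opts = SketchTextOptions()
opts.is_numeric_stats_enabled = False
print(opts.is_numeric_stats_enabled)  # False
```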
- class dataprofiler.profilers.profiler_options.DateTimeOptions
Bases:
dataprofiler.profilers.profiler_options.BaseInspectorOptions
Options for the Datetime Column.
- Variables
is_enabled (bool) – boolean option to enable/disable the column.
- is_prop_enabled(prop)
Checks whether a property is enabled and returns a boolean.
- Parameters
prop (str) – the option to check
- Returns
whether the property is enabled
- Return type
bool
- property properties
Returns a copy of the option properties.
- Returns
dictionary mapping each of the option’s attributes to its value
- Return type
dict
- set(options)
Sets the options. Pass in a dict containing all or a subset of the appropriate options, and the value of each given option is set. Raises an error if the formatting is improper.
- Parameters
options (dict) – dict containing the options to set.
- Returns
None
- validate(raise_error=True)
Validates that the options do not conflict or cause errors. Raises an error or warning if they do.
- Parameters
raise_error (bool) – if True, raise the errors; if False, return them instead.
- Returns
list of errors (if raise_error is false)
- Return type
list(str)
- class dataprofiler.profilers.profiler_options.OrderOptions
Bases:
dataprofiler.profilers.profiler_options.BaseInspectorOptions
Options for the Order Column.
- Variables
is_enabled (bool) – boolean option to enable/disable the column.
- is_prop_enabled(prop)
Checks whether a property is enabled and returns a boolean.
- Parameters
prop (str) – the option to check
- Returns
whether the property is enabled
- Return type
bool
- property properties
Returns a copy of the option properties.
- Returns
dictionary mapping each of the option’s attributes to its value
- Return type
dict
- set(options)
Sets the options. Pass in a dict containing all or a subset of the appropriate options, and the value of each given option is set. Raises an error if the formatting is improper.
- Parameters
options (dict) – dict containing the options to set.
- Returns
None
- validate(raise_error=True)
Validates that the options do not conflict or cause errors. Raises an error or warning if they do.
- Parameters
raise_error (bool) – if True, raise the errors; if False, return them instead.
- Returns
list of errors (if raise_error is false)
- Return type
list(str)
- class dataprofiler.profilers.profiler_options.CategoricalOptions(is_enabled=True, top_k_categories=None)
Bases:
dataprofiler.profilers.profiler_options.BaseInspectorOptions
Options for the Categorical Column.
- Variables
is_enabled (bool) – boolean option to enable/disable the column.
top_k_categories (Union[None, int]) – number of categories to display when reporting
- is_prop_enabled(prop)
Checks whether a property is enabled and returns a boolean.
- Parameters
prop (str) – the option to check
- Returns
whether the property is enabled
- Return type
bool
- property properties
Returns a copy of the option properties.
- Returns
dictionary mapping each of the option’s attributes to its value
- Return type
dict
- set(options)
Sets the options. Pass in a dict containing all or a subset of the appropriate options, and the value of each given option is set. Raises an error if the formatting is improper.
- Parameters
options (dict) – dict containing the options to set.
- Returns
None
- validate(raise_error=True)
Validates that the options do not conflict or cause errors. Raises an error or warning if they do.
- Parameters
raise_error (bool) – if True, raise the errors; if False, return them instead.
- Returns
list of errors (if raise_error is false)
- Return type
list(str)
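`top_k_categories` limits how many categories are reported. A minimal sketch using `collections.Counter.most_common`, assuming (not confirmed here) that `None` means "report every category"; `top_categories` is a hypothetical helper.

```python
from collections import Counter


def top_categories(values, top_k_categories=None):
    """Return (category, count) pairs, most frequent first, truncated to top_k_categories."""
    counts = Counter(values)
    # Counter.most_common(None) returns every category, matching the assumed
    # "None means no limit" behaviour.
    return counts.most_common(top_k_categories)


colors = ["red", "blue", "red", "green", "red", "blue"]
print(top_categories(colors, top_k_categories=2))  # [('red', 3), ('blue', 2)]
```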
- class dataprofiler.profilers.profiler_options.CorrelationOptions(is_enabled=False, columns=None)
Bases:
dataprofiler.profilers.profiler_options.BaseInspectorOptions
Options for the correlation between columns.
- Variables
is_enabled (bool) – boolean option to enable/disable correlation.
columns (list) – columns considered when calculating correlation
- is_prop_enabled(prop)
Checks whether a property is enabled and returns a boolean.
- Parameters
prop (str) – the option to check
- Returns
whether the property is enabled
- Return type
bool
- property properties
Returns a copy of the option properties.
- Returns
dictionary mapping each of the option’s attributes to its value
- Return type
dict
- set(options)
Sets the options. Pass in a dict containing all or a subset of the appropriate options, and the value of each given option is set. Raises an error if the formatting is improper.
- Parameters
options (dict) – dict containing the options to set.
- Returns
None
- validate(raise_error=True)
Validates that the options do not conflict or cause errors. Raises an error or warning if they do.
- Parameters
raise_error (bool) – if True, raise the errors; if False, return them instead.
- Returns
list of errors (if raise_error is false)
- Return type
list(str)
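The `columns` option restricts which columns enter the correlation calculation. The sketch below assumes Pearson correlation and that `columns=None` means "all columns"; both are assumptions, and the library's actual method may differ. `pearson` and `correlation_matrix` are hypothetical helpers over a dict of equal-length lists.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def correlation_matrix(table, columns=None):
    """Pairwise correlations for the chosen columns; None means all columns."""
    names = list(table) if columns is None else list(columns)
    return {(a, b): pearson(table[a], table[b]) for a in names for b in names}


table = {"x": [1, 2, 3, 4], "y": [2, 4, 6, 8], "z": [4, 3, 2, 1]}
result = correlation_matrix(table, columns=["x", "z"])
print(round(result[("x", "z")], 6))  # -1.0
```

Only the listed columns are paired, so excluding wide or irrelevant columns reduces the work quadratically.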
- class dataprofiler.profilers.profiler_options.DataLabelerOptions
Bases:
dataprofiler.profilers.profiler_options.BaseInspectorOptions
Options for the Data Labeler Column.
- Variables
is_enabled (bool) – boolean option to enable/disable the column.
data_labeler_dirpath (str) – path of the directory from which to load the data labeler
max_sample_size (int) – maximum sample size to use
data_labeler_object (BaseDataLabeler) – DataLabeler object used in the profiler
- property properties
Returns a copy of the option properties.
- Returns
dictionary mapping each of the option’s attributes to its value
- Return type
dict
- is_prop_enabled(prop)
Checks whether a property is enabled and returns a boolean.
- Parameters
prop (str) – the option to check
- Returns
whether the property is enabled
- Return type
bool
- set(options)
Sets the options. Pass in a dict containing all or a subset of the appropriate options, and the value of each given option is set. Raises an error if the formatting is improper.
- Parameters
options (dict) – dict containing the options to set.
- Returns
None
- validate(raise_error=True)
Validates that the options do not conflict or cause errors. Raises an error or warning if they do.
- Parameters
raise_error (bool) – if True, raise the errors; if False, return them instead.
- Returns
list of errors (if raise_error is false)
- Return type
list(str)
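`max_sample_size` caps how much data is passed to the labeler. A hypothetical sketch of that cap: sample without replacement when there are more rows than the limit, otherwise keep everything. The helper name, the `seed` argument, and treating `None` as "no cap" are assumptions.

```python
import random


def sample_for_labeling(rows, max_sample_size, seed=0):
    """Return at most max_sample_size rows, chosen without replacement."""
    rows = list(rows)
    if max_sample_size is None or len(rows) <= max_sample_size:
        return rows
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return rng.sample(rows, max_sample_size)


print(len(sample_for_labeling(range(1000), 25)))  # 25
```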
- class dataprofiler.profilers.profiler_options.TextProfilerOptions(is_enabled=True, is_case_sensitive=True, stop_words=None, top_k_chars=None, top_k_words=None)
Bases:
dataprofiler.profilers.profiler_options.BaseInspectorOptions
Constructs the TextProfilerOptions object with default values.
- Variables
is_enabled (bool) – boolean option to enable/disable the option.
is_case_sensitive (bool) – option set for case sensitivity.
stop_words (Union[None, list(str)]) – option set for stop words.
top_k_chars (Union[None, int]) – option set for the number of most common characters.
top_k_words (Union[None, int]) – option set for the number of most common words.
words (BooleanOption) – option set for word update.
vocab (BooleanOption) – option set for vocab update.
- is_prop_enabled(prop)
Checks whether a property is enabled and returns a boolean.
- Parameters
prop (str) – the option to check
- Returns
whether the property is enabled
- Return type
bool
- property properties
Returns a copy of the option properties.
- Returns
dictionary mapping each of the option’s attributes to its value
- Return type
dict
- set(options)
Sets the options. Pass in a dict containing all or a subset of the appropriate options, and the value of each given option is set. Raises an error if the formatting is improper.
- Parameters
options (dict) – dict containing the options to set.
- Returns
None
- validate(raise_error=True)
Validates that the options do not conflict or cause errors. Raises an error or warning if they do.
- Parameters
raise_error (bool) – if True, raise the errors; if False, return them instead.
- Returns
list of errors (if raise_error is false)
- Return type
list(str)
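The sketch below shows how the TextProfilerOptions settings could combine when counting words and characters in a string. It is a hypothetical helper, not the library's TextProfiler: the `\w+` tokenisation is an assumption, and this sketch treats `stop_words=None` as "no stop words", whereas the library may substitute a default list.

```python
import re
from collections import Counter


def text_stats(text, is_case_sensitive=True, stop_words=None,
               top_k_chars=None, top_k_words=None):
    """Hypothetical helper applying the TextProfilerOptions settings to one string."""
    stop = set(stop_words or [])
    if not is_case_sensitive:
        # Fold case on both the text and the stop words so they still match.
        text = text.lower()
        stop = {w.lower() for w in stop}
    words = [w for w in re.findall(r"\w+", text) if w not in stop]
    chars = [c for c in text if not c.isspace()]
    return {
        "words": Counter(words).most_common(top_k_words),
        "chars": Counter(chars).most_common(top_k_chars),
    }


stats = text_stats("The cat and the hat", is_case_sensitive=False,
                   stop_words=["and"], top_k_words=1)
print(stats["words"])  # [('the', 2)]
```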
- class dataprofiler.profilers.profiler_options.StructuredOptions(null_values=None)
Bases:
dataprofiler.profilers.profiler_options.BaseOption
Constructs the StructuredOptions object with default values.
- Parameters
null_values – user-defined null values.
- Variables
int (IntOptions) – option set for int profiling.
float (FloatOptions) – option set for float profiling.
datetime (DateTimeOptions) – option set for datetime profiling.
text (TextOptions) – option set for text profiling.
order (OrderOptions) – option set for order profiling.
category (CategoricalOptions) – option set for category profiling.
data_labeler (DataLabelerOptions) – option set for data_labeler profiling.
correlation (CorrelationOptions) – option set for correlation profiling.
null_values (Union[None, dict]) – option set for defined null values
- property enabled_profiles
Returns a list of the enabled profilers for columns.
- property properties
Returns a copy of the option properties.
- Returns
dictionary mapping each of the option’s attributes to its value
- Return type
dict
- set(options)
Sets the options. Pass in a dict containing all or a subset of the appropriate options, and the value of each given option is set. Raises an error if the formatting is improper.
- Parameters
options (dict) – dict containing the options to set.
- Returns
None
- validate(raise_error=True)
Validates that the options do not conflict or cause errors. Raises an error or warning if they do.
- Parameters
raise_error (bool) – if True, raise the errors; if False, return them instead.
- Returns
list of errors (if raise_error is false)
- Return type
list(str)
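`enabled_profiles` can be pictured as a filter over the per-column option sets by their `is_enabled` flags. A hypothetical sketch: `Toggle` stands in for each option class, and the attribute names are taken from the Variables list above. Only the correlation default (disabled) comes from the `CorrelationOptions` signature; whether the library counts correlation as a profile is not confirmed here.

```python
class Toggle:
    def __init__(self, is_enabled=True):
        self.is_enabled = is_enabled


class SketchStructuredOptions:
    """Hypothetical stand-in: each column option set is an object with is_enabled."""

    PROFILE_NAMES = ("int", "float", "datetime", "text", "order",
                     "category", "data_labeler", "correlation")

    def __init__(self):
        for name in self.PROFILE_NAMES:
            setattr(self, name, Toggle(is_enabled=True))
        # Correlation defaults to disabled, per CorrelationOptions(is_enabled=False).
        self.correlation.is_enabled = False

    @property
    def enabled_profiles(self):
        # Keep the names of option sets whose is_enabled flag is True.
        return [name for name in self.PROFILE_NAMES
                if getattr(self, name).is_enabled]


opts = SketchStructuredOptions()
opts.data_labeler.is_enabled = False
print(opts.enabled_profiles)  # ['int', 'float', 'datetime', 'text', 'order', 'category']
```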
- class dataprofiler.profilers.profiler_options.UnstructuredOptions
Bases:
dataprofiler.profilers.profiler_options.BaseOption
Constructs the UnstructuredOptions object with default values.
- Variables
text (TextProfilerOptions) – option set for text profiling.
data_labeler (DataLabelerOptions) – option set for data_labeler profiling.
- property enabled_profiles
Returns a list of the enabled profilers.
- property properties
Returns a copy of the option properties.
- Returns
dictionary mapping each of the option’s attributes to its value
- Return type
dict
- set(options)
Sets the options. Pass in a dict containing all or a subset of the appropriate options, and the value of each given option is set. Raises an error if the formatting is improper.
- Parameters
options (dict) – dict containing the options to set.
- Returns
None
- validate(raise_error=True)
Validates that the options do not conflict or cause errors. Raises an error or warning if they do.
- Parameters
raise_error (bool) – if True, raise the errors; if False, return them instead.
- Returns
list of errors (if raise_error is false)
- Return type
list(str)
- class dataprofiler.profilers.profiler_options.ProfilerOptions
Bases:
dataprofiler.profilers.profiler_options.BaseOption
Initializes the ProfilerOptions object.
- Variables
structured_options (StructuredOptions) – option set for structured dataset profiling.
unstructured_options (UnstructuredOptions) – option set for unstructured dataset profiling.
- property properties
Returns a copy of the option properties.
- Returns
dictionary mapping each of the option’s attributes to its value
- Return type
dict
- validate(raise_error=True)
Validates that the options do not conflict or cause errors. Raises an error or warning if they do.
- Parameters
raise_error (bool) – if True, raise the errors; if False, return them instead.
- Returns
list of errors (if raise_error is false)
- Return type
list(str)
- set(options)
Overrides BaseOption.set, because an option that exists in both self.structured_options and self.unstructured_options may require the type (structured or unstructured) to be specified.
- Parameters
options (dict) – dictionary of options to set
- Returns
None
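The reason for overriding `set` is ambiguity: a key such as a text or data labeler option can exist in both option sets. The hypothetical helper below sketches one way to resolve it, assuming dotted keys with an optional `structured_options.` or `unstructured_options.` prefix, and assuming an unprefixed key present in both sets is rejected. The library's exact rules may differ, and the key sets are illustrative only.

```python
def route_option(key, structured_keys, unstructured_keys):
    """Decide which option set a dotted key belongs to (hypothetical helper).

    Keys prefixed with "structured_options." or "unstructured_options." are
    routed explicitly; an unprefixed key is accepted only if it is unambiguous.
    """
    for prefix in ("structured_options.", "unstructured_options."):
        if key.startswith(prefix):
            return prefix.rstrip("."), key[len(prefix):]
    in_structured = key in structured_keys
    in_unstructured = key in unstructured_keys
    if in_structured and in_unstructured:
        raise ValueError(
            f"'{key}' exists in both structured and unstructured options; "
            "prefix it with 'structured_options.' or 'unstructured_options.'.")
    if in_structured:
        return "structured_options", key
    if in_unstructured:
        return "unstructured_options", key
    raise AttributeError(f"Option '{key}' does not exist.")


# Illustrative key sets, not the library's full option lists.
STRUCTURED = {"int.is_enabled", "text.is_enabled", "data_labeler.is_enabled"}
UNSTRUCTURED = {"text.is_enabled", "data_labeler.is_enabled"}
print(route_option("int.is_enabled", STRUCTURED, UNSTRUCTURED))
# ('structured_options', 'int.is_enabled')
```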