Profiler Options¶
Specify the options when running the data profiler.
-
class
dataprofiler.profilers.profiler_options.
BaseOption
¶ Bases:
object
-
property
properties
¶ Returns a copy of the option properties.
- Returns
dictionary of the option’s properties attr: value
- Return type
dict
-
set
(options)¶ Sets option values. Pass a dict containing all or a subset of the appropriate options. Raises an error if the formatting is improper.
- Parameters
options (dict) – dict containing the options you want to set.
- Returns
None
-
validate
(raise_error=True)¶ Validates that the options do not conflict or cause errors. Raises an error or warning if they do.
- Parameters
raise_error (bool) – Flag that raises errors if true. Returns errors if false.
- Returns
list of errors (if raise_error is false)
- Return type
list(str)
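The three members above (a copied `properties` dict, a dict-driven `set`, and a `validate` that either raises or returns its errors) can be sketched in isolation. This is a minimal stand-in, not dataprofiler's implementation: the class name `SketchOption`, the exception types, and the error message are assumptions made for illustration.

```python
import copy


class SketchOption:
    """Hypothetical stand-in for an option object; not dataprofiler's code."""

    def __init__(self, is_enabled=True):
        self.is_enabled = is_enabled

    @property
    def properties(self):
        # Return a copy so callers cannot mutate the option through it.
        return copy.deepcopy(self.__dict__)

    def set(self, options):
        # Accept all or a subset of the known attributes; reject anything else.
        if not isinstance(options, dict):
            raise ValueError("options must be a dict")
        for name, value in options.items():
            if name not in self.__dict__:
                raise AttributeError(f"unknown option: {name}")
            setattr(self, name, value)

    def validate(self, raise_error=True):
        # Collect every problem, then either raise or return the list.
        errors = []
        if not isinstance(self.is_enabled, bool):
            errors.append("is_enabled must be a bool")
        if errors and raise_error:
            raise ValueError("\n".join(errors))
        return errors


opt = SketchOption()
snapshot = opt.properties
snapshot["is_enabled"] = False      # changes only the copy
opt.set({"is_enabled": "yes"})      # wrong type, reported by validate()
problems = opt.validate(raise_error=False)
```

Returning the errors as a list when `raise_error` is false lets a caller gather problems from several option objects before deciding how to report them.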
-
class
dataprofiler.profilers.profiler_options.
BooleanOption
(is_enabled=True)¶ Bases:
dataprofiler.profilers.profiler_options.BaseOption
Boolean option
- Variables
is_enabled (bool) – boolean option to enable/disable the option.
-
property
properties
¶ Returns a copy of the option properties.
- Returns
dictionary of the option’s properties attr: value
- Return type
dict
-
set
(options)¶ Sets option values. Pass a dict containing all or a subset of the appropriate options. Raises an error if the formatting is improper.
- Parameters
options (dict) – dict containing the options you want to set.
- Returns
None
-
validate
(raise_error=True)¶ Validates that the options do not conflict or cause errors. Raises an error or warning if they do.
- Parameters
raise_error (bool) – Flag that raises errors if true. Returns errors if false.
- Returns
list of errors (if raise_error is false)
- Return type
list(str)
-
class
dataprofiler.profilers.profiler_options.
HistogramOption
(is_enabled=True, bin_count_or_method='auto')¶ Bases:
dataprofiler.profilers.profiler_options.BooleanOption
Options for histograms
- Variables
is_enabled (bool) – boolean option to enable/disable the option.
bin_count_or_method (Union[str, int, list(str)]) – bin count or the method with which to calculate histograms
-
property
properties
¶ Returns a copy of the option properties.
- Returns
dictionary of the option’s properties attr: value
- Return type
dict
-
set
(options)¶ Sets option values. Pass a dict containing all or a subset of the appropriate options. Raises an error if the formatting is improper.
- Parameters
options (dict) – dict containing the options you want to set.
- Returns
None
-
validate
(raise_error=True)¶ Validates that the options do not conflict or cause errors. Raises an error or warning if they do.
- Parameters
raise_error (bool) – Flag that raises errors if true. Returns errors if false.
- Returns
list of errors (if raise_error is false)
- Return type
list(str)
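Because `bin_count_or_method` accepts a string, an integer, or a list of strings, a validator has to branch on the type. The sketch below shows one way such a check might look. The function name and the contents of `ASSUMED_METHODS` are assumptions for illustration; this section does not list the values the library actually accepts.

```python
# Assumed method names, used only for illustration; the values the library
# accepts are not listed in this section.
ASSUMED_METHODS = {"auto", "fd", "doane", "scott", "rice", "sturges", "sqrt"}


def check_bin_count_or_method(value):
    """Return a list of error strings for a bin_count_or_method value."""
    errors = []
    if isinstance(value, bool):
        # bool is a subclass of int, so rule it out before the int branch.
        errors.append("bin_count_or_method must not be a bool")
    elif isinstance(value, int):
        if value < 1:
            errors.append("bin count must be a positive integer")
    elif isinstance(value, str):
        if value not in ASSUMED_METHODS:
            errors.append(f"unknown method: {value}")
    elif isinstance(value, list):
        for item in value:
            if not isinstance(item, str) or item not in ASSUMED_METHODS:
                errors.append(f"unknown method: {item}")
    else:
        errors.append("bin_count_or_method must be a str, int, or list of str")
    return errors
```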
-
class
dataprofiler.profilers.profiler_options.
BaseInspectorOptions
(is_enabled=True)¶ Bases:
dataprofiler.profilers.profiler_options.BooleanOption
Base options for all the columns.
- Variables
is_enabled (bool) – boolean option to enable/disable the column.
-
is_prop_enabled
(prop)¶ Checks to see if a property is enabled or not and returns boolean.
- Parameters
prop (String) – The option to check if it is enabled
- Returns
Whether or not the property is enabled
- Return type
Boolean
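A lookup like `is_prop_enabled` has to handle both plain boolean attributes and nested option objects that carry their own `is_enabled` flag. The following is a hedged sketch of that idea. `Flag` and `ColumnOptionsSketch` are hypothetical stand-ins, and raising on an unknown name is an assumption rather than documented behavior.

```python
class Flag:
    """Hypothetical stand-in for BooleanOption."""

    def __init__(self, is_enabled=True):
        self.is_enabled = is_enabled


class ColumnOptionsSketch:
    """Hypothetical stand-in for a column option set."""

    def __init__(self):
        self.is_enabled = True
        self.min = Flag()
        self.max = Flag(is_enabled=False)

    def is_prop_enabled(self, prop):
        # Assumption: unknown names raise instead of silently returning False.
        if prop not in self.__dict__:
            raise AttributeError(f'Property "{prop}" does not exist.')
        value = self.__dict__[prop]
        # Nested option objects report their own is_enabled flag.
        if isinstance(value, Flag):
            return value.is_enabled
        return bool(value)


opts = ColumnOptionsSketch()
```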
-
property
properties
¶ Returns a copy of the option properties.
- Returns
dictionary of the option’s properties attr: value
- Return type
dict
-
set
(options)¶ Sets option values. Pass a dict containing all or a subset of the appropriate options. Raises an error if the formatting is improper.
- Parameters
options (dict) – dict containing the options you want to set.
- Returns
None
-
validate
(raise_error=True)¶ Validates that the options do not conflict or cause errors. Raises an error or warning if they do.
- Parameters
raise_error (bool) – Flag that raises errors if true. Returns errors if false.
- Returns
list of errors (if raise_error is false)
- Return type
list(str)
-
class
dataprofiler.profilers.profiler_options.
NumericalOptions
¶ Bases:
dataprofiler.profilers.profiler_options.BaseInspectorOptions
Options for the Numerical Stats Mixin
- Variables
is_enabled (bool) – boolean option to enable/disable the column.
min (BooleanOption) – boolean option to enable/disable min
max (BooleanOption) – boolean option to enable/disable max
sum (BooleanOption) – boolean option to enable/disable sum
variance (BooleanOption) – boolean option to enable/disable variance
histogram_and_quantiles (BooleanOption) – boolean option to enable/disable histogram_and_quantiles
is_numeric_stats_enabled (bool) – boolean to enable/disable all numeric stats
-
property
is_numeric_stats_enabled
¶ Returns the state of numeric stats being enabled / disabled. If any numeric stats property is enabled it will return True, otherwise it will return False.
- Returns
true if any numeric stats property is enabled, otherwise false
- Return type
bool
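The "true if any stat is enabled" rule maps naturally onto `any()`. The sketch below is a stand-in, not the library's code: plain booleans replace the BooleanOption objects, and the setter that switches every stat at once is an assumption drawn from the Variables description ("boolean to enable/disable all numeric stats").

```python
class NumericStatsSketch:
    """Hypothetical stand-in; stat names follow the Variables list above."""

    STAT_NAMES = ("min", "max", "sum", "variance", "histogram_and_quantiles")

    def __init__(self):
        # Plain bools stand in for the BooleanOption objects.
        self.flags = {name: True for name in self.STAT_NAMES}

    @property
    def is_numeric_stats_enabled(self):
        # True as soon as any single stat is enabled.
        return any(self.flags.values())

    @is_numeric_stats_enabled.setter
    def is_numeric_stats_enabled(self, value):
        # Assumed behavior: writing the flag switches every stat at once.
        for name in self.flags:
            self.flags[name] = bool(value)


stats = NumericStatsSketch()
stats.is_numeric_stats_enabled = False
all_off = stats.is_numeric_stats_enabled
stats.flags["sum"] = True
one_on = stats.is_numeric_stats_enabled
```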
-
property
properties
¶ Includes at least: is_enabled: Turns on or off the column.
-
is_prop_enabled
(prop)¶ Checks to see if a property is enabled or not and returns boolean.
- Parameters
prop (String) – The option to check if it is enabled
- Returns
Whether or not the property is enabled
- Return type
Boolean
-
set
(options)¶ Sets option values. Pass a dict containing all or a subset of the appropriate options. Raises an error if the formatting is improper.
- Parameters
options (dict) – dict containing the options you want to set.
- Returns
None
-
validate
(raise_error=True)¶ Validates that the options do not conflict or cause errors. Raises an error or warning if they do.
- Parameters
raise_error (bool) – Flag that raises errors if true. Returns errors if false.
- Returns
list of errors (if raise_error is false)
- Return type
list(str)
-
class
dataprofiler.profilers.profiler_options.
IntOptions
¶ Bases:
dataprofiler.profilers.profiler_options.NumericalOptions
Options for the Int Column
- Variables
is_enabled (bool) – boolean option to enable/disable the column.
min (BooleanOption) – boolean option to enable/disable min
max (BooleanOption) – boolean option to enable/disable max
sum (BooleanOption) – boolean option to enable/disable sum
variance (BooleanOption) – boolean option to enable/disable variance
histogram_and_quantiles (BooleanOption) – boolean option to enable/disable histogram_and_quantiles
is_numeric_stats_enabled (bool) – boolean to enable/disable all numeric stats
-
property
is_numeric_stats_enabled
¶ Returns the state of numeric stats being enabled / disabled. If any numeric stats property is enabled it will return True, otherwise it will return False.
- Returns
true if any numeric stats property is enabled, otherwise false
- Return type
bool
-
is_prop_enabled
(prop)¶ Checks to see if a property is enabled or not and returns boolean.
- Parameters
prop (String) – The option to check if it is enabled
- Returns
Whether or not the property is enabled
- Return type
Boolean
-
property
properties
¶ Includes at least: is_enabled: Turns on or off the column.
-
set
(options)¶ Sets option values. Pass a dict containing all or a subset of the appropriate options. Raises an error if the formatting is improper.
- Parameters
options (dict) – dict containing the options you want to set.
- Returns
None
-
validate
(raise_error=True)¶ Validates that the options do not conflict or cause errors. Raises an error or warning if they do.
- Parameters
raise_error (bool) – Flag that raises errors if true. Returns errors if false.
- Returns
list of errors (if raise_error is false)
- Return type
list(str)
-
class
dataprofiler.profilers.profiler_options.
PrecisionOptions
(is_enabled=True, sample_ratio=None)¶ Bases:
dataprofiler.profilers.profiler_options.BooleanOption
Options for precision
- Variables
is_enabled (bool) – boolean option to enable/disable the column.
sample_ratio (float) – float option to determine the ratio of valid float samples used in determining precision. This ratio will override any defaults.
-
property
properties
¶ Returns a copy of the option properties.
- Returns
dictionary of the option’s properties attr: value
- Return type
dict
-
set
(options)¶ Sets option values. Pass a dict containing all or a subset of the appropriate options. Raises an error if the formatting is improper.
- Parameters
options (dict) – dict containing the options you want to set.
- Returns
None
-
validate
(raise_error=True)¶ Validates that the options do not conflict or cause errors. Raises an error or warning if they do.
- Parameters
raise_error (bool) – Flag that raises errors if true. Returns errors if false.
- Returns
list of errors (if raise_error is false)
- Return type
list(str)
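Since `sample_ratio` defaults to `None` and is described as a ratio, a plausible validation is "either unset, or a number between 0 and 1". The sketch below encodes that reading. The bounds, the function name, and the error messages are assumptions; this section does not state the accepted range.

```python
def check_sample_ratio(sample_ratio):
    """Return a list of error strings for a sample_ratio value."""
    errors = []
    if sample_ratio is None:
        # Assumed meaning of None: no override, so defaults stay in effect.
        return errors
    if isinstance(sample_ratio, bool) or not isinstance(sample_ratio, (int, float)):
        errors.append("sample_ratio must be a float")
    elif not 0 <= sample_ratio <= 1:
        # Assumed bounds: a ratio is taken to lie in the closed interval [0, 1].
        errors.append("sample_ratio must be between 0 and 1")
    return errors
```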
-
class
dataprofiler.profilers.profiler_options.
FloatOptions
¶ Bases:
dataprofiler.profilers.profiler_options.NumericalOptions
Options for the Float Column.
- Variables
is_enabled (bool) – boolean option to enable/disable the column.
min (BooleanOption) – boolean option to enable/disable min
max (BooleanOption) – boolean option to enable/disable max
sum (BooleanOption) – boolean option to enable/disable sum
variance (BooleanOption) – boolean option to enable/disable variance
histogram_and_quantiles (BooleanOption) – boolean option to enable/disable histogram_and_quantiles
is_numeric_stats_enabled (bool) – boolean to enable/disable all numeric stats
-
property
is_numeric_stats_enabled
¶ Returns the state of numeric stats being enabled / disabled. If any numeric stats property is enabled it will return True, otherwise it will return False.
- Returns
true if any numeric stats property is enabled, otherwise false
- Return type
bool
-
is_prop_enabled
(prop)¶ Checks to see if a property is enabled or not and returns boolean.
- Parameters
prop (String) – The option to check if it is enabled
- Returns
Whether or not the property is enabled
- Return type
Boolean
-
property
properties
¶ Includes at least: is_enabled: Turns on or off the column.
-
set
(options)¶ Sets option values. Pass a dict containing all or a subset of the appropriate options. Raises an error if the formatting is improper.
- Parameters
options (dict) – dict containing the options you want to set.
- Returns
None
-
validate
(raise_error=True)¶ Validates that the options do not conflict or cause errors. Raises an error or warning if they do.
- Parameters
raise_error (bool) – Flag that raises errors if true. Returns errors if false.
- Returns
list of errors (if raise_error is false)
- Return type
list(str)
-
class
dataprofiler.profilers.profiler_options.
TextOptions
¶ Bases:
dataprofiler.profilers.profiler_options.NumericalOptions
Options for the Text Column.
- Variables
is_enabled (bool) – boolean option to enable/disable the column.
vocab (BooleanOption) – boolean option to enable/disable vocab
min (BooleanOption) – boolean option to enable/disable min
max (BooleanOption) – boolean option to enable/disable max
sum (BooleanOption) – boolean option to enable/disable sum
variance (BooleanOption) – boolean option to enable/disable variance
histogram_and_quantiles (BooleanOption) – boolean option to enable/disable histogram_and_quantiles
is_numeric_stats_enabled (bool) – boolean to enable/disable all numeric stats
-
property
is_numeric_stats_enabled
¶ Returns the state of numeric stats being enabled / disabled. If any numeric stats property is enabled it will return True, otherwise it will return False.
- Returns
true if any numeric stats property is enabled, otherwise false
- Return type
bool
-
is_prop_enabled
(prop)¶ Checks to see if a property is enabled or not and returns boolean.
- Parameters
prop (String) – The option to check if it is enabled
- Returns
Whether or not the property is enabled
- Return type
Boolean
-
property
properties
¶ Includes at least: is_enabled: Turns on or off the column.
-
set
(options)¶ Sets option values. Pass a dict containing all or a subset of the appropriate options. Raises an error if the formatting is improper.
- Parameters
options (dict) – dict containing the options you want to set.
- Returns
None
-
validate
(raise_error=True)¶ Validates that the options do not conflict or cause errors. Raises an error or warning if they do.
- Parameters
raise_error (bool) – Flag that raises errors if true. Returns errors if false.
- Returns
list of errors (if raise_error is false)
- Return type
list(str)
-
class
dataprofiler.profilers.profiler_options.
DateTimeOptions
¶ Bases:
dataprofiler.profilers.profiler_options.BaseInspectorOptions
Options for the Datetime Column
- Variables
is_enabled (bool) – boolean option to enable/disable the column.
-
is_prop_enabled
(prop)¶ Checks to see if a property is enabled or not and returns boolean.
- Parameters
prop (String) – The option to check if it is enabled
- Returns
Whether or not the property is enabled
- Return type
Boolean
-
property
properties
¶ Returns a copy of the option properties.
- Returns
dictionary of the option’s properties attr: value
- Return type
dict
-
set
(options)¶ Sets option values. Pass a dict containing all or a subset of the appropriate options. Raises an error if the formatting is improper.
- Parameters
options (dict) – dict containing the options you want to set.
- Returns
None
-
validate
(raise_error=True)¶ Validates that the options do not conflict or cause errors. Raises an error or warning if they do.
- Parameters
raise_error (bool) – Flag that raises errors if true. Returns errors if false.
- Returns
list of errors (if raise_error is false)
- Return type
list(str)
-
class
dataprofiler.profilers.profiler_options.
OrderOptions
¶ Bases:
dataprofiler.profilers.profiler_options.BaseInspectorOptions
Options for the Order Column
- Variables
is_enabled (bool) – boolean option to enable/disable the column.
-
is_prop_enabled
(prop)¶ Checks to see if a property is enabled or not and returns boolean.
- Parameters
prop (String) – The option to check if it is enabled
- Returns
Whether or not the property is enabled
- Return type
Boolean
-
property
properties
¶ Returns a copy of the option properties.
- Returns
dictionary of the option’s properties attr: value
- Return type
dict
-
set
(options)¶ Sets option values. Pass a dict containing all or a subset of the appropriate options. Raises an error if the formatting is improper.
- Parameters
options (dict) – dict containing the options you want to set.
- Returns
None
-
validate
(raise_error=True)¶ Validates that the options do not conflict or cause errors. Raises an error or warning if they do.
- Parameters
raise_error (bool) – Flag that raises errors if true. Returns errors if false.
- Returns
list of errors (if raise_error is false)
- Return type
list(str)
-
class
dataprofiler.profilers.profiler_options.
CategoricalOptions
¶ Bases:
dataprofiler.profilers.profiler_options.BaseInspectorOptions
Options for the Categorical Column
- Variables
is_enabled (bool) – boolean option to enable/disable the column.
-
is_prop_enabled
(prop)¶ Checks to see if a property is enabled or not and returns boolean.
- Parameters
prop (String) – The option to check if it is enabled
- Returns
Whether or not the property is enabled
- Return type
Boolean
-
property
properties
¶ Returns a copy of the option properties.
- Returns
dictionary of the option’s properties attr: value
- Return type
dict
-
set
(options)¶ Sets option values. Pass a dict containing all or a subset of the appropriate options. Raises an error if the formatting is improper.
- Parameters
options (dict) – dict containing the options you want to set.
- Returns
None
-
validate
(raise_error=True)¶ Validates that the options do not conflict or cause errors. Raises an error or warning if they do.
- Parameters
raise_error (bool) – Flag that raises errors if true. Returns errors if false.
- Returns
list of errors (if raise_error is false)
- Return type
list(str)
-
class
dataprofiler.profilers.profiler_options.
DataLabelerOptions
¶ Bases:
dataprofiler.profilers.profiler_options.BaseInspectorOptions
Options for the Data Labeler Column.
- Variables
is_enabled (bool) – boolean option to enable/disable the column.
data_labeler_dirpath (str) – String to load data labeler from
max_sample_size (int) – Int to decide sample size
-
property
properties
¶ Returns a copy of the option properties.
- Returns
dictionary of the option’s properties attr: value
- Return type
dict
-
is_prop_enabled
(prop)¶ Checks to see if a property is enabled or not and returns boolean.
- Parameters
prop (String) – The option to check if it is enabled
- Returns
Whether or not the property is enabled
- Return type
Boolean
-
set
(options)¶ Sets option values. Pass a dict containing all or a subset of the appropriate options. Raises an error if the formatting is improper.
- Parameters
options (dict) – dict containing the options you want to set.
- Returns
None
-
validate
(raise_error=True)¶ Validates that the options do not conflict or cause errors. Raises an error or warning if they do.
- Parameters
raise_error (bool) – Flag that raises errors if true. Returns errors if false.
- Returns
list of errors (if raise_error is false)
- Return type
list(str)
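The two extra variables on this class, a directory path and a sample size, each suggest a simple type and range check. The following sketch shows what such a check might look like. The function name, the "greater than 0" rule, and the treatment of `None` as "not set" are all assumptions, not documented behavior.

```python
def check_data_labeler_options(data_labeler_dirpath=None, max_sample_size=None):
    """Return a list of error strings for the two data labeler settings."""
    errors = []
    # Assumption: None means the setting was left unset and is acceptable.
    if data_labeler_dirpath is not None and not isinstance(data_labeler_dirpath, str):
        errors.append("data_labeler_dirpath must be a string")
    if max_sample_size is not None:
        if isinstance(max_sample_size, bool) or not isinstance(max_sample_size, int):
            errors.append("max_sample_size must be an integer")
        elif max_sample_size <= 0:
            errors.append("max_sample_size must be greater than 0")
    return errors
```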
-
class
dataprofiler.profilers.profiler_options.
TextProfilerOptions
(is_enabled=True, is_case_sensitive=True, stop_words=None)¶ Bases:
dataprofiler.profilers.profiler_options.BaseInspectorOptions
Constructs the TextProfilerOptions object with default values.
- Variables
is_enabled (bool) – boolean option to enable/disable the option.
is_case_sensitive (bool) – option set for case sensitivity.
stop_words (Union[None, list(str)]) – option set for stop words.
words (BooleanOption) – option set for word update.
vocab (BooleanOption) – option set for vocab update.
-
is_prop_enabled
(prop)¶ Checks to see if a property is enabled or not and returns boolean.
- Parameters
prop (String) – The option to check if it is enabled
- Returns
Whether or not the property is enabled
- Return type
Boolean
-
property
properties
¶ Returns a copy of the option properties.
- Returns
dictionary of the option’s properties attr: value
- Return type
dict
-
set
(options)¶ Sets option values. Pass a dict containing all or a subset of the appropriate options. Raises an error if the formatting is improper.
- Parameters
options (dict) – dict containing the options you want to set.
- Returns
None
-
validate
(raise_error=True)¶ Validates that the options do not conflict or cause errors. Raises an error or warning if they do.
- Parameters
raise_error (bool) – Flag that raises errors if true. Returns errors if false.
- Returns
list of errors (if raise_error is false)
- Return type
list(str)
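To make the effect of `is_case_sensitive` and `stop_words` concrete, the sketch below counts words in a string under both settings. It illustrates what such options typically control and is not the profiler's own tokenization. The function `count_words` and its whitespace splitting are assumptions.

```python
from collections import Counter


def count_words(text, is_case_sensitive=True, stop_words=None):
    """Count words, honoring case sensitivity and an optional stop-word list."""
    skip = set(stop_words or [])
    if not is_case_sensitive:
        # Fold both the text and the stop words so comparisons ignore case.
        text = text.lower()
        skip = {word.lower() for word in skip}
    # Whitespace splitting is a simplification chosen for this sketch.
    return Counter(word for word in text.split() if word not in skip)


folded = count_words("The cat and the hat", is_case_sensitive=False,
                     stop_words=["the", "and"])
exact = count_words("The cat and the hat", stop_words=["the"])
```

With case sensitivity left on, the capitalized "The" does not match the stop word "the" and is still counted, which is the practical difference between the two settings.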
-
class
dataprofiler.profilers.profiler_options.
StructuredOptions
¶ Bases:
dataprofiler.profilers.profiler_options.BaseOption
Constructs the StructuredOptions object with default values.
- Variables
int (IntOptions) – option set for int profiling.
float (FloatOptions) – option set for float profiling.
datetime (DateTimeOptions) – option set for datetime profiling.
text (TextOptions) – option set for text profiling.
order (OrderOptions) – option set for order profiling.
category (CategoricalOptions) – option set for category profiling.
data_labeler (DataLabelerOptions) – option set for data_labeler profiling.
-
property
enabled_profiles
¶ Returns a list of the enabled profilers for columns.
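A property such as `enabled_profiles` can be derived by filtering the option sets on their `is_enabled` flag. The sketch below is a hypothetical stand-in: `Toggle` and `StructuredSketch` are illustrative names, the attribute names follow the Variables list above, and returning names in attribute-definition order is an assumption.

```python
class Toggle:
    """Hypothetical stand-in for a per-column option set."""

    def __init__(self, is_enabled=True):
        self.is_enabled = is_enabled


class StructuredSketch:
    """Hypothetical stand-in holding one Toggle per column type."""

    def __init__(self):
        self.int = Toggle()
        self.float = Toggle()
        self.datetime = Toggle()
        self.text = Toggle()
        self.order = Toggle()
        self.category = Toggle()
        self.data_labeler = Toggle()

    @property
    def enabled_profiles(self):
        # Keep only the option sets whose is_enabled flag is on.
        return [name for name, option in self.__dict__.items()
                if option.is_enabled]


structured = StructuredSketch()
structured.text.is_enabled = False
structured.order.is_enabled = False
enabled = structured.enabled_profiles
```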
-
property
properties
¶ Returns a copy of the option properties.
- Returns
dictionary of the option’s properties attr: value
- Return type
dict
-
set
(options)¶ Sets option values. Pass a dict containing all or a subset of the appropriate options. Raises an error if the formatting is improper.
- Parameters
options (dict) – dict containing the options you want to set.
- Returns
None
-
validate
(raise_error=True)¶ Validates that the options do not conflict or cause errors. Raises an error or warning if they do.
- Parameters
raise_error (bool) – Flag that raises errors if true. Returns errors if false.
- Returns
list of errors (if raise_error is false)
- Return type
list(str)
-
class
dataprofiler.profilers.profiler_options.
UnstructuredOptions
¶ Bases:
dataprofiler.profilers.profiler_options.BaseOption
Constructs the UnstructuredOptions object with default values.
- Variables
text (TextProfilerOptions) – option set for text profiling.
data_labeler (DataLabelerOptions) – option set for data_labeler profiling.
-
property
enabled_profiles
¶ Returns a list of the enabled profilers.
-
property
properties
¶ Returns a copy of the option properties.
- Returns
dictionary of the option’s properties attr: value
- Return type
dict
-
set
(options)¶ Sets option values. Pass a dict containing all or a subset of the appropriate options. Raises an error if the formatting is improper.
- Parameters
options (dict) – dict containing the options you want to set.
- Returns
None
-
validate
(raise_error=True)¶ Validates that the options do not conflict or cause errors. Raises an error or warning if they do.
- Parameters
raise_error (bool) – Flag that raises errors if true. Returns errors if false.
- Returns
list of errors (if raise_error is false)
- Return type
list(str)
-
class
dataprofiler.profilers.profiler_options.
ProfilerOptions
¶ Bases:
dataprofiler.profilers.profiler_options.BaseOption
Initializes the ProfilerOptions object.
- Variables
structured_options (StructuredOptions) – option set for structured dataset profiling.
unstructured_options (UnstructuredOptions) – option set for unstructured dataset profiling.
-
property
properties
¶ Returns a copy of the option properties.
- Returns
dictionary of the option’s properties attr: value
- Return type
dict
-
set
(options)¶ Overrides BaseOption.set, since the type (unstructured/structured) may need to be specified when the same option exists in both self.structured_options and self.unstructured_options.
- Parameters
options (dict) – Dictionary of options to set
- Returns
None
-
validate
(raise_error=True)¶ Validates that the options do not conflict or cause errors. Raises an error or warning if they do.
- Parameters
raise_error (bool) – Flag that raises errors if true. Returns errors if false.
- Returns
list of errors (if raise_error is false)
- Return type
list(str)
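The reason `set` needs special handling here is that names such as `text` and `data_labeler` appear in both the structured and the unstructured option sets. The sketch below shows one way a key could be routed to the right group. The dotted-key notation, the function `resolve_target`, and the choice to raise on an ambiguous key are assumptions for illustration, not the library's documented behavior.

```python
# Option names taken from the StructuredOptions and UnstructuredOptions
# Variables lists in this section.
STRUCTURED = {"int", "float", "datetime", "text", "order", "category",
              "data_labeler"}
UNSTRUCTURED = {"text", "data_labeler"}


def resolve_target(key):
    """Return (group, remaining_key) for a dotted key like 'text.is_enabled'."""
    head, _, rest = key.partition(".")
    if head in ("structured_options", "unstructured_options"):
        return head, rest  # the caller named the group explicitly
    in_structured = head in STRUCTURED
    in_unstructured = head in UNSTRUCTURED
    if in_structured and in_unstructured:
        # Assumed policy: refuse to guess when the name exists in both groups.
        raise ValueError(f"'{head}' exists in both groups; prefix the key "
                         "with structured_options or unstructured_options")
    if in_structured:
        return "structured_options", key
    if in_unstructured:
        return "unstructured_options", key
    raise AttributeError(f"unknown option: {key}")


try:
    resolve_target("text.is_enabled")
    ambiguous_rejected = False
except ValueError:
    ambiguous_rejected = True
```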