dataprofiler.profilers.data_labeler_column_profile module

Contains the class for profiling a data labeler column.

class dataprofiler.profilers.data_labeler_column_profile.DataLabelerColumn(name: str | None, options: DataLabelerOptions = None)

Bases: BaseColumnProfiler[DataLabelerColumn]

Subclass of BaseColumnProfiler for profiling a data labeler column.

Initialize Data Label profiling for structured datasets.

Parameters:
  • name (String) – name of column being profiled

  • options (DataLabelerOptions) – Options for the data labeler column

type = 'data_labeler'
thread_safe: bool
static assert_equal_conditions(data_labeler: DataLabelerColumn, data_labeler2: DataLabelerColumn) → None

Ensure data labelers have the same values. Raise error otherwise.

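Parameters:
  • data_labeler (DataLabelerColumn) – first data labeler column to compare

  • data_labeler2 (DataLabelerColumn) – second data labeler column to compare

Returns: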

None

property reverse_label_mapping: dict

Return reverse label mapping.

property possible_data_labels: list[str]

Return possible data labels.

property rank_distribution: dict[str, int]

Return rank distribution.

property sum_predictions: ndarray

Return the sum of predictions.

property data_label: str | None

Return the data label(s) that best fit the data seen so far, based on the DataLabeler used.

Data labels must be within the minimum probability differential of the top predicted value. If no prediction exceeds the minimum top label value, the property reports that it could not determine the data label.
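The selection rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: the helper name `pick_data_label`, the parameters `min_top_label_prob` and `min_prob_differential`, their default values, the `|` separator, and the fallback string are all hypothetical choices made for the example.

```python
def pick_data_label(avg_predictions, min_top_label_prob=0.5,
                    min_prob_differential=0.2):
    """Hypothetical helper: pick the label(s) that best fit averaged predictions.

    avg_predictions maps each label name to its average prediction score.
    The threshold defaults are illustrative assumptions only.
    """
    if not avg_predictions:
        return None  # nothing has been profiled yet

    # Find the label with the highest average prediction.
    top_label = max(avg_predictions, key=avg_predictions.get)
    top_prob = avg_predictions[top_label]

    # The top label must reach the minimum top label value.
    if top_prob < min_top_label_prob:
        return "could not determine"

    # Keep every other label within the differential of the top prediction.
    close = [
        label
        for label, prob in avg_predictions.items()
        if label != top_label and top_prob - prob < min_prob_differential
    ]
    return "|".join([top_label] + close)
```

With these assumed thresholds, a clear winner yields a single label, two near-tied labels are both returned, and a weak top score yields the fallback string.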

property avg_predictions: dict[str, float] | None

Average all sample predictions for each data label.
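As a rough sketch of what averaging means here, the summed predictions can be divided by the number of samples seen. The helper name `average_predictions` and its arguments are hypothetical, and plain lists stand in for the ndarray returned by `sum_predictions`.

```python
def average_predictions(sum_predictions, sample_size, labels):
    """Hypothetical helper: average summed prediction scores per label.

    sum_predictions: per-label totals, in the same order as labels.
    sample_size: number of samples that contributed to the totals.
    """
    if sample_size == 0:
        return None  # no samples, so there is no average to report

    return {
        label: total / sample_size
        for label, total in zip(labels, sum_predictions)
    }
```

For example, totals of `[3.0, 1.0]` over four samples give averages of 0.75 and 0.25 for the two labels.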

property label_representation: dict[str, float] | None

Represent label found within the dataset based on ranked voting.

When top_k=1, this is simply the distribution of data labels found within the dataset.
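For the top_k=1 case, the representation reduces to normalizing vote counts. The sketch below assumes each sample casts one vote for its top-ranked label and that the counts look like the dictionary returned by `rank_distribution`; the helper name `label_distribution` is hypothetical.

```python
def label_distribution(rank_distribution):
    """Hypothetical helper: turn per-label vote counts into fractions.

    rank_distribution maps each label name to the number of votes it received.
    """
    total = sum(rank_distribution.values())
    if total == 0:
        return None  # no votes were cast, so no distribution exists

    return {label: votes / total for label, votes in rank_distribution.items()}
```

A column with three votes for one label and one vote for another would be represented as 0.75 and 0.25 respectively.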

property profile: dict

Return the profile of the column.

classmethod load_from_dict(data, config: dict | None = None) → DataLabelerColumn

Parse attributes from a JSON dictionary into self.

Parameters:
  • data (dict[string, Any]) – dictionary with attributes and values.

  • config (Dict | None) – config for loading column profiler params from dictionary

Returns:

Profiler with attributes populated.

Return type:

DataLabelerColumn

report(remove_disabled_flag: bool = False) → dict

Return report.

Private abstract method.

Parameters:

remove_disabled_flag (boolean) – flag to determine if disabled options should be excluded in the report.

col_type = None
diff(other_profile: DataLabelerColumn, options: dict | None = None) → dict

Generate differences between the orders of two DataLabeler columns.

Returns:

Dict containing the differences between orders in their appropriate output formats

Return type:

dict

name: str | None
sample_size: int
metadata: dict
times: dict
update(df_series: Series) → DataLabelerColumn

Update the column profile.

Parameters:

df_series (pandas.core.series.Series) – pandas Series of column data used to update the profile

Returns:

updated DataLabelerColumn

Return type:

DataLabelerColumn