Data Labeler Column Profile

Contains the class for profiling a data labeler column.

class dataprofiler.profilers.data_labeler_column_profile.DataLabelerColumn(name: str | None, options: DataLabelerOptions = None)

Bases: dataprofiler.profilers.base_column_profilers.BaseColumnProfiler

Subclass of BaseColumnProfiler for profiling a data labeler column.

Initialize Data Label profiling for structured datasets.

Parameters

name (String) – name of column being profiled
options (DataLabelerOptions) – Options for the data labeler column

type = 'data_labeler'

thread_safe: bool

static assert_equal_conditions(data_labeler: dataprofiler.profilers.data_labeler_column_profile.DataLabelerColumn, data_labeler2: dataprofiler.profilers.data_labeler_column_profile.DataLabelerColumn) → None

Ensure data labelers have the same values. Raise error otherwise.

Parameters

data_labeler (DataLabelerColumn) – first data_labeler
data_labeler2 (DataLabelerColumn) – second data_labeler

Returns

None

property reverse_label_mapping: dict

Return reverse label mapping.

property possible_data_labels: list

Return possible data labels.

property rank_distribution: dict

Return rank distribution.

property sum_predictions: numpy.ndarray

Sum predictions.

property data_label: str | None

Return the data labels that best fit the data seen so far, based on the DataLabeler used.

Data labels must be within the minimum probability differential of the top predicted value. If no label exceeds the minimum top label value, the property reports that it could not determine the data label.
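The selection rule above can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: the function name pick_data_label, both threshold defaults, the "|" separator, and the "could not determine" string are hypothetical choices made for the example.

```python
def pick_data_label(avg_predictions, min_top_label_prob=0.2,
                    min_prob_differential=0.2):
    """Hypothetical sketch of the rule described above.

    avg_predictions maps each label to its average prediction score.
    Both thresholds are assumed values chosen for illustration.
    """
    if not avg_predictions:
        return None  # nothing has been profiled yet

    # Rank labels from highest to lowest average prediction.
    ranked = sorted(avg_predictions.items(), key=lambda kv: kv[1], reverse=True)
    top_prob = ranked[0][1]

    # The top label must exceed the minimum top label value.
    if top_prob <= min_top_label_prob:
        return "could not determine"

    # Keep every label within the differential of the top prediction.
    close = [label for label, prob in ranked
             if top_prob - prob <= min_prob_differential]
    return "|".join(close)


result = pick_data_label({"ADDRESS": 0.55, "PERSON": 0.40, "UNKNOWN": 0.05})
```

In this sketch, ADDRESS and PERSON are both kept because their scores differ by less than the assumed differential, while UNKNOWN is dropped.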

property avg_predictions: dict[str, float] | None

Average all sample predictions for each data label.
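As a rough illustration of the averaging, the sketch below divides a per-label running sum by the number of samples. The helper name average_predictions and its arguments are hypothetical, and returning None for an empty sample is an assumption inferred from the optional return type.

```python
def average_predictions(sum_predictions, labels, sample_size):
    """Hypothetical sketch: per-label mean of the summed prediction scores."""
    if not sample_size:
        return None  # assumed behavior when no samples have been seen
    # Pair each label with its summed score and divide by the sample count.
    return {label: total / sample_size
            for label, total in zip(labels, sum_predictions)}


averages = average_predictions([3.0, 1.0], ["ADDRESS", "UNKNOWN"], 4)
```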

property label_representation: dict[str, float] | None

Represent label found within the dataset based on ranked voting.

When top_k=1, this is simply the distribution of data labels found within the dataset.
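To make the ranked-voting idea concrete, here is a minimal sketch. The weighting scheme (the top-ranked label receives top_k votes, the next top_k - 1, and so on) and the function name ranked_label_representation are assumptions for illustration; the library's exact weighting may differ.

```python
from collections import Counter


def ranked_label_representation(sample_predictions, top_k=1):
    """Hypothetical sketch of ranked voting over per-sample predictions.

    sample_predictions is a list of dicts mapping label -> score.
    """
    votes = Counter()
    for prediction in sample_predictions:
        # Take the top_k labels for this sample, best first.
        ranked = sorted(prediction, key=prediction.get, reverse=True)[:top_k]
        for rank, label in enumerate(ranked):
            votes[label] += top_k - rank  # assumed rank weighting
    total = sum(votes.values())
    if total == 0:
        return None  # no samples, so nothing to represent
    return {label: count / total for label, count in votes.items()}


samples = [
    {"ADDRESS": 0.9, "PERSON": 0.1},
    {"ADDRESS": 0.6, "PERSON": 0.4},
    {"ADDRESS": 0.2, "PERSON": 0.8},
    {"ADDRESS": 0.7, "PERSON": 0.3},
]
top1 = ranked_label_representation(samples, top_k=1)
top2 = ranked_label_representation(samples, top_k=2)
```

With top_k=1 each sample casts a single vote for its best label, so the result is just the share of samples assigned to each label, which matches the statement above.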

property profile: dict

Return the profile of the column.

report(remove_disabled_flag: bool = False) → dict

Return report.

Private abstract method.

Parameters: remove_disabled_flag (boolean) – flag to determine if disabled options should be excluded in the report.

col_type = None

diff(other_profile: dataprofiler.profilers.data_labeler_column_profile.DataLabelerColumn, options: Optional[dict] = None) → dict

Generate differences between the orders of two DataLabeler columns.

Returns: Dict containing the differences between orders in their appropriate output formats
Return type: dict

name: str | None

sample_size: int

metadata: dict

times: dict

update(df_series: pandas.core.series.Series) → dataprofiler.profilers.data_labeler_column_profile.DataLabelerColumn

Update the column profile.

Parameters: df_series (pandas.core.series.Series) – pandas Series holding the column data to profile
Returns: updated DataLabelerColumn
Return type: DataLabelerColumn