Data Labeler Column Profile

class dataprofiler.profilers.data_labeler_column_profile.DataLabelerColumn(name, options=None)

Bases: dataprofiler.profilers.base_column_profilers.BaseColumnProfiler

Initialization of Data Label profiling for structured datasets.

Parameters
  • data_labeler_dirpath (String) – Directory path to the data labeler

  • options (DataLabelerOptions) – Options for the data labeler column

col_type = 'data_labeler'

static assert_equal_conditions(data_labeler, data_labeler2)

Ensures that the two data labelers have the same values. Raises an error otherwise.

Parameters
  • data_labeler – First data labeler to compare

  • data_labeler2 – Second data labeler to compare

Returns

None

property data_label

Returns the data labels that best fit the data seen so far, based on the DataLabeler used. To be included, a data label must be within the minimum probability differential of the top predicted value. If no prediction exceeds the minimum top label value, the property reports that the data label could not be determined.
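The selection rule described for data_label can be sketched in plain Python. This is an illustration only, not the library's implementation: the helper name pick_data_labels, both threshold defaults, the "|" separator, and the fallback string are all assumptions made for the example.

```python
def pick_data_labels(avg_predictions, top_label_min=0.2, min_prob_differential=0.2):
    """Hypothetical helper: choose the labels close enough to the top prediction.

    avg_predictions maps a label name to its average predicted probability.
    Both thresholds are assumed values, not the library's defaults.
    """
    if not avg_predictions:
        return "could not determine"

    # Find the strongest label and its average probability.
    top_value = max(avg_predictions.values())

    # If even the best label is too weak, report that no label was determined.
    if top_value <= top_label_min:
        return "could not determine"

    # Keep every label whose probability is within the differential of the top one,
    # ordered from most to least likely.
    ranked = sorted(avg_predictions.items(), key=lambda item: item[1], reverse=True)
    chosen = [
        label for label, value in ranked
        if top_value - value <= min_prob_differential
    ]
    return "|".join(chosen)


if __name__ == "__main__":
    print(pick_data_labels({"ADDRESS": 0.5, "PERSON": 0.4, "DATE": 0.1}))
```

With these assumed thresholds, a label that trails the top prediction by more than 0.2 is dropped, and a column whose best label averages 0.2 or less yields the fallback string.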

property avg_predictions

Averages all sample predictions for each data label.
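As a rough sketch of what averaging sample predictions per data label means, the snippet below divides a running per-label probability sum by the number of samples seen. The function name average_predictions and its two inputs are hypothetical and chosen for illustration; the library's internal bookkeeping may differ.

```python
def average_predictions(sum_predictions, sample_size):
    """Hypothetical helper: average the accumulated probabilities for each label.

    sum_predictions maps a label name to the sum of its predicted probabilities
    over all samples; sample_size is the number of samples seen.
    """
    if sample_size == 0:
        # Nothing has been profiled yet, so every average is zero.
        return {label: 0.0 for label in sum_predictions}
    return {label: total / sample_size for label, total in sum_predictions.items()}


if __name__ == "__main__":
    print(average_predictions({"ADDRESS": 3.0, "PERSON": 1.0}, 4))
```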

property label_representation

Representation of the labels found within the dataset, based on ranked voting. When top_k=1, this is simply the distribution of data labels found within the dataset.
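One way to picture ranked voting is sketched below. Each sample votes for its top_k most probable labels, with higher-ranked labels receiving more weight, and the votes are then normalized. The weighting scheme (top_k minus rank) and the function name label_representation are assumptions made for this example, not the library's confirmed behavior. With top_k=1 the result reduces to the share of samples won by each label.

```python
from collections import Counter


def label_representation(sample_predictions, top_k=1):
    """Hypothetical helper: normalized ranked votes per label.

    sample_predictions is a list of dicts, one per sample, mapping a label name
    to its predicted probability for that sample.
    """
    votes = Counter()
    for probs in sample_predictions:
        # Rank this sample's labels from most to least probable and keep top_k.
        ranked = sorted(probs, key=probs.get, reverse=True)[:top_k]
        for rank, label in enumerate(ranked):
            # Assumed weighting: the first choice gets top_k votes, the next one fewer.
            votes[label] += top_k - rank

    total = sum(votes.values())
    labels = sorted({label for probs in sample_predictions for label in probs})
    return {label: (votes[label] / total if total else 0.0) for label in labels}


if __name__ == "__main__":
    samples = [
        {"ADDRESS": 0.9, "PERSON": 0.1},
        {"ADDRESS": 0.6, "PERSON": 0.4},
        {"ADDRESS": 0.2, "PERSON": 0.8},
        {"ADDRESS": 0.7, "PERSON": 0.3},
    ]
    print(label_representation(samples, top_k=1))
```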

property profile

Returns the profile of the column.

update(df_series)

Updates the column profile.

Parameters

  • df_series (pandas.core.series.Series) – Series containing the column data used to update the profile

Returns

None