Categorical Column Profile¶
Contains class for categorical column profiler.
- class dataprofiler.profilers.categorical_column_profile.CategoricalColumn(name: str | None, options: CategoricalOptions = None)¶
Bases:
dataprofiler.profilers.base_column_profilers.BaseColumnProfiler[CategoricalColumn]
Categorical column profile subclass of BaseColumnProfiler.
Represents a column in the dataset that is categorical.
Initialize column base properties and itself.
- Parameters
name (String) – Name of data
- type = 'category'¶
- property gini_impurity: float | None¶
Return Gini Impurity.
Gini Impurity measures the likelihood that a new instance of a random variable would be classified incorrectly if it were labeled at random according to the category distribution.
G = Σ(i=1; J): P(i) * (1 - P(i)), where i indexes the J categories and P(i) is the proportion of samples in category i. The sum is computed over the categories observed in the column.
- Returns
None or Gini Impurity probability
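The formula above can be illustrated with a minimal standard-library sketch. The helper name `gini_impurity` and the choice to return None for an empty input are assumptions made for illustration (the latter mirrors the `float | None` return type); this is not the profiler's own implementation.

```python
from collections import Counter


def gini_impurity(values):
    """Return sum of P(i) * (1 - P(i)) over categories, or None if empty.

    Hypothetical helper for illustration only; not part of dataprofiler.
    """
    counts = Counter(values)
    n = sum(counts.values())
    if n == 0:
        # Assumption: no data means the impurity is undefined.
        return None
    # P(i) is the share of samples that fall in category i.
    return sum((c / n) * (1 - c / n) for c in counts.values())


sample = ["a", "a", "b", "c"]
print(gini_impurity(sample))
```

A column with a single category yields 0, and the value grows as the samples spread more evenly across categories.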
- property unalikeability: float | None¶
Return Unalikeability.
Unalikeability checks for “how often observations differ from one another”. Reference: Perry, M. and Kader, G. Variation as Unalikeability. Teaching Statistics, Vol. 27, No. 2 (2005), pp. 58-60.
U = Σ(i=1,n)Σ(j=1,n): (Cij)/(n**2-n), where n is the number of observations and Cij = 1 if observations i and j differ, 0 if they are the same.
- Returns
None or unalikeability probability
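As a rough sketch of this measure, the double sum can be evaluated directly over all ordered pairs, or equivalently from category counts, since the number of equal pairs is the sum of squared counts. Both helper names are hypothetical, and returning None for fewer than two observations (where n**2 - n is zero) is an assumption; the library may handle that case differently.

```python
from collections import Counter
from itertools import product


def unalikeability_pairwise(values):
    """Evaluate the double sum directly over all ordered pairs."""
    n = len(values)
    if n < 2:
        # Assumption: undefined when the denominator n**2 - n is zero.
        return None
    differing = sum(1 for x, y in product(values, repeat=2) if x != y)
    return differing / (n**2 - n)


def unalikeability_from_counts(values):
    """Same quantity computed from category counts."""
    counts = Counter(values)
    n = sum(counts.values())
    if n < 2:
        return None
    # Ordered pairs with equal values: sum of count**2 over categories.
    same = sum(c * c for c in counts.values())
    return (n * n - same) / (n * n - n)


sample = ["a", "a", "b", "c"]
print(unalikeability_pairwise(sample), unalikeability_from_counts(sample))
```

The count-based form avoids the quadratic pairwise loop, which matters once a column holds more than a few thousand values.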
- diff(other_profile: dataprofiler.profilers.categorical_column_profile.CategoricalColumn, options: Optional[dict] = None) dict ¶
Find the differences for CategoricalColumns.
- Parameters
other_profile (CategoricalColumn) – profile to find the difference with
- Returns
the CategoricalColumn differences
- Return type
dict
- report(remove_disabled_flag: bool = False) dict ¶
Return report.
This is a private abstract method.
- Parameters
remove_disabled_flag (boolean) – flag to determine if disabled options should be excluded in the report.
- classmethod load_from_dict(data: dict, config: dict | None = None)¶
Parse attributes from a JSON dictionary into self.
- Parameters
data (dict[string, Any]) – dictionary with attributes and values.
config (Dict | None) – config for loading column profiler params from dictionary
- Returns
Profiler with attributes populated.
- Return type
- property profile: dict¶
Return the profile of the column.
For categorical_count, the profile displays the top k most frequently occurring categories, in descending order of frequency.
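The top-k ordering described here can be sketched with the standard library. The helper `top_k_counts` and its `k` parameter are hypothetical names used for illustration, not the profiler's API or option names.

```python
from collections import Counter


def top_k_counts(values, k):
    """Return the k most frequent categories with their counts.

    Hypothetical helper: Counter.most_common sorts by count, highest first,
    and dict preserves that order.
    """
    return dict(Counter(values).most_common(k))


column = ["b", "a", "a", "c", "a", "b"]
print(top_k_counts(column, 2))
```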
- property categories: list[str]¶
Return categories.
- property categorical_counts: dict[str, int]¶
Return counts of each category.
- property unique_ratio: float¶
Return ratio of unique categories to sample_size.
- property unique_count: int¶
Return the number of unique categories.
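To show how these properties relate to one another, here is a small standard-library sketch that derives each of them from a list of values. The function `summarize` is hypothetical, and returning a ratio of 0 for an empty column is an assumption rather than documented behavior.

```python
from collections import Counter


def summarize(values):
    """Derive category statistics from raw values (illustrative only)."""
    counts = Counter(values)            # analogous to categorical_counts
    categories = list(counts)           # analogous to categories
    sample_size = len(values)
    unique_count = len(categories)      # number of distinct categories
    # Assumption: report 0 rather than dividing by zero on empty input.
    unique_ratio = unique_count / sample_size if sample_size else 0
    return {
        "categories": categories,
        "categorical_counts": dict(counts),
        "unique_count": unique_count,
        "unique_ratio": unique_ratio,
    }


print(summarize(["red", "blue", "red", "green", "red"]))
```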
- property is_match: bool¶
Return true if column is categorical.
- col_type = None¶
- name: str | None¶
- sample_size: int¶
- metadata: dict¶
- times: dict¶
- thread_safe: bool¶
- update(df_series: pandas.core.series.Series) dataprofiler.profilers.categorical_column_profile.CategoricalColumn ¶
Update the column profile.
- Parameters
df_series (pandas.core.series.Series) – Data to profile.
- Returns
updated CategoricalColumn
- Return type