Categorical Column Profile
-
class dataprofiler.profilers.categorical_column_profile.CategoricalColumn(name, options=None)
Bases: dataprofiler.profilers.base_column_profilers.BaseColumnProfiler
Categorical column profile subclass of BaseColumnProfiler. Represents a column in the dataset that is categorical.
Initializes the base column properties and the categorical profile itself.
- Parameters
name (String) – Name of data
-
type = 'category'
-
diff(other_profile, options=None)
Finds the differences between this CategoricalColumn profile and another.
- Parameters
other_profile (CategoricalColumn) – profile to find the difference with
- Returns
the CategoricalColumn differences
- Return type
dict
-
property profile
Property for profile. Returns the profile of the column. For categorical_count, it displays the top k most frequently occurring categories in descending order of frequency.
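As an illustration of the "top k categories in descending order" behaviour described above, here is a minimal sketch using only the standard library. The helper name `top_k_categories` and its parameters are hypothetical and are not part of DataProfiler; the library's own ordering of ties may differ.

```python
from collections import Counter


def top_k_categories(values, k):
    """Return the k most frequent categories as {category: count}.

    Hypothetical helper for illustration only; not DataProfiler's API.
    Entries are ordered from most to least frequent.
    """
    # Counter.most_common(k) yields (category, count) pairs sorted by
    # descending count; dict() preserves that order.
    return dict(Counter(values).most_common(k))


counts = top_k_categories(["a", "b", "a", "c", "a", "b"], k=2)
```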
-
property categories
Property for categories.
-
property unique_ratio
Property for unique_ratio. Returns the ratio of unique categories to sample_size.
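The ratio described above can be sketched as follows. This is an illustrative stand-alone function, not the library's implementation; the name `unique_ratio_of` is hypothetical, and returning None for an empty input is an assumption made here to avoid dividing by zero.

```python
def unique_ratio_of(values):
    """Ratio of distinct categories to the number of observations.

    Hypothetical helper for illustration only; not DataProfiler's API.
    Returns None for empty input (an assumption of this sketch).
    """
    values = list(values)
    if not values:
        return None
    return len(set(values)) / len(values)


ratio = unique_ratio_of(["a", "b", "a", "c"])  # 3 distinct values, 4 samples
```

A ratio close to 1 means almost every value is distinct, which usually suggests the column is not categorical.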
-
property is_match
Property for is_match. Returns True if the column is categorical.
-
update(df_series)
Updates the column profile.
- Parameters
df_series (pandas.core.series.Series) – Data to profile.
- Returns
None
-
property gini_impurity
Property for Gini impurity. Gini impurity measures the likelihood that a new instance of a random variable is classified incorrectly.
G = Σ(i=1; J): P(i) * (1 - P(i)), where i indexes the J categories and P(i) is the proportion of the column's values that fall in category i. The sum is computed by iterating over the column's categories.
- Returns
None or Gini Impurity probability
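The formula above can be sketched directly from category counts. This is a minimal illustration, not the library's code; the function name `gini_impurity_of` is hypothetical, and returning None for an empty input mirrors the documented "None or Gini Impurity probability" return only by assumption.

```python
from collections import Counter


def gini_impurity_of(values):
    """Compute G = sum_i P(i) * (1 - P(i)) over the categories in values.

    Hypothetical helper for illustration only; not DataProfiler's API.
    Returns None when there are no observations (an assumption).
    """
    counts = Counter(values)
    n = sum(counts.values())
    if n == 0:
        return None
    # P(i) is the share of observations that belong to category i.
    return sum((c / n) * (1 - c / n) for c in counts.values())


g = gini_impurity_of(["a", "a", "b", "b"])  # two equally likely categories
```

A column with a single category gives 0, and the value grows as the observations spread more evenly over more categories.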
-
property unalikeability
Property for unalikeability. Unalikeability measures "how often observations differ from one another". Reference: Perry, M. and Kader, G. Variation as Unalikeability. Teaching Statistics, Vol. 27, No. 2 (2005), pp. 58-60.
U = Σ(i=1,n)Σ(j=1,n): (Cij)/(n**2-n), where n is the number of observations and Cij = 1 if observations i and j differ, 0 if they are equal.
- Returns
None or unalikeability probability
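The double sum above does not need to visit every pair: a category with count c contributes c * (n - c) ordered pairs whose members differ. The sketch below uses that shortcut. The function name `unalikeability_of` is hypothetical, and returning None when there are fewer than two observations (where the denominator n**2 - n is zero) is an assumption of this sketch; the library may handle that case differently.

```python
from collections import Counter


def unalikeability_of(values):
    """Share of ordered pairs of observations whose values differ.

    Hypothetical helper for illustration only; not DataProfiler's API.
    Returns None for fewer than two observations (an assumption).
    """
    counts = Counter(values)
    n = sum(counts.values())
    if n < 2:
        return None
    # Each of the c observations in a category differs from the
    # (n - c) observations outside it.
    differing_pairs = sum(c * (n - c) for c in counts.values())
    return differing_pairs / (n * n - n)


u = unalikeability_of(["a", "a", "b", "b"])  # 8 differing pairs out of 12
```

The result is 0 when all observations are identical and 1 when every observation is distinct.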
-
col_type = None