Numerical Column Stats¶
Build a model for a dataset by identifying the type of each column along with its respective parameters.
- class dataprofiler.profilers.numerical_column_stats.abstractstaticmethod(function)¶
Bases:
staticmethod
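A decorator class of this shape is commonly written as follows. This is a minimal sketch modeled on the standard library's (deprecated) `abc.abstractstaticmethod`, not necessarily the code in this module; the `Validator` and `AlwaysValid` classes are hypothetical and exist only to show the effect.

```python
from abc import ABCMeta


class abstractstaticmethod(staticmethod):
    """Mark a static method as abstract (sketch, mirrors abc.abstractstaticmethod)."""

    __isabstractmethod__ = True

    def __init__(self, function):
        # Flag the wrapped function so ABCMeta treats it as abstract.
        function.__isabstractmethod__ = True
        super().__init__(function)


class Validator(metaclass=ABCMeta):  # hypothetical base class
    @abstractstaticmethod
    def is_valid(x):
        raise NotImplementedError


class AlwaysValid(Validator):  # hypothetical concrete subclass
    @staticmethod
    def is_valid(x):
        return True


try:
    Validator()  # abstract static method is not overridden
except TypeError as exc:
    print("cannot instantiate:", exc)

print(AlwaysValid.is_valid("anything"))
```

On modern Python the same effect is usually obtained by stacking `@staticmethod` on top of `@abc.abstractmethod`.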
- class dataprofiler.profilers.numerical_column_stats.NumericStatsMixin(options=None)¶
Bases:
object
Abstract numerical column profile subclass of BaseColumnProfiler. Represents a column in the dataset which is a numerical column. It has subclasses of its own.
Initializes the column base properties and the mixin itself.
- Parameters
options (NumericalOptions) – Options for the numerical stats.
- type = None¶
- profile()¶
Property for profile. Returns the profile of the column.
- report(remove_disabled_flag=False)¶
Calls the profile and removes the disabled columns from the profile’s report. A “disabled column” is a column that is not present in self.__calculations but is present in self.profile.
- Parameters
remove_disabled_flag (bool) – whether to remove values that are missing from __calculations
- Returns
Profile object with the entries missing from __calculations popped
- Return type
Profile
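The filtering described above can be sketched with plain dictionaries. This is an illustration under stated assumptions, not the library's implementation: `remove_disabled` is a hypothetical helper, and `enabled` stands in for the private `__calculations` mapping.

```python
def remove_disabled(profile, enabled_calculations, remove_disabled_flag=True):
    """Return a copy of `profile`, optionally without disabled entries.

    An entry counts as disabled when its key is missing from
    `enabled_calculations` (hypothetical stand-in for __calculations).
    """
    report = dict(profile)  # copy so the original profile is untouched
    if remove_disabled_flag:
        for key in list(report):
            if key not in enabled_calculations:
                report.pop(key)
    return report


profile = {"min": 1.0, "max": 9.0, "variance": 4.2}
enabled = {"min": None, "max": None}  # "variance" is disabled here

print(remove_disabled(profile, enabled))
print(remove_disabled(profile, enabled, remove_disabled_flag=False))
```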
- diff(other_profile, options=None)¶
Finds the differences for several numerical stats.
- Parameters
other_profile (NumericStatsMixin Profile) – profile to find the difference with
- Returns
the numerical stats differences
- Return type
dict
- property mean¶
- property mode¶
Finds an estimate for the mode(s) of the data.
- Returns
the mode(s) of the data
- Return type
list(float)
- property median¶
Estimates the median of the data.
- Returns
the median
- Return type
float
- property variance¶
- property stddev¶
- property skewness¶
- property kurtosis¶
- property median_abs_deviation¶
Gets the median absolute deviation estimated from the histogram of the data:
1. Subtract the bin edges from the median value.
2. Fold the histogram into positive and negative parts around zero.
3. Impose the two sets of bin edges from the two histograms.
4. Calculate the counts for the two histograms with the imposed bin edges.
5. Superimpose the counts from the two histograms.
6. Interpolate the median absolute deviation from the superimposed counts.
- Returns
median absolute deviation
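For reference, the quantity being estimated is the median of the absolute deviations from the median. The sketch below computes it exactly from raw values with the standard library; it does not reproduce the histogram-based estimate described above, and the function name is reused here only for illustration.

```python
from statistics import median


def median_abs_deviation(values):
    """Exact median absolute deviation of a sequence of numbers."""
    center = median(values)
    # Median of the distances between each value and the median.
    return median(abs(v - center) for v in values)


data = [1, 2, 3, 4, 100]
print(median_abs_deviation(data))  # median is 3, deviations are 2, 1, 0, 1, 97
```

Because the property works from histogram bins rather than raw values, its result may differ slightly from this exact computation.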
- abstract update(df_series)¶
Abstract method for updating the numerical profile properties with an uncleaned dataset.
- Parameters
df_series (pandas.core.series.Series) – df series with nulls removed
- Returns
None
- static is_float(x)¶
Returns True for “0.80”, “1.00”, and “1”.
- Parameters
x (str) – string to test
- Returns
whether the string is a float
- Return type
bool
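One way to obtain the documented behavior is to attempt a conversion with the built-in `float`. This is a sketch that matches the examples above and is not taken from the library source.

```python
def is_float(x):
    """Return True if `x` can be parsed as a float (illustrative sketch)."""
    try:
        float(x)
    except (ValueError, TypeError):
        return False
    return True


for text in ("0.80", "1.00", "1", "abc"):
    print(text, is_float(text))
```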
- static is_int(x)¶
Returns False for “0.80” and True for “1.00” and “1”.
- Parameters
x (str) – string to test
- Returns
whether the string is an integer
- Return type
bool
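The documented results imply that a string counts as an integer when its numeric value has no fractional part, so “1.00” qualifies. A minimal sketch under that assumption, not necessarily the library's code:

```python
def is_int(x):
    """Return True if `x` parses to a number with no fractional part (sketch)."""
    try:
        as_float = float(x)
        as_int = int(as_float)  # raises for inf (OverflowError) and nan (ValueError)
    except (ValueError, TypeError, OverflowError):
        return False
    return as_float == as_int


for text in ("0.80", "1.00", "1", "abc"):
    print(text, is_int(text))
```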
- static np_type_to_type(val)¶
Converts NumPy values to base Python types.
- Parameters
val (numpy type or base type) – value to check & change
- Returns
the value as a base Python type
- Return type
int or float