dataprofiler.profilers.numerical_column_stats module
Build a model for a dataset by identifying each column's type along with its respective parameters.
- class dataprofiler.profilers.numerical_column_stats.abstractstaticmethod(function: Callable)
Bases: staticmethod
For making a function an abstract static method.
Initialize abstract static method.
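The decorator can be sketched as below. The sketch mirrors the deprecated `abc.abstractstaticmethod` from Python's standard library; whether DataProfiler's class is written the same way is an assumption, and `Checker` and `DigitChecker` are hypothetical classes used only for illustration.

```python
from abc import ABC
from typing import Callable


class abstractstaticmethod(staticmethod):
    """Static method that ABCMeta treats as abstract (illustrative sketch)."""

    # ABCMeta collects class attributes whose __isabstractmethod__ is truthy.
    __isabstractmethod__ = True

    def __init__(self, function: Callable) -> None:
        # Also mark the wrapped function, as abc.abstractstaticmethod does.
        function.__isabstractmethod__ = True
        super().__init__(function)


class Checker(ABC):
    # Hypothetical base class with one abstract static method.
    @abstractstaticmethod
    def is_valid(x: str) -> bool:
        ...


class DigitChecker(Checker):
    # Hypothetical concrete subclass; overriding removes the abstract marker.
    @staticmethod
    def is_valid(x: str) -> bool:
        return x.isdigit()
```

`Checker()` raises `TypeError` because `is_valid` is still abstract, while `DigitChecker()` can be instantiated. In current Python, stacking `@staticmethod` on top of `@abc.abstractmethod` achieves the same effect, which is why the standard-library class is deprecated.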
- class dataprofiler.profilers.numerical_column_stats.NumericStatsMixin(options: NumericalOptions | None = None)
Bases: BaseColumnProfiler[NumericStatsMixinT]
Abstract numerical column profile subclass of BaseColumnProfiler.
Represents a column in the dataset for which numerical statistics are computed. Has subclasses of its own.
Initialize the column's base properties and the mixin itself.
- Parameters:
options (NumericalOptions) – Options for the numerical stats.
- type: str | None = None
- profile() → dict
Return profile of the column.
- Returns:
the column's profile as a dictionary
- report(remove_disabled_flag: bool = False) → dict
Call profile() and, if remove_disabled_flag is True, remove the disabled columns from the returned report.
A “disabled column” is a key that is present in the output of self.profile() but absent from self.__calculations.
- Parameters:
remove_disabled_flag – if True, remove entries that are missing from __calculations
- Returns:
the profile, with disabled entries removed when remove_disabled_flag is True
- Return type:
dict
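A minimal sketch of the filtering rule, reading the definition above literally. `remove_disabled` and its arguments are hypothetical stand-ins for `report`, `self.profile()` and `self.__calculations`; the real method may map calculation names to report keys rather than comparing them one to one.

```python
from typing import Any


def remove_disabled(profile: dict[str, Any], calculations: dict[str, Any],
                    remove_disabled_flag: bool = False) -> dict[str, Any]:
    """Hypothetical helper: drop profile keys absent from `calculations`."""
    if not remove_disabled_flag:
        # Flag off: return the full profile (as a shallow copy).
        return dict(profile)
    # A key is "disabled" when it is in the profile but not in calculations.
    return {key: value for key, value in profile.items() if key in calculations}
```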
- diff(other_profile: NumericStatsMixinT, options: dict | None = None) → dict
Find the differences for several numerical stats.
- Parameters:
other_profile (NumericStatsMixin Profile) – profile to find the difference with
- Returns:
the numerical stats differences
- Return type:
dict
- property mean: float | np.float64
Return mean value.
- property mode: list[float]
Find an estimate for the mode[s] of the data.
- Returns:
the mode(s) of the data
- Return type:
list(float)
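Because the profiler works from a histogram, the mode can only be estimated. One plausible approach, assumed here rather than taken from the library, returns the midpoint of every bin that ties for the largest count, which is consistent with the property returning a list. `estimate_mode_from_histogram` is a hypothetical helper.

```python
import numpy as np


def estimate_mode_from_histogram(counts, bin_edges) -> list[float]:
    """Return midpoints of every bin tied for the highest count (sketch)."""
    counts = np.asarray(counts)
    edges = np.asarray(bin_edges, dtype=float)
    # Midpoint of bin i lies halfway between edges i and i + 1.
    midpoints = (edges[:-1] + edges[1:]) / 2
    # Several bins may share the maximum count, giving multiple modes.
    peak_bins = np.flatnonzero(counts == counts.max())
    return [float(m) for m in midpoints[peak_bins]]
```

For example, counts `[1, 4, 2, 4]` over edges `[0, 1, 2, 3, 4]` yield two modes, 1.5 and 3.5.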
- property median: float
Estimate the median of the data.
- Returns:
the median
- Return type:
float
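The docstring says only that the median is estimated. A common way to estimate it from histogram counts, assumed here rather than taken from the library, is to find the bin where the cumulative count first reaches half the total and interpolate linearly inside it. `estimate_median_from_histogram` is a hypothetical helper.

```python
import numpy as np


def estimate_median_from_histogram(counts, bin_edges) -> float:
    """Interpolate the median, assuming values are uniform within each bin."""
    counts = np.asarray(counts, dtype=float)
    edges = np.asarray(bin_edges, dtype=float)
    cumulative = np.cumsum(counts)
    half = cumulative[-1] / 2
    # First bin whose cumulative count reaches half of the total.
    i = int(np.searchsorted(cumulative, half, side="left"))
    below = cumulative[i] - counts[i]  # count to the left of bin i
    fraction = (half - below) / counts[i]
    # Walk that fraction of the way across bin i.
    return float(edges[i] + fraction * (edges[i + 1] - edges[i]))
```

The inputs can come straight from `counts, edges = np.histogram(data, bins=10)`.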
- property variance: float | np.float64
Return variance.
- property stddev: float | np.float64
Return stddev value.
- property skewness: float | np.float64
Return skewness value.
- property kurtosis: float | np.float64
Return kurtosis value.
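The docstrings do not say which estimators these properties use. Purely as an assumption, the sketch below uses one common bias-corrected convention (the one pandas uses for `Series.var`, `Series.skew` and `Series.kurt`: sample variance, adjusted Fisher-Pearson skewness, and excess kurtosis). It also makes explicit that the standard deviation is the square root of the variance. `summary_moments` is a hypothetical helper.

```python
import math
from typing import Sequence


def summary_moments(values: Sequence[float]) -> dict[str, float]:
    """Sample variance, stddev, skewness (G1) and excess kurtosis (G2)."""
    n = len(values)
    if n < 4:
        raise ValueError("need at least 4 values for bias-corrected kurtosis")
    mean = sum(values) / n
    # Biased central moments m_k = (1/n) * sum((x - mean) ** k).
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    variance = m2 * n / (n - 1)  # Bessel's correction
    if m2 == 0.0:
        # Constant data: the shape statistics are undefined.
        skewness = kurtosis = float("nan")
    else:
        g1 = m3 / m2 ** 1.5
        g2 = m4 / m2 ** 2 - 3.0
        skewness = math.sqrt(n * (n - 1)) / (n - 2) * g1
        kurtosis = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6.0)
    return {"variance": variance, "stddev": math.sqrt(variance),
            "skewness": skewness, "kurtosis": kurtosis}
```

For `[1, 2, 3, 4, 5]` this convention gives a variance of 2.5, zero skewness, and an excess kurtosis of -1.2.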
- property median_abs_deviation: float | np.float64
Get the median absolute deviation, estimated from the histogram of the data:
Subtract the bin edges from the median value.
Fold the histogram into positive and negative parts around zero.
Combine the bin edges of the two folded histograms.
Calculate the counts of both histograms on the combined bin edges.
Superimpose (add) the counts from the two histograms.
Interpolate the median absolute deviation from the superimposed counts.
- Returns:
median absolute deviation
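The six steps can be sketched as follows, assuming (as the final interpolation step implies) that values are spread uniformly within each bin. `mad_from_histogram` is a hypothetical helper, not the library's own implementation; it takes histogram counts, bin edges and a median estimate.

```python
import numpy as np


def mad_from_histogram(counts, bin_edges, median: float) -> float:
    """Estimate the median absolute deviation from a histogram (sketch)."""
    counts = np.asarray(counts, dtype=float)
    edges = np.asarray(bin_edges, dtype=float)
    # Piecewise-linear CDF of the data: counts spread uniformly in each bin.
    cdf_values = np.concatenate(([0.0], np.cumsum(counts)))

    def mass_below(x):
        return np.interp(x, edges, cdf_values)

    # Steps 1-3: shift edges by the median, fold them to |deviation|,
    # and combine the edges of the positive and negative parts (plus 0).
    folded_edges = np.unique(np.concatenate(([0.0], np.abs(edges - median))))
    lo, hi = folded_edges[:-1], folded_edges[1:]

    # Step 4: counts each part contributes to every folded bin [lo, hi].
    positive = mass_below(median + hi) - mass_below(median + lo)
    negative = mass_below(median - lo) - mass_below(median - hi)

    # Step 5: superimpose the two folded histograms.
    folded_counts = positive + negative

    # Step 6: interpolate where the folded cumulative count reaches half.
    cumulative = np.cumsum(folded_counts)
    half = cumulative[-1] / 2
    i = int(np.searchsorted(cumulative, half, side="left"))
    below = cumulative[i] - folded_counts[i]
    fraction = (half - below) / folded_counts[i]
    return float(lo[i] + fraction * (hi[i] - lo[i]))
```

For data spread uniformly over [0, 4] (four equal bins, median 2), the sketch returns 1.0, the exact median absolute deviation of that distribution.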
- col_type = None
- classmethod load_from_dict(data: dict[str, Any], config: dict | None = None) → BaseColumnProfilerT
Parse attributes from a JSON dictionary into a profiler.
- Parameters:
data (dict[str, Any]) – dictionary with attributes and values.
config (Dict | None) – config for loading column profiler params from dictionary
- Returns:
Profiler with attributes populated.
- Return type:
BaseColumnProfilerT
- name: str | None
- sample_size: int
- metadata: dict
- times: dict
- thread_safe: bool
- abstract update(df_series: Series) → NumericStatsMixin
Update the numerical profile properties with an uncleaned dataset.
- Parameters:
df_series (pandas.core.series.Series) – df series with nulls removed
- Returns:
the updated profile
- static is_float(x: str) → bool
Return True if x can be parsed as a float.
For example, this function returns True for “0.80”, “1.00”, and “1”.
- Parameters:
x (str) – string to test
- Returns:
whether x is a float
- Return type:
bool
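Assuming the check simply delegates to Python's built-in `float()` parser (an assumption; the docstring only lists examples), a sketch looks like this.

```python
def is_float(x: str) -> bool:
    """Return True if Python's float() can parse x (sketch)."""
    try:
        float(x)
    except ValueError:
        return False
    return True
```

Under this approach, strings such as "1e3", "inf" and "nan" also count as floats, because `float()` accepts them.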
- static is_int(x: str) → bool
Return True if x represents an integer.
For example, this function returns False for “0.80” but True for “1.00” and “1”.
- Parameters:
x (str) – string to test
- Returns:
whether x is an integer
- Return type:
bool
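The examples imply that a string counts as an integer when it parses as a number with no fractional part, which is why “1.00” passes. A sketch under that assumption (not necessarily the library's exact code):

```python
def is_int(x: str) -> bool:
    """Return True if x parses as a number with no fractional part (sketch)."""
    try:
        value = float(x)
        # int() raises OverflowError for inf and ValueError for nan.
        return value == int(value)
    except (ValueError, OverflowError):
        return False
```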
- static np_type_to_type(val: Any) → Any
Convert numpy variables to base python type variables.
- Parameters:
val (numpy type or base type) – value to check and convert
- Returns:
the value as a base Python type
- Return type:
int or float
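A sketch of the conversion, assuming it only needs to handle NumPy integer and floating scalars (the documented return type is int or float) and returns any other value unchanged. Whether the library also converts other NumPy types, such as `np.bool_`, is not stated here.

```python
from typing import Any

import numpy as np


def np_type_to_type(val: Any) -> Any:
    """Convert NumPy integer/float scalars to Python int/float (sketch)."""
    if isinstance(val, np.integer):
        return int(val)
    if isinstance(val, np.floating):
        return float(val)
    # Anything else, including plain Python numbers, passes through unchanged.
    return val
```

Explicit `isinstance` checks keep the accepted types visible; NumPy scalars also offer `.item()`, which performs a similar conversion for any scalar type.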