dataprofiler.profilers.numerical_column_stats module

Build a model of the dataset by identifying each column's type along with its respective parameters.

class dataprofiler.profilers.numerical_column_stats.abstractstaticmethod(function: Callable)

Bases: staticmethod

For making function an abstract method.

Initialize abstract static method.
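As an illustration of what an abstract static method decorator does, here is a minimal sketch modeled on the standard library's `abc.abstractstaticmethod`. It is not copied from the dataprofiler source, and the names `abstractstaticmethod_sketch`, `Base`, and `Impl` are hypothetical.

```python
from abc import ABC
from typing import Callable


class abstractstaticmethod_sketch(staticmethod):
    """Static method that ABCMeta treats as abstract (illustrative only)."""

    # ABCMeta looks for this flag when collecting abstract members.
    __isabstractmethod__ = True

    def __init__(self, function: Callable) -> None:
        # Flag the wrapped function as well, then defer to staticmethod.
        function.__isabstractmethod__ = True
        super().__init__(function)


class Base(ABC):
    @abstractstaticmethod_sketch
    def parse(x: str) -> bool:
        raise NotImplementedError


class Impl(Base):
    @staticmethod
    def parse(x: str) -> bool:
        # Concrete override, so Impl can be instantiated.
        return x.isdigit()
```

With this sketch, instantiating `Base` raises `TypeError` because `parse` is still abstract, while `Impl` can be instantiated and called normally.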

class dataprofiler.profilers.numerical_column_stats.NumericStatsMixin(options: NumericalOptions | None = None)

Bases: BaseColumnProfiler[NumericStatsMixinT]

Abstract numerical column profile subclass of BaseColumnProfiler.

Represents a column in the dataset which is a numerical column. Has subclasses itself.

Initialize column base properties and itself.

Parameters:

options (NumericalOptions) – Options for the numerical stats.

type: str | None = None
profile() dict

Return profile of the column.

Returns:

report(remove_disabled_flag: bool = False) dict

Call the profile and remove the disabled columns from profile’s report.

“Disabled column” is defined as a column that is not present in self.__calculations but is present in the self.profile.

Variables:

remove_disabled_flag – true/false value to tell the code to remove values missing in __calculations

Returns:

Profile object pop’d based on values missing from __calculations

Return type:

Profile

diff(other_profile: NumericStatsMixinT, options: dict | None = None) dict

Find the differences for several numerical stats.

Parameters:

other_profile (NumericStatsMixin Profile) – profile to find the difference with

Returns:

the numerical stats differences

Return type:

dict

property mean: float | np.float64

Return mean value.

property mode: list[float]

Find an estimate for the mode[s] of the data.

Returns:

the mode(s) of the data

Return type:

list(float)

property median: float

Estimate the median of the data.

Returns:

the median

Return type:

float

property variance: float | np.float64

Return variance.

property stddev: float | np.float64

Return stddev value.

property skewness: float | np.float64

Return skewness value.

property kurtosis: float | np.float64

Return kurtosis value.

property median_abs_deviation: float | np.float64

Get median absolute deviation estimated from the histogram of the data.

  • Subtract bin edges from the median value.

  • Fold the histogram to positive and negative parts around zero.

  • Impose the two bin edges from the two histograms.

  • Calculate the counts for the two histograms with the imposed bin edges.

  • Superimpose the counts from the two histograms.

  • Interpolate the median absolute deviation from the superimposed counts.

Returns:

median absolute deviation
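For reference, the quantity being estimated can be computed exactly from raw values. The sketch below is not the histogram-based estimate described above. It is the plain definition, median(|x - median(x)|), using only the standard library, and the function name `exact_median_abs_deviation` is hypothetical.

```python
from statistics import median
from typing import Sequence


def exact_median_abs_deviation(values: Sequence[float]) -> float:
    """Exact MAD from raw values (not the profiler's histogram estimate)."""
    center = median(values)
    # Absolute distance of every value from the median.
    deviations = [abs(v - center) for v in values]
    return median(deviations)


# The outlier 100 barely affects the result, which is why MAD is
# often preferred over the standard deviation for skewed data.
mad = exact_median_abs_deviation([1, 2, 3, 4, 100])
```

A histogram-based estimate like the one the property describes should approximate this value without needing the raw data to be kept in memory.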

col_type = None
classmethod load_from_dict(data: dict[str, Any], config: dict | None = None) BaseColumnProfilerT

Parse attribute from json dictionary into self.

Parameters:
  • data (dict[string, Any]) – dictionary with attributes and values.

  • config (Dict | None) – config for loading column profiler params from dictionary

Returns:

Profiler with attributes populated.

Return type:

BaseColumnProfiler

name: str | None
sample_size: int
metadata: dict
times: dict
thread_safe: bool
abstract update(df_series: Series) NumericStatsMixin

Update the numerical profile properties with an uncleaned dataset.

Parameters:

df_series (pandas.core.series.Series) – df series with nulls removed

Returns:

None

static is_float(x: str) bool

Return True if x is float.

  • For “0.80” this function returns True.

  • For “1.00” this function returns True.

  • For “1” this function returns True.

Parameters:

x (str) – string to test

Returns:

if is float or not

Return type:

bool
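One way to get the behavior documented for is_float is to attempt a `float()` conversion. This is a minimal sketch under that assumption, not necessarily how the library implements it, and `is_float_sketch` is a hypothetical name.

```python
def is_float_sketch(x: str) -> bool:
    """Return True if x can be parsed as a float (illustrative only)."""
    try:
        float(x)
    except (TypeError, ValueError):
        # Non-numeric strings and None both end up here.
        return False
    return True
```

Note that under this approach strings such as "nan" and "inf" also count as floats, since Python's `float()` accepts them.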

static is_int(x: str) bool

Return True if x is integer.

  • For “0.80” this function returns False.

  • For “1.00” this function returns True.

  • For “1” this function returns True.

Parameters:

x (str) – string to test

Returns:

if is integer or not

Return type:

bool
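The three cases listed for is_int are consistent with parsing the string as a float and checking that it has no fractional part. The sketch below assumes that approach. It is not taken from the library source, and `is_int_sketch` is a hypothetical name.

```python
def is_int_sketch(x: str) -> bool:
    """Return True if x parses to a whole number (illustrative only)."""
    try:
        # float.is_integer() is False for nan and inf, so those are rejected.
        return float(x).is_integer()
    except (TypeError, ValueError):
        return False
```

Parsing through `float()` is what lets "1.00" count as an integer, whereas a direct `int("1.00")` call would raise `ValueError`.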

static np_type_to_type(val: Any) Any

Convert numpy variables to base python type variables.

Parameters:

val (numpy type or base type) – value to check & change

Returns:

base python type

Return type:

int or float
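A conversion like this is useful before serializing a profile, because NumPy integer scalars are not JSON serializable. The following is a minimal sketch assuming only integer and floating NumPy scalars need converting. The name `to_builtin` is hypothetical and the library's own handling may differ.

```python
from typing import Any

import numpy as np


def to_builtin(val: Any) -> Any:
    """Convert NumPy scalars to Python int/float (illustrative only)."""
    if isinstance(val, np.integer):
        return int(val)
    if isinstance(val, np.floating):
        return float(val)
    # Anything else is assumed to be a base type already.
    return val
```

Values that are already base Python types pass through unchanged, so the function can be applied to every entry of a report without checking types first.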