Numerical Column Stats

Build a model for the dataset by identifying the column type along with its respective parameters.

class dataprofiler.profilers.numerical_column_stats.abstractstaticmethod(function)

Bases: staticmethod

Decorator for marking a function as an abstract static method.

Initialize abstract static method.
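The decorator can be sketched as a `staticmethod` subclass that sets the `__isabstractmethod__` flag that `abc.ABCMeta` inspects. This mirrors the deprecated standard-library `abc.abstractstaticmethod` and is not necessarily DataProfiler's exact source. The `Base` and `Impl` classes are hypothetical. On current Python, stacking `@staticmethod` over `@abc.abstractmethod` gives the same effect.

```python
from abc import ABC


class abstractstaticmethod(staticmethod):
    """A staticmethod that ABCMeta treats as abstract."""

    # ABCMeta reads this flag when it computes __abstractmethods__.
    __isabstractmethod__ = True

    def __init__(self, function):
        # Flag the wrapped function too, so attribute lookups through
        # the class (which unwrap the staticmethod) still see it.
        function.__isabstractmethod__ = True
        super().__init__(function)


class Base(ABC):  # hypothetical abstract base class
    @abstractstaticmethod
    def parse(x):
        """Subclasses must provide a static parser."""


class Impl(Base):  # hypothetical concrete subclass
    @staticmethod
    def parse(x):
        return int(x)


# Base() would raise TypeError because `parse` is still abstract.
print(Impl.parse("3"))  # prints 3
```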

class dataprofiler.profilers.numerical_column_stats.NumericStatsMixin(options=None)

Bases: object

Abstract numerical column profile subclass of BaseColumnProfiler.

Represents a column in the dataset for which numerical statistics are computed. It is meant to be subclassed by concrete column profilers.

Initialize column base properties and itself.

Parameters

options (NumericalOptions) – Options for the numerical stats.

type = None
profile()

Return profile of the column.

report(remove_disabled_flag=False)

Call the profile and remove the disabled columns from the profile’s report.

A “disabled column” is a column that is present in self.profile but not in self.__calculations.

Variables

remove_disabled_flag (bool) – if True, remove values that are missing from __calculations

Returns

Profile object with the values missing from __calculations popped

Return type

Profile
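The filtering rule described for report() amounts to dropping every profile key that has no entry in the calculations mapping. This is a minimal sketch under that assumption, operating on plain dictionaries; `remove_disabled` and the sample data are hypothetical, and the library may special-case some keys.

```python
def remove_disabled(profile, calculations):
    """Return a copy of `profile` without keys absent from `calculations`."""
    report = dict(profile)  # copy so the caller's profile is not mutated
    for key in list(report):
        if key not in calculations:
            # "Disabled": present in the profile but not calculated.
            report.pop(key)
    return report


# Hypothetical profile and calculations; "variance" is disabled here.
profile = {"min": 1, "max": 9, "mean": 5.0, "variance": 2.5}
calculations = {"min": None, "max": None, "mean": None}
print(remove_disabled(profile, calculations))
# {'min': 1, 'max': 9, 'mean': 5.0}
```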

diff(other_profile, options=None)

Find the differences for several numerical stats.

Parameters

other_profile (NumericStatsMixin Profile) – profile to find the difference with

Returns

the numerical stats differences

Return type

dict
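As a rough illustration of a stats diff, the sketch below subtracts each numeric stat the two profiles share (this profile minus the other). It is a simplified assumption, not DataProfiler's diff logic, which may report additional comparisons; `diff_stats` and the inputs are hypothetical.

```python
def diff_stats(stats, other_stats):
    """Subtract every numeric stat present in both dicts (self minus other)."""
    diffs = {}
    for key, value in stats.items():
        if key not in other_stats:
            continue  # only compare stats both profiles have
        other = other_stats[key]
        if isinstance(value, (int, float)) and isinstance(other, (int, float)):
            diffs[key] = value - other
    return diffs


# Hypothetical stats from two profiles of the same column.
left = {"min": 1, "max": 10, "mean": 5.5, "name": "a"}
right = {"min": 0, "max": 12, "mean": 5.0, "name": "b"}
print(diff_stats(left, right))  # {'min': 1, 'max': -2, 'mean': 0.5}
```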

property mean

Return mean value.

property mode

Estimate the mode(s) of the data.

Returns

the mode(s) of the data

Return type

list(float)
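One simple histogram-based mode estimate takes the midpoint of the most populated bin or bins, which fits the list return type above. This is a minimal sketch, assuming the data is summarized by `bin_edges` (length k + 1) and `bin_counts` (length k); `histogram_mode` is hypothetical and the library's own estimator may differ, for example in rounding or tie handling.

```python
def histogram_mode(bin_edges, bin_counts):
    """Return the midpoints of the bin(s) with the highest count."""
    top = max(bin_counts)
    return [
        (bin_edges[i] + bin_edges[i + 1]) / 2  # midpoint of bin i
        for i, count in enumerate(bin_counts)
        if count == top
    ]


# Two bins tie for the highest count, so two modes are returned.
print(histogram_mode([0, 1, 2, 3, 4], [2, 5, 1, 5]))  # [1.5, 3.5]
```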

property median

Estimate the median of the data.

Returns

the median

Return type

float
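A median can be estimated from a histogram by finding the bin where the cumulative count crosses half the total and interpolating linearly inside it. This sketch assumes `bin_edges`/`bin_counts` inputs and uniformly spread values within each bin; `histogram_median` is hypothetical and not necessarily how the library computes its estimate.

```python
def histogram_median(bin_edges, bin_counts):
    """Estimate the median by linear interpolation inside the median bin."""
    half = sum(bin_counts) / 2
    cumulative = 0
    for i, count in enumerate(bin_counts):
        if count and cumulative + count >= half:
            # Fraction of this bin needed to reach the halfway count.
            fraction = (half - cumulative) / count
            left, right = bin_edges[i], bin_edges[i + 1]
            return left + fraction * (right - left)
        cumulative += count
    raise ValueError("histogram is empty")


# 8 values; the 4th falls halfway through the [10, 20) bin.
print(histogram_median([0, 10, 20, 30], [2, 4, 2]))  # 15.0
```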

property variance

Return variance.

property stddev

Return stddev value.
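The properties above are read after one or more calls to update(). One common way to keep the mean and variance current across batches, sketched here as an assumption rather than the library's implementation, is Chan et al.'s pairwise merge of count, mean, and sum of squared deviations (M2). `RunningStats` is hypothetical, and the sample (n − 1) variance convention is also an assumption.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Count, mean and sum of squared deviations (M2) over merged batches."""

    count: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def update(self, batch):
        """Merge a batch of numbers using the pairwise (Chan et al.) formula."""
        n_b = len(batch)
        if n_b == 0:
            return
        mean_b = sum(batch) / n_b
        m2_b = sum((x - mean_b) ** 2 for x in batch)
        total = self.count + n_b
        delta = mean_b - self.mean
        self.mean += delta * n_b / total
        self.m2 += m2_b + delta ** 2 * self.count * n_b / total
        self.count = total

    @property
    def variance(self):
        """Sample variance (n - 1 denominator); NaN for fewer than two values."""
        return self.m2 / (self.count - 1) if self.count > 1 else math.nan

    @property
    def stddev(self):
        return math.sqrt(self.variance)


stats = RunningStats()
stats.update([2, 4, 4, 4])
stats.update([5, 5, 7, 9])
print(stats.mean)      # 5.0
print(stats.variance)  # ≈ 4.571 (32 / 7)
```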

property skewness

Return skewness value.

property kurtosis

Return kurtosis value.
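For reference, skewness and kurtosis are standardized third and fourth central moments. The sketch below uses the biased population-moment definitions and reports excess kurtosis (kurtosis minus 3). Whether DataProfiler applies sample-size bias corrections or reports excess kurtosis is an assumption here, and `moment_stats` is hypothetical.

```python
def moment_stats(values):
    """Return (skewness, excess kurtosis) from population central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    if m2 == 0:
        # Constant data: both statistics are undefined.
        return float("nan"), float("nan")
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3
    return skewness, excess_kurtosis


skew, kurt = moment_stats([1, 2, 3, 4, 5])
print(skew)  # 0.0 (symmetric data)
```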

property median_abs_deviation

Get median absolute deviation estimated from the histogram of the data.

The estimate is computed in the following steps:

1. Subtract the bin edges from the median value.
2. Fold the histogram into positive and negative parts around zero.
3. Impose the two sets of bin edges from the two histograms.
4. Calculate the counts for the two histograms with the imposed bin edges.
5. Superimpose the counts from the two histograms.
6. Interpolate the median absolute deviation from the superimposed counts.

Returns

median absolute deviation
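The fold-and-interpolate procedure above is the documented method. For intuition only, the sketch below uses a coarser, different technique: it places each bin's count at the bin midpoint and takes the weighted median of the distances from the median. `histogram_mad` and its inputs are hypothetical.

```python
def histogram_mad(bin_edges, bin_counts, median):
    """Approximate MAD by placing each bin's count at its midpoint."""
    # (distance of bin midpoint from the median, count) pairs, sorted.
    deviations = sorted(
        (abs((bin_edges[i] + bin_edges[i + 1]) / 2 - median), count)
        for i, count in enumerate(bin_counts)
        if count > 0
    )
    half = sum(count for _, count in deviations) / 2
    cumulative = 0
    for deviation, count in deviations:
        cumulative += count
        if cumulative >= half:
            return deviation  # weighted (lower) median of the deviations
    raise ValueError("histogram is empty")


print(histogram_mad([0, 2, 4, 6, 8], [1, 3, 3, 1], median=4.0))  # 1.0
```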

abstract update(df_series)

Update the numerical profile properties with an uncleaned dataset.

Parameters

df_series (pandas.core.series.Series) – df series with nulls removed

Returns

None

static is_float(x)

Return True if x is float.

For example, this function returns True for “0.80”, “1.00”, and “1”.

Parameters

x (str) – string to test

Returns

whether x is a float

Return type

bool
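The behavior described for is_float matches a simple "does float() accept it?" check. This is a minimal sketch under that assumption, not necessarily the library's source. Note that such a check also accepts strings like “nan” and “inf”.

```python
def is_float(x):
    """Return True if float(x) succeeds, so integer strings count too."""
    try:
        float(x)
    except (TypeError, ValueError):
        return False
    return True


print([is_float(s) for s in ("0.80", "1.00", "1", "abc")])
# [True, True, True, False]
```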

static is_int(x)

Return True if x is integer.

For example, this function returns False for “0.80” but True for “1.00” and “1”.

Parameters

x (str) – string to test

Returns

whether x is an integer

Return type

bool
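The examples for is_int are consistent with parsing the string as a float and checking that it has no fractional part. A minimal sketch under that assumption follows; it also rejects “nan” and “inf”, which cannot be converted to int.

```python
def is_int(x):
    """Return True if x parses as a number with no fractional part."""
    try:
        value = float(x)
        return value == int(value)
    except (TypeError, ValueError, OverflowError):
        # int(float("nan")) raises ValueError; int(float("inf")) OverflowError.
        return False


print([is_int(s) for s in ("0.80", "1.00", "1")])  # [False, True, True]
```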

static np_type_to_type(val)

Convert numpy variables to base python type variables.

Parameters

val (numpy type or base type) – value to check and convert

Returns

base python type

Return type

int or float
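A NumPy-free way to sketch this conversion relies on the `numbers` ABCs: NumPy registers `np.integer` with `numbers.Integral` and `np.floating` with `numbers.Real`, so scalars such as `np.int64(7)` become `int` and `np.float32(0.5)` becomes `float`. This is broader than a check for specific NumPy types and is an assumption, not the library's implementation. Booleans and non-numeric values pass through unchanged.

```python
import numbers
from fractions import Fraction


def np_type_to_type(val):
    """Convert integer-like and real-like scalars to built-in int/float."""
    if isinstance(val, bool):
        return val  # bool subclasses int; leave it untouched
    if isinstance(val, numbers.Integral):
        return int(val)
    if isinstance(val, numbers.Real):
        return float(val)
    return val  # anything else is returned as-is


# Fraction stands in for a non-built-in Real type so no NumPy is needed.
print(np_type_to_type(Fraction(1, 4)))  # 0.25
```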