dataprofiler.profilers.float_column_profile module

Float profile analysis for an individual column within structured profiling.

class dataprofiler.profilers.float_column_profile.FloatColumn(name: str | None, options: FloatOptions = None)

Bases: NumericStatsMixin[FloatColumn], BaseColumnPrimitiveTypeProfiler[FloatColumn]

Float column profile mixin with numerical stats.

Represents a column in the dataset which is a float column.

Initialize column base properties and itself.

Parameters:
  • name (str | None) – Name of the data

  • options (FloatOptions) – Options for the float column

type: str | None = 'float'
diff(other_profile: FloatColumn, options: dict | None = None) dict

Find the differences for FloatColumns.

Parameters:

other_profile (FloatColumn) – profile to find the difference with

Returns:

the FloatColumn differences

Return type:

dict

report(remove_disabled_flag: bool = False) dict

Report the profile attribute of the class, potentially popping values from self.profile.

classmethod load_from_dict(data, config: dict | None = None)

Parse attributes from a JSON dictionary into self.

Parameters:
  • data (dict[str, Any]) – dictionary with attributes and values.

  • config (dict | None) – config for loading column profiler params from dictionary

Returns:

Profiler with attributes populated.

Return type:

FloatColumn

property profile: dict

Return the profile of the column.

Returns:

the profile of the column

property precision: dict[str, float | None]

Report statistics on the significant figures of each element in the data.

Returns:

Precision statistics

Return type:

dict
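As a rough illustration of what "statistics on the significant figures of each element" can mean, the sketch below counts significant digits per string sample and summarizes them. The helper names (significant_figures, precision_summary) and the returned keys are hypothetical and are not taken from the library; the counting convention (trailing zeros count as significant) is also an assumption.

```python
from decimal import Decimal
from statistics import mean


def significant_figures(text: str) -> int:
    """Count significant digits in a decimal string (hypothetical helper).

    Assumption: leading zeros are ignored and trailing zeros are counted,
    which is how Decimal stores the coefficient of a parsed literal.
    """
    return len(Decimal(text).as_tuple().digits)


def precision_summary(samples):
    """Summarize per-sample significant-figure counts (illustrative keys only)."""
    counts = [significant_figures(s) for s in samples]
    return {"min": min(counts), "max": max(counts), "mean": mean(counts)}


summary = precision_summary(["0.80", "12.5", "0.001"])
print(summary)
```

The actual property may report additional or different statistics; this only shows the general idea of summarizing per-element precision.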

property data_type_ratio: float | None

Calculate the ratio of samples which match this data type.

Returns:

ratio of data type

Return type:

float
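A minimal sketch of how such a ratio could be derived from the match_count and sample_size attributes listed for this class. The standalone function below is hypothetical, and returning None for an empty sample is an assumption suggested by the float | None annotation rather than confirmed behavior.

```python
from typing import Optional


def data_type_ratio(match_count: int, sample_size: int) -> Optional[float]:
    """Share of samples that matched the float type (hypothetical helper)."""
    # Assumption: with no samples the ratio is undefined, so return None.
    if not sample_size:
        return None
    return match_count / sample_size


print(data_type_ratio(3, 4))
print(data_type_ratio(0, 0))
```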

col_type = None
static is_float(x: str) bool

Return True if x is float.

  • For “0.80” this function returns True

  • For “1.00” this function returns True

  • For “1” this function returns True

Parameters:

x (str) – string to test

Returns:

True if x is a float, False otherwise

Return type:

bool

static is_int(x: str) bool

Return True if x is integer.

  • For “0.80” this function returns False

  • For “1.00” this function returns True

  • For “1” this function returns True

Parameters:

x (str) – string to test

Returns:

True if x is an integer, False otherwise

Return type:

bool
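The documented results for is_float and is_int on “0.80”, “1.00”, and “1” can be reproduced with a small standalone sketch. The functions looks_like_float and looks_like_int below are hypothetical stand-ins, not the library's implementation, and edge cases such as “nan” or “inf” may be handled differently by the real methods.

```python
def looks_like_float(x: str) -> bool:
    """Return True if the string parses as a float (hypothetical helper)."""
    try:
        float(x)
    except (TypeError, ValueError):
        return False
    return True


def looks_like_int(x: str) -> bool:
    """Return True if the string parses to a whole number (hypothetical helper)."""
    try:
        # "1.00" parses to 1.0, which is a whole number; "0.80" does not.
        return float(x).is_integer()
    except (TypeError, ValueError):
        return False


for sample in ("0.80", "1.00", "1"):
    print(sample, looks_like_float(sample), looks_like_int(sample))
```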

property kurtosis: float | np.float64

Return kurtosis value.

property mean: float | np.float64

Return mean value.

property median: float

Estimate the median of the data.

Returns:

the median

Return type:

float

property median_abs_deviation: float | np.float64

Get median absolute deviation estimated from the histogram of the data.

1. Subtract bin edges from the median value

2. Fold the histogram to positive and negative parts around zero

3. Impose the two bin edges from the two histograms

4. Calculate the counts for the two histograms with the imposed bin edges

5. Superimpose the counts from the two histograms

6. Interpolate the median absolute deviation from the superimposed counts

Returns:

median absolute deviation
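For reference, the quantity being estimated is the median of the absolute deviations from the median. The sketch below computes it exactly on raw values rather than from a histogram, so it is not the procedure described above; it only shows the value that the histogram-based estimate approximates. The function name is hypothetical.

```python
from statistics import median


def exact_median_abs_deviation(values):
    """Exact median absolute deviation on raw data (hypothetical helper)."""
    center = median(values)
    # Distance of every value from the median, then the median of those.
    return median(abs(v - center) for v in values)


print(exact_median_abs_deviation([1, 2, 3, 4, 100]))
```

On small inputs, comparing this exact value with the property's result gives a feel for the error introduced by binning.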

property mode: list[float]

Find an estimate for the mode(s) of the data.

Returns:

the mode(s) of the data

Return type:

list(float)
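One common way to estimate modes of continuous data is to bin the values and report the centers of the fullest bins. The sketch below illustrates that idea under stated assumptions: equal-width bins, a hypothetical num_bins parameter, and ties returned together. It is not claimed to match how this property is computed.

```python
def estimate_modes(values, num_bins=10):
    """Estimate mode(s) as the centers of the most populated bins (hypothetical helper)."""
    lo, hi = min(values), max(values)
    if lo == hi:
        # All values identical: the single value is the mode.
        return [float(lo)]
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        # Clamp so the maximum value falls into the last bin.
        idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    top = max(counts)
    return [lo + (i + 0.5) * width for i, c in enumerate(counts) if c == top]


print(estimate_modes([1.0, 1.1, 1.2, 5.0, 9.0], num_bins=4))
```

Because the result is a bin center, it need not equal any value in the data, which is consistent with the property being described as an estimate.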

static np_type_to_type(val: Any) Any

Convert numpy variables to base python type variables.

Parameters:

val (numpy type or base type) – value to check & change

Returns:

base python type

Return type:

int or float

property skewness: float | np.float64

Return skewness value.

property stddev: float | np.float64

Return stddev value.

update(df_series: Series) FloatColumn

Update the column profile.

Parameters:

df_series (pandas.core.series.Series) – pandas Series of column data used to update the profile

Returns:

updated FloatColumn

Return type:

FloatColumn

property variance: float | np.float64

Return variance.

match_count: int
sample_size: int
name: str | None
metadata: dict
times: dict
thread_safe: bool