dataprofiler.profilers.float_column_profile module
Float profile analysis for an individual column within structured profiling.
- class dataprofiler.profilers.float_column_profile.FloatColumn(name: str | None, options: FloatOptions = None)
Bases: NumericStatsMixin[FloatColumn], BaseColumnPrimitiveTypeProfiler[FloatColumn]
Float column profile mixin with numerical stats.
Represents a float column in the dataset.
Initializes the base column properties and the float column itself.
- Parameters:
name (str | None) – Name of the data
options (FloatOptions) – Options for the float column
- type: str | None = 'float'
- diff(other_profile: FloatColumn, options: dict | None = None) → dict
Find the differences between this FloatColumn and another FloatColumn profile.
- Parameters:
other_profile (FloatColumn) – profile to find the difference with
- Returns:
the FloatColumn differences
- Return type:
dict
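The source does not describe the layout of the returned dictionary. As an illustration only, the sketch below diffs two flat dictionaries of numeric statistics by subtracting matching entries. The helper names diff_number and diff_stats, and the "unchanged" marker, are assumptions made for this example, not the library's API.

```python
from __future__ import annotations

from typing import Any


def diff_number(a: float | None, b: float | None) -> Any:
    """Hypothetical helper: difference of one statistic between two profiles."""
    if a is None and b is None:
        return "unchanged"
    if a is None or b is None:
        # Only one side has a value: report both instead of a difference.
        return [a, b]
    delta = a - b
    return "unchanged" if delta == 0 else delta


def diff_stats(left: dict, right: dict) -> dict:
    """Hypothetical helper: diff every statistic present in either dict."""
    keys = sorted(set(left) | set(right))
    return {key: diff_number(left.get(key), right.get(key)) for key in keys}
```

For example, diff_stats({"mean": 2.5, "min": 1.0}, {"mean": 2.0, "min": 1.0}) yields {"mean": 0.5, "min": "unchanged"}.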
- report(remove_disabled_flag: bool = False) → dict
Report the profile attributes of the class; values may be popped from self.profile.
- classmethod load_from_dict(data, config: dict | None = None)
Parse attributes from a JSON dictionary into a new profiler.
- Parameters:
data (dict[str, Any]) – dictionary with attributes and values.
config (dict | None) – config for loading column profiler params from dictionary
- Returns:
Profiler with attributes populated.
- Return type:
FloatColumn
- property profile: dict
Return the profile of the column.
- Returns:
the profile of the column
- Return type:
dict
- property precision: dict[str, float | None]
Report statistics on the significant figures of each element in the data.
- Returns:
Precision statistics
- Return type:
dict
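The docstring does not define how significant figures are counted. The sketch below uses one common convention, stated here as an assumption: leading zeros never count, and trailing zeros count only when a decimal point is present. The helper names significant_figures and precision_summary, and the min/max/mean keys, are hypothetical; the library's report may contain different or additional statistics.

```python
from __future__ import annotations

import statistics


def significant_figures(text: str) -> int:
    """Count significant figures in a decimal string (hypothetical helper).

    Assumed convention: leading zeros never count, and trailing zeros count
    only when the mantissa contains a decimal point. Any exponent is ignored.
    """
    mantissa = text.strip().lstrip("+-").lower().split("e")[0]
    digits = mantissa.replace(".", "").lstrip("0")
    if "." not in mantissa:
        digits = digits.rstrip("0")
    return len(digits)


def precision_summary(values: list[str]) -> dict[str, float | None]:
    """Summarize significant figures across values (keys are illustrative)."""
    if not values:
        return {"min": None, "max": None, "mean": None}
    figures = [significant_figures(v) for v in values]
    return {
        "min": min(figures),
        "max": max(figures),
        "mean": statistics.mean(figures),
    }
```

Under this convention "0.80" has 2 significant figures, "1.00" has 3, and "120" has 2.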
- property data_type_ratio: float | None
Calculate the ratio of samples which match this data type.
- Returns:
ratio of samples that match the data type
- Return type:
float
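A minimal sketch, assuming the ratio is simply the match_count attribute divided by the sample_size attribute (both are attributes of this class) and that None is returned before any samples have been seen. The standalone function is hypothetical.

```python
from __future__ import annotations


def data_type_ratio(match_count: int, sample_size: int) -> float | None:
    """Fraction of sampled values that parsed as the column's type."""
    if sample_size == 0:
        # No data seen yet: the ratio is undefined.
        return None
    return match_count / sample_size
```

For instance, if 3 of 4 sampled strings parse as floats, the ratio is 0.75.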
- col_type = None
- static is_float(x: str) → bool
Return True if x is a float.
For example, this function returns True for “0.80”, “1.00”, and “1”.
- Parameters:
x (str) – string to test
- Returns:
whether x is a float
- Return type:
bool
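A minimal sketch that reproduces the three documented examples by attempting Python's built-in float() conversion; it is not necessarily the library's implementation.

```python
def is_float(x: str) -> bool:
    """Return True if Python's float() can parse x."""
    try:
        float(x)
    except (TypeError, ValueError):
        return False
    return True
```

Note that with this approach strings such as "nan" and "inf" are also accepted, since float() parses them.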
- static is_int(x: str) → bool
Return True if x is an integer.
For example, this function returns False for “0.80” and True for “1.00” and “1”.
- Parameters:
x (str) – string to test
- Returns:
whether x is an integer
- Return type:
bool
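The examples above imply that a string counts as an integer when it parses as a number with no fractional part, which is why “1.00” qualifies. A minimal sketch of that rule, assuming float() parsing followed by a comparison with int(); it is not necessarily the library's implementation.

```python
def is_int(x: str) -> bool:
    """Return True if x parses as a number with no fractional part."""
    try:
        value = float(x)
        # int() raises OverflowError for inf and ValueError for nan.
        return value == int(value)
    except (TypeError, ValueError, OverflowError):
        return False
```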
- property kurtosis: float | np.float64
Return kurtosis value.
- property mean: float | np.float64
Return mean value.
- property median: float
Estimate the median of the data.
- Returns:
the median
- Return type:
float
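The source only says the median is estimated. A common approach, assumed here rather than confirmed for this library, treats values as uniformly spread inside each histogram bin and interpolates linearly within the bin where the cumulative count crosses half the total. The function name histogram_median is hypothetical.

```python
from __future__ import annotations


def histogram_median(bin_edges: list[float], bin_counts: list[float]) -> float:
    """Estimate the median by linear interpolation inside a histogram bin.

    Assumes len(bin_edges) == len(bin_counts) + 1 and that values are spread
    uniformly within each bin.
    """
    total = sum(bin_counts)
    if total <= 0:
        return float("nan")
    half = total / 2
    cumulative = 0.0
    for i, count in enumerate(bin_counts):
        if count > 0 and cumulative + count >= half:
            left, right = bin_edges[i], bin_edges[i + 1]
            fraction = (half - cumulative) / count
            return left + fraction * (right - left)
        cumulative += count
    return bin_edges[-1]
```

For four unit-width bins on [0, 4] with one count each, the estimate is 2.0.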
- property median_abs_deviation: float | np.float64
Get the median absolute deviation estimated from the histogram of the data:
1. Subtract the median value from the bin edges, centering the histogram at zero.
2. Fold the histogram into positive and negative parts around zero.
3. Impose the bin edges of each of the two histograms onto the other.
4. Calculate the counts for the two histograms with the imposed bin edges.
5. Superimpose the counts from the two histograms.
6. Interpolate the median absolute deviation from the superimposed counts.
- Returns:
median absolute deviation
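The histogram-folding procedure described for this property can be sketched in plain Python as follows. This is an illustrative reconstruction, not the library's code: it assumes counts are spread uniformly within each bin (so a bin straddling the median is split in proportion to width), and the names histogram_mad, _rebin, and _interpolated_median are hypothetical.

```python
from __future__ import annotations


def _rebin(edges: list[float], counts: list[float], new_edges: list[float]) -> list[float]:
    """Spread each bin's count over new_edges in proportion to overlap width."""
    out = [0.0] * (len(new_edges) - 1)
    for lo, hi, count in zip(edges, edges[1:], counts):
        width = hi - lo
        if width <= 0 or count == 0:
            continue
        for j, (new_lo, new_hi) in enumerate(zip(new_edges, new_edges[1:])):
            overlap = min(hi, new_hi) - max(lo, new_lo)
            if overlap > 0:
                out[j] += count * overlap / width
    return out


def _interpolated_median(edges: list[float], counts: list[float]) -> float:
    """Median of a histogram, interpolating linearly inside the middle bin."""
    total = sum(counts)
    if total <= 0:
        return float("nan")
    half = total / 2
    cumulative = 0.0
    for lo, hi, count in zip(edges, edges[1:], counts):
        if count > 0 and cumulative + count >= half:
            return lo + (half - cumulative) / count * (hi - lo)
        cumulative += count
    return edges[-1]


def histogram_mad(bin_edges: list[float], bin_counts: list[float], median: float) -> float:
    # 1. Shift the edges so the median sits at zero, and add an edge at zero
    #    so that no bin straddles it (a split bin's count is divided by width).
    centered = [edge - median for edge in bin_edges]
    edges = sorted(set(centered) | {0.0})
    counts = _rebin(centered, bin_counts, edges)
    # 2. Fold: keep the positive half and mirror the negative half onto [0, inf).
    pos_edges = [e for e in edges if e >= 0]
    pos_counts = [c for lo, c in zip(edges, counts) if lo >= 0]
    neg_edges = [-e for e in reversed(edges) if e <= 0]
    neg_counts = [c for hi, c in zip(edges[1:], counts) if hi <= 0][::-1]
    # 3. Impose both sets of edges on one common grid.
    grid = sorted(set(pos_edges) | set(neg_edges))
    if len(grid) < 2:
        return float("nan")
    # 4. Recount each half on the common grid.
    pos_on_grid = _rebin(pos_edges, pos_counts, grid)
    neg_on_grid = _rebin(neg_edges, neg_counts, grid)
    # 5. Superimpose the two halves.
    combined = [p + n for p, n in zip(pos_on_grid, neg_on_grid)]
    # 6. The median of |x - median| is the median of the folded histogram.
    return _interpolated_median(grid, combined)
```

For four unit bins on [0, 4] with one count each and median 2.0, histogram_mad([0, 1, 2, 3, 4], [1, 1, 1, 1], 2.0) returns 1.0, which matches the exact MAD of a uniform distribution on [0, 4].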
- property mode: list[float]
Find an estimate for the mode(s) of the data.
- Returns:
the mode(s) of the data
- Return type:
list(float)
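As with the median, a rough sketch under the assumption that the mode estimate comes from a histogram: return the midpoint of every bin tied for the highest count. The function name is hypothetical, and the library may cap the number of reported modes or refine the estimate differently.

```python
from __future__ import annotations


def histogram_modes(bin_edges: list[float], bin_counts: list[int]) -> list[float]:
    """Return the midpoint of every bin tied for the highest count."""
    if not bin_counts:
        return []
    peak = max(bin_counts)
    return [
        (bin_edges[i] + bin_edges[i + 1]) / 2
        for i, count in enumerate(bin_counts)
        if count == peak
    ]
```

Returning a list rather than a single value lets ties between bins surface as multiple modes, consistent with the list(float) return type.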
- static np_type_to_type(val: Any) → Any
Convert numpy variables to base Python type variables.
- Parameters:
val (numpy type or base type) – value to check and convert
- Returns:
base Python type
- Return type:
int or float
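A minimal sketch of the conversion. It swaps in duck typing on the .item() method, which NumPy scalar types such as np.int64 and np.float64 provide and which returns the equivalent built-in Python value; the library may instead use explicit isinstance checks. Because it avoids importing NumPy, this sketch converts any object with a callable .item() method.

```python
from typing import Any


def np_type_to_type(val: Any) -> Any:
    """Convert a NumPy scalar to the equivalent built-in Python value."""
    item = getattr(val, "item", None)
    if callable(item):
        # NumPy scalars (np.int64, np.float64, ...) expose .item(),
        # which returns the matching built-in int, float, or bool.
        return item()
    # Already a built-in value (or an object without .item()): pass through.
    return val
```

Calling .item() on a multi-element NumPy array raises ValueError, so this sketch is meant for scalars only.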
- property skewness: float | np.float64
Return skewness value.
- property stddev: float | np.float64
Return stddev value.
- update(df_series: Series) → FloatColumn
Update the column profile with the data in df_series.
- Parameters:
df_series (pandas.core.series.Series) – the data series to add to the profile
- Returns:
updated FloatColumn
- Return type:
FloatColumn
- property variance: float | np.float64
Return variance.
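Because update() can be called repeatedly on new batches, statistics such as mean, variance, and stddev have to be combined across batches without revisiting earlier data. One standard way to do this is the pairwise update of Chan et al., shown here as an assumption rather than as the library's code: each batch keeps a count, a mean, and the sum of squared deviations M2. The class RunningMoments is hypothetical, and it uses the sample (n - 1) variance; whether the library uses the same denominator is not stated here.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class RunningMoments:
    """Count, mean, and M2 (sum of squared deviations) for a batch of floats."""

    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    @classmethod
    def from_batch(cls, values: list[float]) -> RunningMoments:
        """Summarize one batch directly (two passes over the batch)."""
        n = len(values)
        if n == 0:
            return cls()
        mean = sum(values) / n
        m2 = sum((v - mean) ** 2 for v in values)
        return cls(n, mean, m2)

    def merge(self, other: RunningMoments) -> RunningMoments:
        """Combine two summaries with the pairwise update of Chan et al."""
        if other.n == 0:
            return self
        if self.n == 0:
            return other
        n = self.n + other.n
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return RunningMoments(n, mean, m2)

    @property
    def variance(self) -> float:
        """Sample variance (n - 1 denominator); NaN for fewer than two values."""
        return self.m2 / (self.n - 1) if self.n > 1 else float("nan")

    @property
    def stddev(self) -> float:
        return math.sqrt(self.variance)
```

Merging the summaries of [1.0, 2.0, 3.0] and [4.0, 5.0] gives n = 5, mean 3.0, and sample variance 2.5, the same as summarizing [1.0, 2.0, 3.0, 4.0, 5.0] in one pass. Skewness and kurtosis can be merged the same way, but only if third and fourth central moments are tracked as well.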
- match_count: int
- sample_size: int
- name: str | None
- metadata: dict
- times: dict
- thread_safe: bool