Text Column Profile¶
Text profile analysis for an individual column within structured profiling.
- class dataprofiler.profilers.text_column_profile.TextColumn(name: Optional[str], options: Optional[dataprofiler.profilers.profiler_options.TextOptions] = None)¶
Bases: dataprofiler.profilers.numerical_column_stats.NumericStatsMixin, dataprofiler.profilers.base_column_profilers.BaseColumnPrimitiveTypeProfiler
Text column profile subclass of BaseColumnProfiler.
Represents a column in the dataset which is a text column.
Initialize column base properties and itself.
- Parameters
name (String) – Name of the data
options (TextOptions) – Options for the Text column
- type: Optional[str] = 'text'¶
- report(remove_disabled_flag: bool = False) Dict ¶
Report the profile attribute of the class, potentially popping values from self.profile.
- property profile: Dict¶
Return the profile of the column.
- Returns
- diff(other_profile: dataprofiler.profilers.text_column_profile.TextColumn, options: Optional[Dict] = None) Dict ¶
Find the differences for text columns.
- Parameters
other_profile (TextColumn Profile) – profile to find the difference with
- Returns
the text columns differences
- Return type
dict
- property data_type_ratio: Optional[float]¶
Calculate the ratio of samples which match this data type.
NOTE: every value can be treated as a string, so this always returns 1.
- Returns
ratio of data type
- Return type
float
- update(df_series: pandas.core.series.Series) dataprofiler.profilers.text_column_profile.TextColumn ¶
Update the column profile.
- Parameters
df_series (pandas.core.series.Series) – pandas Series of column data to update the profile with
- Returns
updated TextColumn
- Return type
TextColumn
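The update-and-return pattern described above can be sketched without the library. This is a minimal illustration, not DataProfiler's implementation: `LengthStats` is a hypothetical stand-in class, and it assumes the numeric statistics describe string lengths, which this page does not state.

```python
from typing import Iterable, Optional


class LengthStats:
    """Hypothetical stand-in for a text column profile (not the library class)."""

    def __init__(self, name: Optional[str] = None) -> None:
        self.name = name
        self.sample_size = 0  # number of values seen so far
        self.sum = 0  # running total of string lengths (assumption)
        self.min: Optional[int] = None
        self.max: Optional[int] = None

    def update(self, values: Iterable[str]) -> "LengthStats":
        """Fold a new batch of strings into the running statistics."""
        for value in values:
            length = len(value)
            self.sample_size += 1
            self.sum += length
            self.min = length if self.min is None else min(self.min, length)
            self.max = length if self.max is None else max(self.max, length)
        return self  # returning self allows chained updates

    @property
    def mean(self) -> float:
        """Mean string length, or 0.0 before any data has been seen."""
        return self.sum / self.sample_size if self.sample_size else 0.0


# Two batches are folded into one profile without revisiting earlier data.
stats = LengthStats("comments").update(["abcd", "aa"]).update(["b", "dfd"])
```

Keeping only running totals means each batch can be discarded after it is processed, which is the usual reason an update method returns the same profile object.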
- col_type = None¶
- static is_float(x: str) bool ¶
Return True if x is float.
For “0.80” this function returns True.
For “1.00” this function returns True.
For “1” this function returns True.
- Parameters
x (str) – string to test
- Returns
whether x is a float
- Return type
bool
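A check with the documented behavior can be sketched by attempting the conversion and catching the failure. The helper name `looks_like_float` is hypothetical, and this is one plausible way to get the results listed above, not necessarily how the library does it.

```python
def looks_like_float(x: str) -> bool:
    """Return True if the string can be parsed as a float."""
    try:
        float(x)
    except ValueError:
        # The string is not a valid float literal.
        return False
    return True
```

With this sketch, "0.80", "1.00" and "1" are all accepted, while a non-numeric string such as "abc" is rejected.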
- static is_int(x: str) bool ¶
Return True if x is integer.
For “0.80” this function returns False.
For “1.00” this function returns True.
For “1” this function returns True.
- Parameters
x (str) – string to test
- Returns
whether x is an integer
- Return type
bool
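The integer check differs from the float check in that "1.00" counts as an integer while "0.80" does not. A minimal sketch that reproduces those results parses the string as a float and then tests for a zero fractional part. The helper name `looks_like_int` is hypothetical and the approach is an assumption, not the library's source.

```python
def looks_like_int(x: str) -> bool:
    """Return True if the string parses to a whole number."""
    try:
        as_float = float(x)
    except ValueError:
        # Not numeric at all.
        return False
    # True for 1.0, False for 0.8 (and for inf or nan).
    return as_float.is_integer()
```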
- property kurtosis: float¶
Return kurtosis value.
- property mean: float¶
Return mean value.
- property median: float¶
Estimate the median of the data.
- Returns
the median
- Return type
float
- property median_abs_deviation: float¶
Get median absolute deviation estimated from the histogram of the data.
1. Subtract the bin edges from the median value.
2. Fold the histogram into positive and negative parts around zero.
3. Impose the two sets of bin edges from the two histograms.
4. Calculate the counts for the two histograms with the imposed bin edges.
5. Superimpose the counts from the two histograms.
6. Interpolate the median absolute deviation from the superimposed counts.
- Returns
median absolute deviation
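For reference, the quantity being estimated is the median of the absolute deviations from the median. The sketch below computes it exactly from raw values using only the standard library. It is not the histogram-based estimate described above, which works from binned counts and may differ slightly from the exact value. The function name `exact_median_abs_deviation` is hypothetical.

```python
from statistics import median
from typing import Sequence


def exact_median_abs_deviation(values: Sequence[float]) -> float:
    """Exact MAD: median of |x - median(x)| over the raw values."""
    center = median(values)
    deviations = [abs(v - center) for v in values]
    return median(deviations)


# The outlier (100) barely affects the result, unlike the standard deviation.
mad = exact_median_abs_deviation([1, 2, 3, 4, 100])
```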
- property mode: List[float]¶
Find an estimate for the mode[s] of the data.
- Returns
the mode(s) of the data
- Return type
list(float)
- static np_type_to_type(val: Any) Union[int, float] ¶
Convert numpy variables to base python type variables.
- Parameters
val (numpy type or base type) – value to check & change
- Returns
base python type
- Return type
int or float
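One common way to do this kind of conversion relies on the `item()` method that NumPy scalars provide, which returns the equivalent built-in Python object. The sketch below is an assumption about the general technique, not the library's code. `to_builtin` and `FakeNumpyInt` are hypothetical names, and the stand-in class exists only so the example runs without NumPy installed.

```python
from typing import Any, Union


def to_builtin(val: Any) -> Union[int, float]:
    """Return a built-in int/float, unwrapping objects that expose item()."""
    item = getattr(val, "item", None)
    # Plain Python ints and floats have no item() and pass through unchanged.
    return item() if callable(item) else val


class FakeNumpyInt:
    """Hypothetical stand-in for a NumPy integer scalar."""

    def __init__(self, value: int) -> None:
        self._value = value

    def item(self) -> int:
        return self._value


converted = to_builtin(FakeNumpyInt(7))
unchanged = to_builtin(2.5)
```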
- property skewness: float¶
Return skewness value.
- property stddev: float¶
Return stddev value.
- property variance: float¶
Return variance.
- min: Optional[Union[int, float]]¶
- max: Optional[Union[int, float]]¶
- sum: Union[int, float]¶
- max_histogram_bin: int¶
- min_histogram_bin: int¶
- histogram_bin_method_names: List[str]¶
- histogram_selection: Optional[str]¶
- user_set_histogram_bin: Optional[int]¶
- bias_correction: bool¶
- num_zeros: int¶
- num_negatives: int¶
- histogram_methods: Dict¶
- quantiles: Union[List[float], Dict]¶
- name: Optional[str]¶
- sample_size: int¶
- metadata: Dict¶
- times: Dict¶
- thread_safe: bool¶
- match_count: int¶