Text Column Profile

Text profile analysis for an individual column within structured profiling.

class dataprofiler.profilers.text_column_profile.TextColumn(name: str | None, options: TextOptions = None)

Bases: dataprofiler.profilers.numerical_column_stats.NumericStatsMixin, dataprofiler.profilers.base_column_profilers.BaseColumnPrimitiveTypeProfiler

Text column profile subclass of BaseColumnProfiler.

Represents a column in the dataset which is a text column.

Initialize column base properties and itself.

Parameters
  • name (String) – Name of the data

  • options (TextOptions) – Options for the Text column

type: str | None = 'text'
report(remove_disabled_flag: bool = False) → dict

Report the profile attributes of the class, potentially popping values from self.profile.

property profile: dict

Return the profile of the column.

Returns

diff(other_profile: dataprofiler.profilers.text_column_profile.TextColumn, options: Optional[dict] = None) → dict

Find the differences for text columns.

Parameters

other_profile (TextColumn) – profile to find the difference with

Returns

the differences between the text columns

Return type

dict

property data_type_ratio: float | None

Calculate the ratio of samples which match this data type.

NOTE: every value can be treated as a string, so this always returns 1.

Returns

ratio of data type

Return type

float

update(df_series: pandas.core.series.Series) → dataprofiler.profilers.text_column_profile.TextColumn

Update the column profile.

Parameters

df_series (pandas.core.series.Series) – pandas Series of column data used to update the profile

Returns

updated TextColumn

Return type

TextColumn

col_type = None
static is_float(x: str) → bool

Return True if x is float.

  • For “0.80” this function returns True

  • For “1.00” this function returns True

  • For “1” this function returns True

Parameters

x (str) – string to test

Returns

whether x is a float

Return type

bool
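The three documented cases can be reproduced with a small standalone sketch. This is not the library's code: the helper name `looks_like_float` is hypothetical, and parsing with the built-in `float()` is an assumption about how such a check might be written.

```python
def looks_like_float(x: str) -> bool:
    """Hypothetical helper: True if the string parses as a float."""
    try:
        float(x)  # assumption: rely on Python's own float parsing
    except ValueError:
        return False
    return True


# The documented cases, plus one string that is not numeric.
for sample in ("0.80", "1.00", "1", "abc"):
    print(sample, looks_like_float(sample))
```

Under this assumption, integer-like strings such as “1” also count as floats, which agrees with the cases listed above.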

static is_int(x: str) → bool

Return True if x is integer.

  • For “0.80” this function returns False

  • For “1.00” this function returns True

  • For “1” this function returns True

Parameters

x (str) – string to test

Returns

whether x is an integer

Return type

bool
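A standalone sketch that matches the documented cases is shown here. The helper name `looks_like_int` is hypothetical, and checking `float(x).is_integer()` is an assumption about one way to get this behavior, not a statement about the library's implementation.

```python
def looks_like_int(x: str) -> bool:
    """Hypothetical helper: True if the string parses to a whole number."""
    try:
        # assumption: parse as float first so that "1.00" counts as an integer
        return float(x).is_integer()
    except ValueError:
        return False


# The documented cases, plus one string that is not numeric.
for sample in ("0.80", "1.00", "1", "abc"):
    print(sample, looks_like_int(sample))
```

Parsing through `float()` rather than `int()` is what lets “1.00” pass: `int("1.00")` would raise a `ValueError`.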

property kurtosis: float | np.float64

Return kurtosis value.

property mean: float

Return mean value.

property median: float

Estimate the median of the data.

Returns

the median

Return type

float

property median_abs_deviation: float | np.float64

Get median absolute deviation estimated from the histogram of the data.

  1. Subtract bin edges from the median value

  2. Fold the histogram to positive and negative parts around zero

  3. Impose the two bin edges from the two histograms

  4. Calculate the counts for the two histograms with the imposed bin edges

  5. Superimpose the counts from the two histograms

  6. Interpolate the median absolute deviation from the superimposed counts

Returns

median absolute deviation
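For reference, the quantity being estimated is the median of the absolute deviations from the median. The sketch below computes it exactly on raw values with the standard library. It does not reproduce the histogram-based procedure described above, and the function name `exact_mad` is hypothetical.

```python
from statistics import median


def exact_mad(values):
    """Hypothetical helper: exact median absolute deviation of raw values."""
    values = list(values)
    center = median(values)
    # Median of the distances between each value and the median.
    return median(abs(v - center) for v in values)


data = [1, 2, 3, 4, 100]
print(exact_mad(data))  # median is 3, deviations are [2, 1, 0, 1, 97]
```

A histogram-based estimate like the one this property returns would be expected to approximate this value without keeping every sample in memory.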

property mode: list

Find an estimate for the mode(s) of the data.

Returns

the mode(s) of the data

Return type

list(float)

static np_type_to_type(val: Any) → Any

Convert numpy variables to base python type variables.

Parameters

val (numpy type or base type) – value to check & change

Returns

base python type

Return type

int or float

property skewness: float | np.float64

Return skewness value.

property stddev: float | np.float64

Return stddev value.

property variance: float

Return variance.

min: int | float | None
max: int | float | None
sum: int | float
max_histogram_bin: int
min_histogram_bin: int
histogram_bin_method_names: list[str]
histogram_selection: str | None
user_set_histogram_bin: int | None
bias_correction: bool
num_zeros: int
num_negatives: int
histogram_methods: dict
quantiles: list[float] | dict
match_count: int
name: str | None
sample_size: int
metadata: dict
times: dict
thread_safe: bool