Column Profile Compilers
For generating a report.
- class dataprofiler.profilers.column_profile_compilers.BaseCompiler(df_series: Optional[pandas.core.series.Series] = None, options: Optional[dataprofiler.profilers.profiler_options.StructuredOptions] = None, pool: Optional[multiprocessing.pool.Pool] = None)
Bases: object
Abstract class for generating a report.
Initialize BaseCompiler object.
- abstract report(remove_disabled_flag: bool = False) → Dict
Return report.
- Parameters
remove_disabled_flag (bool) – whether disabled options should be excluded from the report.
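As a rough illustration of what an abstract report() means for subclasses, the following sketch uses Python's abc module. SketchBaseCompiler, SketchStatsCompiler, and can_instantiate are hypothetical names invented for this example and are not part of dataprofiler. The sketch assumes only that a concrete compiler must override report() before it can be instantiated.

```python
from abc import ABC, abstractmethod
from typing import Dict


class SketchBaseCompiler(ABC):
    """Hypothetical stand-in for an abstract compiler; not the library class."""

    def __init__(self) -> None:
        # Maps a profiler name to the stats it collected (illustrative only).
        self._profiles: Dict[str, Dict] = {}

    @abstractmethod
    def report(self, remove_disabled_flag: bool = False) -> Dict:
        """Subclasses must build and return the report."""
        raise NotImplementedError


class SketchStatsCompiler(SketchBaseCompiler):
    """Hypothetical concrete compiler that merges its profilers' stats."""

    def report(self, remove_disabled_flag: bool = False) -> Dict:
        merged: Dict = {}
        for stats in self._profiles.values():
            merged.update(stats)
        return merged


def can_instantiate(cls) -> bool:
    """Return True if cls() succeeds, False if Python rejects it as abstract."""
    try:
        cls()
    except TypeError:
        return False
    return True
```

In this sketch, calling can_instantiate on the abstract class returns False, while the subclass that overrides report() can be created normally.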
- property profile: Dict
Return the profile of the column.
- diff(other: dataprofiler.profilers.column_profile_compilers.BaseCompiler, options: Optional[Dict] = None) → Dict
Find the difference between two compilers and return the report.
- Parameters
other (BaseCompiler) – the profile compiler to compare against this one.
options (dict) – options to change results of the difference
- Returns
difference of the profiles
- Return type
dict
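To make the idea of a difference report more concrete, here is a minimal sketch that compares two flat report dictionaries. The helper diff_profiles, the "unchanged" marker, and the [this, other] pair format are all assumptions made for this example. They are not taken from the text above and may not match what the library actually returns.

```python
from typing import Any, Dict


def diff_profiles(this: Dict[str, Any], other: Dict[str, Any]) -> Dict[str, Any]:
    """Compare two flat report dicts key by key (hypothetical helper)."""
    result: Dict[str, Any] = {}
    for key in sorted(set(this) | set(other)):
        a, b = this.get(key), other.get(key)
        if a == b:
            result[key] = "unchanged"
        elif isinstance(a, (int, float)) and isinstance(b, (int, float)):
            # Numeric stats: report the signed difference this - other.
            result[key] = a - b
        else:
            # Anything else: show both values side by side.
            result[key] = [a, b]
    return result


example = diff_profiles(
    {"min": 1, "max": 10, "type": "int"},
    {"min": 1, "max": 7, "type": "float"},
)
```

The result is itself a dict keyed like the inputs, which mirrors the dict return type documented for diff().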
- update_profile(df_series: pandas.core.series.Series, pool: Optional[multiprocessing.pool.Pool] = None) → Optional[dataprofiler.profilers.column_profile_compilers.BaseCompiler]
Update the profiles with the data in the given series.
- Parameters
df_series (pandas.core.series.Series) – a given column; its values are assumed to be strings
pool (multiprocessing.Pool) – pool to use for multiprocessing
- Returns
Self
- Return type
BaseCompiler
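Because update_profile() is documented as returning Self, successive batches can in principle be folded in by chaining calls. The sketch below shows that pattern with a hypothetical SketchCompiler that takes a plain list of strings instead of a pandas Series. The class, its sample_size and null_count counters, and the empty-string null convention are assumptions made for illustration only.

```python
from typing import List


class SketchCompiler:
    """Hypothetical compiler that tracks running counts across batches."""

    def __init__(self) -> None:
        self.sample_size = 0
        self.null_count = 0

    def update_profile(self, values: List[str]) -> "SketchCompiler":
        """Fold a new batch into the running stats and return self."""
        self.sample_size += len(values)
        # Treat empty strings as nulls (an assumption for this sketch).
        self.null_count += sum(1 for v in values if v == "")
        return self


compiler = SketchCompiler()
# Returning self lets two batches be applied in one chained expression.
same_object = compiler.update_profile(["1", "2", ""]).update_profile(["3"])
```

Returning the same object rather than a copy keeps the running statistics in one place, so a caller can stream data in batches without rebuilding the compiler.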
- class dataprofiler.profilers.column_profile_compilers.ColumnPrimitiveTypeProfileCompiler(df_series: Optional[pandas.core.series.Series] = None, options: Optional[dataprofiler.profilers.profiler_options.StructuredOptions] = None, pool: Optional[multiprocessing.pool.Pool] = None)
Bases: dataprofiler.profilers.column_profile_compilers.BaseCompiler
For generating ordered column profile reports.
Initialize BaseCompiler object.
- report(remove_disabled_flag: bool = False) → Dict
Return report.
- Parameters
remove_disabled_flag (bool) – whether disabled options should be excluded from the report.
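One way to picture the effect of remove_disabled_flag is the sketch below. It is only a guess at the general behavior: build_report, the enabled mapping, and the choice to show disabled entries as None when the flag is False are assumptions introduced for this example, not documented library behavior.

```python
from typing import Any, Dict


def build_report(
    stats: Dict[str, Any],
    enabled: Dict[str, bool],
    remove_disabled_flag: bool = False,
) -> Dict[str, Any]:
    """Return stats, dropping disabled entries or reporting them as None."""
    report: Dict[str, Any] = {}
    for name, value in stats.items():
        if enabled.get(name, True):
            report[name] = value
        elif not remove_disabled_flag:
            # Disabled but kept: the key stays visible with an empty value.
            report[name] = None
    return report


stats = {"min": 1, "max": 9, "median": 4}
enabled = {"median": False}  # options not listed are treated as enabled
full_report = build_report(stats, enabled)
trimmed_report = build_report(stats, enabled, remove_disabled_flag=True)
```

With the flag left at False the disabled key is still present in full_report; with the flag set to True it is omitted from trimmed_report entirely.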
- property profile: Dict
Return the profile of the column.
- property selected_data_type: Optional[str]
Find the selected data_type in a primitive compiler.
- Returns
name of the selected data type
- Return type
str
- diff(other: dataprofiler.profilers.column_profile_compilers.ColumnPrimitiveTypeProfileCompiler, options: Optional[Dict] = None) → Dict
Find the difference between two compilers and return the report.
- Parameters
other (ColumnPrimitiveTypeProfileCompiler) – the profile compiler to compare against this one.
options (dict) – options to change results of the difference
- Returns
difference of the profiles
- Return type
dict
- update_profile(df_series: pandas.core.series.Series, pool: Optional[multiprocessing.pool.Pool] = None) → Optional[dataprofiler.profilers.column_profile_compilers.BaseCompiler]
Update the profiles with the data in the given series.
- Parameters
df_series (pandas.core.series.Series) – a given column; its values are assumed to be strings
pool (multiprocessing.Pool) – pool to use for multiprocessing
- Returns
Self
- Return type
BaseCompiler
- class dataprofiler.profilers.column_profile_compilers.ColumnStatsProfileCompiler(df_series: Optional[pandas.core.series.Series] = None, options: Optional[dataprofiler.profilers.profiler_options.StructuredOptions] = None, pool: Optional[multiprocessing.pool.Pool] = None)
Bases: dataprofiler.profilers.column_profile_compilers.BaseCompiler
For generating OrderColumn and CategoricalColumn reports.
Initialize BaseCompiler object.
- report(remove_disabled_flag: bool = False) → Dict
Return report.
- Parameters
remove_disabled_flag (bool) – whether disabled options should be excluded from the report.
- diff(other: dataprofiler.profilers.column_profile_compilers.ColumnStatsProfileCompiler, options: Optional[Dict] = None) → Dict
Find the difference between two compilers and return the report.
- Parameters
other (ColumnStatsProfileCompiler) – the profile compiler to compare against this one.
options (dict) – options to change results of the difference
- Returns
difference of the profiles
- Return type
dict
- property profile: Dict
Return the profile of the column.
- update_profile(df_series: pandas.core.series.Series, pool: Optional[multiprocessing.pool.Pool] = None) → Optional[dataprofiler.profilers.column_profile_compilers.BaseCompiler]
Update the profiles with the data in the given series.
- Parameters
df_series (pandas.core.series.Series) – a given column; its values are assumed to be strings
pool (multiprocessing.Pool) – pool to use for multiprocessing
- Returns
Self
- Return type
BaseCompiler
- class dataprofiler.profilers.column_profile_compilers.ColumnDataLabelerCompiler(df_series: Optional[pandas.core.series.Series] = None, options: Optional[dataprofiler.profilers.profiler_options.StructuredOptions] = None, pool: Optional[multiprocessing.pool.Pool] = None)
Bases: dataprofiler.profilers.column_profile_compilers.BaseCompiler
For generating DataLabelerColumn report.
Initialize BaseCompiler object.
- report(remove_disabled_flag: bool = False) → Dict
Return report.
- Parameters
remove_disabled_flag (bool) – whether disabled options should be excluded from the report.
- diff(other: dataprofiler.profilers.column_profile_compilers.ColumnDataLabelerCompiler, options: Optional[Dict] = None) → Dict
Find the difference between two compilers and return the report.
- Parameters
other (ColumnDataLabelerCompiler) – the profile compiler to compare against this one.
options (dict) – options to change results of the difference
- Returns
difference of the profiles
- Return type
dict
- property profile: Dict
Return the profile of the column.
- update_profile(df_series: pandas.core.series.Series, pool: Optional[multiprocessing.pool.Pool] = None) → Optional[dataprofiler.profilers.column_profile_compilers.BaseCompiler]
Update the profiles with the data in the given series.
- Parameters
df_series (pandas.core.series.Series) – a given column; its values are assumed to be strings
pool (multiprocessing.Pool) – pool to use for multiprocessing
- Returns
Self
- Return type
BaseCompiler
- class dataprofiler.profilers.column_profile_compilers.UnstructuredCompiler(df_series: Optional[pandas.core.series.Series] = None, options: Optional[dataprofiler.profilers.profiler_options.StructuredOptions] = None, pool: Optional[multiprocessing.pool.Pool] = None)
Bases: dataprofiler.profilers.column_profile_compilers.BaseCompiler
For generating TextProfiler and UnstructuredLabelerProfile reports.
Initialize BaseCompiler object.
- report(remove_disabled_flag: bool = False) → Dict
Report the profile attributes of the class, potentially popping values from self.profile.
- diff(other: dataprofiler.profilers.column_profile_compilers.UnstructuredCompiler, options: Optional[Dict] = None) → Dict
Find the difference between two compilers and return the report.
- Parameters
other (UnstructuredCompiler) – the profile compiler to compare against this one.
options (dict) – options that affect the results of the diff
- Returns
difference of the profiles
- Return type
dict
- property profile: Dict
Return the profile of the column.
- update_profile(df_series: pandas.core.series.Series, pool: Optional[multiprocessing.pool.Pool] = None) → Optional[dataprofiler.profilers.column_profile_compilers.BaseCompiler]
Update the profiles with the data in the given series.
- Parameters
df_series (pandas.core.series.Series) – a given column; its values are assumed to be strings
pool (multiprocessing.Pool) – pool to use for multiprocessing
- Returns
Self
- Return type
BaseCompiler