Column Profile Compilers
Compilers that group related column profiles and generate reports from them.
- class dataprofiler.profilers.column_profile_compilers.BaseCompiler(df_series: Optional[pandas.core.series.Series] = None, options: Optional[dataprofiler.profilers.profiler_options.StructuredOptions] = None, pool: Optional[multiprocessing.pool.Pool] = None)
Bases: Generic[dataprofiler.profilers.column_profile_compilers.BaseCompilerT]
Abstract class for generating a report.
Initialize BaseCompiler object.
- abstract report(remove_disabled_flag: bool = False) → dict
Return the report.
- Parameters
remove_disabled_flag (boolean) – flag to determine whether disabled options should be excluded from the report.
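The abstract-class contract above can be illustrated with a small, self-contained sketch. It is not DataProfiler's source: `ToyBaseCompiler`, `ToyCountCompiler` and `CompilerT` are hypothetical names, and the only point shown is that a generic base class with an unimplemented abstract `report()` cannot be instantiated, while a subclass that implements it can.

```python
import abc
from typing import Generic, TypeVar

# Hypothetical type variable, analogous to the documented BaseCompilerT.
CompilerT = TypeVar("CompilerT", bound="ToyBaseCompiler")


class ToyBaseCompiler(Generic[CompilerT], metaclass=abc.ABCMeta):
    """Hypothetical abstract, generic compiler base class."""

    @abc.abstractmethod
    def report(self, remove_disabled_flag: bool = False) -> dict:
        """Subclasses must build and return their report dict."""


class ToyCountCompiler(ToyBaseCompiler["ToyCountCompiler"]):
    """Hypothetical concrete compiler that only reports a counter."""

    def __init__(self) -> None:
        self.count = 0

    def report(self, remove_disabled_flag: bool = False) -> dict:
        return {"count": self.count}


try:
    ToyBaseCompiler()  # fails: report() is still abstract
except TypeError as err:
    print("cannot instantiate:", type(err).__name__)

print(ToyCountCompiler().report())  # {'count': 0}
```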
- property profile: dict
Return the profile of the column.
- diff(other: dataprofiler.profilers.column_profile_compilers.BaseCompilerT, options: Optional[dict] = None) → dict
Find the difference between two compilers and return the report.
- Parameters
other (BaseCompiler) – profile compiler to compare against this one.
options (dict) – options to change the results of the difference
- Returns
difference of the profiles
- Return type
dict
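`diff()` returns a dict describing how two compilers differ. The sketch below shows one way such a report-level diff can be computed over two flat report dicts. The function `diff_reports` and its convention ("unchanged" for equal values, `a - b` for two numbers, `[a, b]` for any other differing pair) are assumptions made for illustration, not a description of DataProfiler's internals.

```python
from typing import Any, Dict


def diff_reports(report1: Dict[str, Any], report2: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical key-by-key diff of two flat report dicts.

    Assumed convention, for illustration only: equal values become
    "unchanged", two numbers become report1 - report2, and any other
    differing pair is returned as [value1, value2].
    """
    result: Dict[str, Any] = {}
    for key in sorted(report1.keys() | report2.keys()):
        v1, v2 = report1.get(key), report2.get(key)
        if v1 == v2:
            result[key] = "unchanged"
        elif isinstance(v1, (int, float)) and isinstance(v2, (int, float)):
            result[key] = v1 - v2
        else:
            result[key] = [v1, v2]
    return result


print(diff_reports({"count": 4, "type": "int"}, {"count": 2, "type": "text"}))
# {'count': 2, 'type': ['int', 'text']}
```

A key present in only one report shows up as a pair with `None` on the missing side.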
- update_profile(df_series: Series, pool: Pool = None) → BaseCompiler | None
Update the profiles with the data in the given column.
- Parameters
df_series (pandas.core.series.Series) – a given column; its values are assumed to be str
pool (multiprocessing.Pool) – pool to use for multiprocessing
- Returns
Self
- Return type
BaseCompiler | None
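Because `update_profile` returns the compiler itself, updates over successive batches can be chained, and the optional `pool` should only change how the work is executed, not the result. The sketch below illustrates that contract with a hypothetical `ToyLengthCompiler`; it accepts a plain list of strings in place of a pandas Series and a `multiprocessing.pool.ThreadPool` in place of a process pool, and it is not DataProfiler code.

```python
from multiprocessing.pool import ThreadPool
from typing import Iterable, Optional


class ToyLengthCompiler:
    """Hypothetical compiler tracking value count and total string length."""

    def __init__(self) -> None:
        self.count = 0
        self.total_length = 0

    def update_profile(
        self, values: Iterable[str], pool: Optional[ThreadPool] = None
    ) -> "ToyLengthCompiler":
        values = list(values)
        # Compute per-value lengths, in parallel only when a pool is given.
        if pool is not None:
            lengths = pool.map(len, values)
        else:
            lengths = [len(v) for v in values]
        # Merge the new batch into the running totals.
        self.count += len(values)
        self.total_length += sum(lengths)
        return self  # returning self allows chained calls


compiler = ToyLengthCompiler().update_profile(["ab", "cde"])
with ThreadPool(2) as pool:
    compiler.update_profile(["f"], pool=pool)
print(compiler.count, compiler.total_length)  # 3 6
```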
- classmethod load_from_dict(data, config: dict | None = None) → BaseCompiler
Parse attributes from a JSON dictionary into a compiler.
- Parameters
data (dict[string, Any]) – dictionary with attributes and values.
config (Dict | None) – config for loading column profiler params from the dictionary
- Returns
Compiler with attributes populated.
- Return type
BaseCompiler
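Since `load_from_dict` is a classmethod, it builds and returns a new compiler rather than mutating an existing one. A minimal sketch of that pattern follows, using a hypothetical `ToyCompiler` and a plain `setattr` loop; this is an assumption for illustration, and how the real method reconstructs nested column profilers or applies `config` is not shown.

```python
import json
from typing import Optional


class ToyCompiler:
    """Hypothetical compiler used to illustrate dict-based deserialization."""

    def __init__(self) -> None:
        self.name: Optional[str] = None
        self.count = 0

    @classmethod
    def load_from_dict(cls, data: dict, config: Optional[dict] = None) -> "ToyCompiler":
        # Build a fresh instance, then copy each serialized attribute onto it.
        # `config` is accepted for signature parity but unused in this sketch.
        compiler = cls()
        for attr, value in data.items():
            setattr(compiler, attr, value)
        return compiler


serialized = json.dumps({"name": "age", "count": 3})
restored = ToyCompiler.load_from_dict(json.loads(serialized))
print(restored.name, restored.count)  # age 3
```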
- class dataprofiler.profilers.column_profile_compilers.ColumnPrimitiveTypeProfileCompiler(df_series: Optional[pandas.core.series.Series] = None, options: Optional[dataprofiler.profilers.profiler_options.StructuredOptions] = None, pool: Optional[multiprocessing.pool.Pool] = None)
Bases: dataprofiler.profilers.column_profile_compilers.BaseCompiler[ColumnPrimitiveTypeProfileCompiler]
For generating ordered column profile reports.
Initialize BaseCompiler object.
- report(remove_disabled_flag: bool = False) → dict
Return the report.
- Parameters
remove_disabled_flag (boolean) – flag to determine whether disabled options should be excluded from the report.
- property profile: dict
Return the profile of the column.
- property selected_data_type: str | None
Find the data type selected by a primitive compiler.
- Returns
name of the selected data type
- Return type
str | None
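How the data type is selected is not specified here. Purely as an illustration, the sketch below assumes that candidate types are checked in a fixed priority order (most specific first) and that the first type matching every value wins; `TYPE_ORDER`, `select_data_type` and the per-type match-ratio input are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical priority order, most specific type first (an assumption).
TYPE_ORDER: List[str] = ["datetime", "int", "float", "text"]


def select_data_type(match_ratios: Dict[str, float]) -> Optional[str]:
    """Return the first type in TYPE_ORDER whose values all matched, else None."""
    for data_type in TYPE_ORDER:
        if match_ratios.get(data_type) == 1.0:
            return data_type
    return None


print(select_data_type({"datetime": 0.0, "int": 1.0, "float": 1.0, "text": 1.0}))  # int
print(select_data_type({}))  # None
```

Returning `None` when nothing qualifies matches the `str | None` return type.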
- diff(other: dataprofiler.profilers.column_profile_compilers.ColumnPrimitiveTypeProfileCompiler, options: Optional[dict] = None) → dict
Find the difference between two compilers and return the report.
- Parameters
other (ColumnPrimitiveTypeProfileCompiler) – profile compiler to compare against this one.
options (dict) – options to change the results of the difference
- Returns
difference of the profiles
- Return type
dict
- classmethod load_from_dict(data, config: dict | None = None) → BaseCompiler
Parse attributes from a JSON dictionary into a compiler.
- Parameters
data (dict[string, Any]) – dictionary with attributes and values.
config (Dict | None) – config for loading column profiler params from the dictionary
- Returns
Compiler with attributes populated.
- Return type
BaseCompiler
- update_profile(df_series: Series, pool: Pool = None) → BaseCompiler | None
Update the profiles with the data in the given column.
- Parameters
df_series (pandas.core.series.Series) – a given column; its values are assumed to be str
pool (multiprocessing.Pool) – pool to use for multiprocessing
- Returns
Self
- Return type
BaseCompiler | None
- class dataprofiler.profilers.column_profile_compilers.ColumnStatsProfileCompiler(df_series: Optional[pandas.core.series.Series] = None, options: Optional[dataprofiler.profilers.profiler_options.StructuredOptions] = None, pool: Optional[multiprocessing.pool.Pool] = None)
Bases: dataprofiler.profilers.column_profile_compilers.BaseCompiler[ColumnStatsProfileCompiler]
For generating OrderColumn and CategoricalColumn reports.
Initialize BaseCompiler object.
- report(remove_disabled_flag: bool = False) → dict
Return the report.
- Parameters
remove_disabled_flag (boolean) – flag to determine whether disabled options should be excluded from the report.
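Per its description, `ColumnStatsProfileCompiler` produces the OrderColumn and CategoricalColumn reports together. The sketch below shows one way a compiler could merge several sub-profile reports into a single dict and honour `remove_disabled_flag`; `compile_report`, the `enabled` mapping and the report keys are hypothetical, and DataProfiler's actual keys and merge rules are not shown here.

```python
from typing import Dict


def compile_report(
    sub_reports: Dict[str, dict],
    enabled: Dict[str, bool],
    remove_disabled_flag: bool = False,
) -> dict:
    """Hypothetical merge of several sub-profile reports into one flat dict.

    When remove_disabled_flag is True, reports of disabled sub-profiles are
    skipped instead of merged.
    """
    report: dict = {}
    for name, sub_report in sub_reports.items():
        if remove_disabled_flag and not enabled.get(name, True):
            continue  # drop statistics belonging to disabled options
        report.update(sub_report)
    return report


subs = {"order": {"order": "ascending"}, "category": {"unique_count": 3}}
flags = {"order": True, "category": False}
print(compile_report(subs, flags))  # {'order': 'ascending', 'unique_count': 3}
print(compile_report(subs, flags, remove_disabled_flag=True))  # {'order': 'ascending'}
```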
- diff(other: dataprofiler.profilers.column_profile_compilers.ColumnStatsProfileCompiler, options: Optional[dict] = None) → dict
Find the difference between two compilers and return the report.
- Parameters
other (ColumnStatsProfileCompiler) – profile compiler to compare against this one.
options (dict) – options to change the results of the difference
- Returns
difference of the profiles
- Return type
dict
- classmethod load_from_dict(data, config: dict | None = None) → BaseCompiler
Parse attributes from a JSON dictionary into a compiler.
- Parameters
data (dict[string, Any]) – dictionary with attributes and values.
config (Dict | None) – config for loading column profiler params from the dictionary
- Returns
Compiler with attributes populated.
- Return type
BaseCompiler
- property profile: dict
Return the profile of the column.
- update_profile(df_series: Series, pool: Pool = None) → BaseCompiler | None
Update the profiles with the data in the given column.
- Parameters
df_series (pandas.core.series.Series) – a given column; its values are assumed to be str
pool (multiprocessing.Pool) – pool to use for multiprocessing
- Returns
Self
- Return type
BaseCompiler | None
- class dataprofiler.profilers.column_profile_compilers.ColumnDataLabelerCompiler(df_series: Optional[pandas.core.series.Series] = None, options: Optional[dataprofiler.profilers.profiler_options.StructuredOptions] = None, pool: Optional[multiprocessing.pool.Pool] = None)
Bases: dataprofiler.profilers.column_profile_compilers.BaseCompiler[ColumnDataLabelerCompiler]
For generating DataLabelerColumn report.
Initialize BaseCompiler object.
- report(remove_disabled_flag: bool = False) → dict
Return the report.
- Parameters
remove_disabled_flag (boolean) – flag to determine whether disabled options should be excluded from the report.
- diff(other: dataprofiler.profilers.column_profile_compilers.ColumnDataLabelerCompiler, options: Optional[dict] = None) → dict
Find the difference between two compilers and return the report.
- Parameters
other (ColumnDataLabelerCompiler) – profile compiler to compare against this one.
options (dict) – options to change the results of the difference
- Returns
difference of the profiles
- Return type
dict
- classmethod load_from_dict(data, config: dict | None = None) → BaseCompiler
Parse attributes from a JSON dictionary into a compiler.
- Parameters
data (dict[string, Any]) – dictionary with attributes and values.
config (Dict | None) – config for loading column profiler params from the dictionary
- Returns
Compiler with attributes populated.
- Return type
BaseCompiler
- property profile: dict
Return the profile of the column.
- update_profile(df_series: Series, pool: Pool = None) → BaseCompiler | None
Update the profiles with the data in the given column.
- Parameters
df_series (pandas.core.series.Series) – a given column; its values are assumed to be str
pool (multiprocessing.Pool) – pool to use for multiprocessing
- Returns
Self
- Return type
BaseCompiler | None
- class dataprofiler.profilers.column_profile_compilers.UnstructuredCompiler(df_series: Optional[pandas.core.series.Series] = None, options: Optional[dataprofiler.profilers.profiler_options.StructuredOptions] = None, pool: Optional[multiprocessing.pool.Pool] = None)
Bases: dataprofiler.profilers.column_profile_compilers.BaseCompiler[UnstructuredCompiler]
For generating TextProfiler and UnstructuredLabelerProfile reports.
Initialize BaseCompiler object.
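- report(remove_disabled_flag: bool = False) → dict
Report the profile attributes of the class, potentially popping values from self.profile.
- Parameters
remove_disabled_flag (boolean) – flag to determine whether disabled options should be excluded from the report.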
- diff(other: dataprofiler.profilers.column_profile_compilers.UnstructuredCompiler, options: Optional[dict] = None) → dict
Find the difference between two compilers and return the report.
- Parameters
other (UnstructuredCompiler) – profile compiler to compare against this one.
options (dict) – options that affect the results of the diff
- Returns
difference of the profiles
- Return type
dict
- classmethod load_from_dict(data, config: dict | None = None) → BaseCompiler
Parse attributes from a JSON dictionary into a compiler.
- Parameters
data (dict[string, Any]) – dictionary with attributes and values.
config (Dict | None) – config for loading column profiler params from the dictionary
- Returns
Compiler with attributes populated.
- Return type
BaseCompiler
- property profile: dict
Return the profile of the column.
- update_profile(df_series: Series, pool: Pool = None) → BaseCompiler | None
Update the profiles with the data in the given column.
- Parameters
df_series (pandas.core.series.Series) – a given column; its values are assumed to be str
pool (multiprocessing.Pool) – pool to use for multiprocessing
- Returns
Self
- Return type
BaseCompiler | None