dataprofiler.profilers.column_profile_compilers module
For generating a report.
- class dataprofiler.profilers.column_profile_compilers.BaseCompiler(df_series: Series | None = None, options: StructuredOptions | None = None, pool: Pool | None = None)
Bases: Generic[BaseCompilerT]
Abstract class for generating a report.
Initialize BaseCompiler object.
- abstract report(remove_disabled_flag: bool = False) → dict
Return report.
- Parameters:
remove_disabled_flag (boolean) – flag that determines whether disabled options are excluded from the report.
- property profile: dict
Return the profile of the column.
- diff(other: BaseCompilerT, options: dict | None = None) → dict
Find the difference between two compilers and return the report.
- Parameters:
other (BaseCompiler) – the profile compiler to compare against this one.
- Returns:
difference of the profiles
- Return type:
dict
- update_profile(df_series: Series, pool: Pool = None) → BaseCompiler | None
Update the profiles from the given column data.
- Parameters:
df_series (pandas.core.series.Series) – the column to profile; its values are assumed to be strings
pool (multiprocessing.Pool) – pool to use for multiprocessing
- Returns:
Self
- Return type:
BaseCompiler | None
- classmethod load_from_dict(data, config: dict | None = None) → BaseCompiler
Parse attributes from a JSON dictionary into the compiler.
- Parameters:
data (dict[string, Any]) – dictionary with attributes and values.
config (Dict | None) – configuration for loading column profiler parameters from the dictionary
- Returns:
Compiler with attributes populated.
- Return type:
BaseCompiler
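The interface above (incremental `update_profile` that returns the compiler itself, `report` with a `remove_disabled_flag`, and `load_from_dict` for restoring state) can be illustrated with a small self-contained sketch. This is not the dataprofiler implementation: `ToyCompiler`, `LengthCompiler`, the `enabled` mapping, and the two statistics are hypothetical names invented for illustration, and plain lists of strings stand in for a pandas Series.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Any


class ToyCompiler(ABC):
    """Hypothetical stand-in for a compiler; NOT the dataprofiler class."""

    def __init__(self, values: list[str] | None = None) -> None:
        self.sample_size = 0
        self.max_length = 0
        # Assumption: each reported statistic can be enabled or disabled.
        self.enabled = {"sample_size": True, "max_length": False}
        if values is not None:
            self.update_profile(values)

    @abstractmethod
    def report(self, remove_disabled_flag: bool = False) -> dict:
        """Return the report; subclasses decide what goes into it."""

    def update_profile(self, values: list[str]) -> ToyCompiler:
        """Fold a new batch of string values into the running statistics."""
        self.sample_size += len(values)
        self.max_length = max([self.max_length, *map(len, values)])
        return self  # returning self allows chained updates

    @classmethod
    def load_from_dict(cls, data: dict[str, Any]) -> ToyCompiler:
        """Build a compiler and populate its attributes from a dictionary."""
        compiler = cls()
        for key, value in data.items():
            setattr(compiler, key, value)
        return compiler


class LengthCompiler(ToyCompiler):
    def report(self, remove_disabled_flag: bool = False) -> dict:
        full = {"sample_size": self.sample_size, "max_length": self.max_length}
        if remove_disabled_flag:
            # Drop entries whose option is disabled.
            return {k: v for k, v in full.items() if self.enabled[k]}
        return full


compiler = LengthCompiler(["a", "bcd"]).update_profile(["ef"])
full_report = compiler.report()
trimmed_report = compiler.report(remove_disabled_flag=True)
restored = LengthCompiler.load_from_dict({"sample_size": 3, "max_length": 3})
```

In this sketch `trimmed_report` keeps only the statistics whose option is enabled, and `restored.report()` matches `full_report`, which mirrors the save-and-reload round trip that `load_from_dict` is documented to support.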
- class dataprofiler.profilers.column_profile_compilers.ColumnPrimitiveTypeProfileCompiler(df_series: Series | None = None, options: StructuredOptions | None = None, pool: Pool | None = None)
Bases: BaseCompiler[ColumnPrimitiveTypeProfileCompiler]
For generating ordered column profile reports.
Initialize BaseCompiler object.
- report(remove_disabled_flag: bool = False) → dict
Return report.
- Parameters:
remove_disabled_flag (boolean) – flag that determines whether disabled options are excluded from the report.
- property profile: dict
Return the profile of the column.
- property selected_data_type: str | None
Find the selected data_type in a primitive compiler.
- Returns:
name of the selected data type
- Return type:
str | None
- diff(other: ColumnPrimitiveTypeProfileCompiler, options: dict | None = None) → dict
Find the difference between two compilers and return the report.
- Parameters:
other (ColumnPrimitiveTypeProfileCompiler) – the profile compiler to compare against this one.
- Returns:
difference of the profiles
- Return type:
dict
- classmethod load_from_dict(data, config: dict | None = None) → BaseCompiler
Parse attributes from a JSON dictionary into the compiler.
- Parameters:
data (dict[string, Any]) – dictionary with attributes and values.
config (Dict | None) – configuration for loading column profiler parameters from the dictionary
- Returns:
Compiler with attributes populated.
- Return type:
BaseCompiler
- update_profile(df_series: Series, pool: Pool = None) → BaseCompiler | None
Update the profiles from the given column data.
- Parameters:
df_series (pandas.core.series.Series) – the column to profile; its values are assumed to be strings
pool (multiprocessing.Pool) – pool to use for multiprocessing
- Returns:
Self
- Return type:
BaseCompiler | None
- class dataprofiler.profilers.column_profile_compilers.ColumnStatsProfileCompiler(df_series: Series | None = None, options: StructuredOptions | None = None, pool: Pool | None = None)
Bases: BaseCompiler[ColumnStatsProfileCompiler]
For generating OrderColumn and CategoricalColumn reports.
Initialize BaseCompiler object.
- report(remove_disabled_flag: bool = False) → dict
Return report.
- Parameters:
remove_disabled_flag (boolean) – flag that determines whether disabled options are excluded from the report.
- diff(other: ColumnStatsProfileCompiler, options: dict | None = None) → dict
Find the difference between two compilers and return the report.
- Parameters:
other (ColumnStatsProfileCompiler) – the profile compiler to compare against this one.
- Returns:
difference of the profiles
- Return type:
dict
- classmethod load_from_dict(data, config: dict | None = None) → BaseCompiler
Parse attributes from a JSON dictionary into the compiler.
- Parameters:
data (dict[string, Any]) – dictionary with attributes and values.
config (Dict | None) – configuration for loading column profiler parameters from the dictionary
- Returns:
Compiler with attributes populated.
- Return type:
BaseCompiler
- property profile: dict
Return the profile of the column.
- update_profile(df_series: Series, pool: Pool = None) → BaseCompiler | None
Update the profiles from the given column data.
- Parameters:
df_series (pandas.core.series.Series) – the column to profile; its values are assumed to be strings
pool (multiprocessing.Pool) – pool to use for multiprocessing
- Returns:
Self
- Return type:
BaseCompiler | None
- class dataprofiler.profilers.column_profile_compilers.ColumnDataLabelerCompiler(df_series: Series | None = None, options: StructuredOptions | None = None, pool: Pool | None = None)
Bases: BaseCompiler[ColumnDataLabelerCompiler]
For generating DataLabelerColumn report.
Initialize BaseCompiler object.
- report(remove_disabled_flag: bool = False) → dict
Return report.
- Parameters:
remove_disabled_flag (boolean) – flag that determines whether disabled options are excluded from the report.
- diff(other: ColumnDataLabelerCompiler, options: dict | None = None) → dict
Find the difference between two compilers and return the report.
- Parameters:
other (ColumnDataLabelerCompiler) – the profile compiler to compare against this one.
options (dict) – options that change the results of the difference
- Returns:
difference of the profiles
- Return type:
dict
- classmethod load_from_dict(data, config: dict | None = None) → BaseCompiler
Parse attributes from a JSON dictionary into the compiler.
- Parameters:
data (dict[string, Any]) – dictionary with attributes and values.
config (Dict | None) – configuration for loading column profiler parameters from the dictionary
- Returns:
Compiler with attributes populated.
- Return type:
BaseCompiler
- property profile: dict
Return the profile of the column.
- update_profile(df_series: Series, pool: Pool = None) → BaseCompiler | None
Update the profiles from the given column data.
- Parameters:
df_series (pandas.core.series.Series) – the column to profile; its values are assumed to be strings
pool (multiprocessing.Pool) – pool to use for multiprocessing
- Returns:
Self
- Return type:
BaseCompiler | None
- class dataprofiler.profilers.column_profile_compilers.UnstructuredCompiler(df_series: Series | None = None, options: StructuredOptions | None = None, pool: Pool | None = None)
Bases: BaseCompiler[UnstructuredCompiler]
For generating TextProfiler and UnstructuredLabelerProfile reports.
Initialize BaseCompiler object.
- report(remove_disabled_flag: bool = False) → dict
Report the profile attributes of the class, potentially popping values from self.profile.
- diff(other: UnstructuredCompiler, options: dict | None = None) → dict
Find the difference between two compilers and return the report.
- Parameters:
other (UnstructuredCompiler) – the profile compiler to compare against this one.
options (dict) – options that affect the results of the diff
- Returns:
difference of the profiles
- Return type:
dict
- classmethod load_from_dict(data, config: dict | None = None) → BaseCompiler
Parse attributes from a JSON dictionary into the compiler.
- Parameters:
data (dict[string, Any]) – dictionary with attributes and values.
config (Dict | None) – configuration for loading column profiler parameters from the dictionary
- Returns:
Compiler with attributes populated.
- Return type:
BaseCompiler
- property profile: dict
Return the profile of the column.
- update_profile(df_series: Series, pool: Pool = None) → BaseCompiler | None
Update the profiles from the given column data.
- Parameters:
df_series (pandas.core.series.Series) – the column to profile; its values are assumed to be strings
pool (multiprocessing.Pool) – pool to use for multiprocessing
- Returns:
Self
- Return type:
BaseCompiler | None
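Each compiler's diff method is documented as returning a dict describing the difference of two profiles. The page does not describe how each statistic is compared, so the following is only a rough sketch of what a key-wise diff of two flat report dictionaries could look like. The helper `toy_report_diff`, the `"unchanged"` marker, and the `[a, b]` pair for non-numeric values are assumptions made for illustration, not the library's documented output format.

```python
from __future__ import annotations


def toy_report_diff(report_a: dict, report_b: dict) -> dict:
    """Key-wise diff of two flat reports (hypothetical helper)."""
    diff = {}
    for key in sorted(set(report_a) | set(report_b)):
        if key not in report_a or key not in report_b:
            # Key present on one side only: show both sides, None if missing.
            diff[key] = [report_a.get(key), report_b.get(key)]
        elif report_a[key] == report_b[key]:
            diff[key] = "unchanged"
        elif isinstance(report_a[key], (int, float)) and isinstance(
            report_b[key], (int, float)
        ):
            # Numeric statistics: report the signed difference a - b.
            diff[key] = report_a[key] - report_b[key]
        else:
            # Anything else: report the pair of differing values.
            diff[key] = [report_a[key], report_b[key]]
    return diff


diff = toy_report_diff(
    {"data_type": "int", "max": 10, "min": 1},
    {"data_type": "float", "max": 7.5, "min": 1},
)
```

Here `diff` holds a numeric delta for `max`, the marker for `min`, and the pair of differing values for `data_type`.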