dataprofiler.profilers.column_profile_compilers module

Column profile compilers, which combine individual column profiles into a single report.

class dataprofiler.profilers.column_profile_compilers.BaseCompiler(df_series: Series | None = None, options: StructuredOptions | None = None, pool: Pool | None = None)

Bases: Generic[BaseCompilerT]

Abstract class for generating a report.

Initialize BaseCompiler object.
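The generic base class can be illustrated with a minimal sketch. SketchCompiler and CountCompiler are hypothetical stand-ins, not DataProfiler classes; the sketch only assumes the pattern visible in the signatures on this page: an abstract report() and a type variable that each subclass binds to itself (as in BaseCompiler[ColumnPrimitiveTypeProfileCompiler]), so that diff(other) is typed to accept the same compiler type.

```python
from __future__ import annotations

import abc
from typing import Generic, TypeVar

# Hypothetical names: a sketch of the self-typed generic pattern,
# not DataProfiler's source.
CompilerT = TypeVar("CompilerT", bound="SketchCompiler")


class SketchCompiler(Generic[CompilerT], abc.ABC):
    """Abstract base: concrete compilers must implement report()."""

    def __init__(self) -> None:
        self._profiles: dict = {}

    @abc.abstractmethod
    def report(self, remove_disabled_flag: bool = False) -> dict:
        """Return a report built from the compiled profiles."""

    def diff(self, other: CompilerT, options: dict | None = None) -> dict:
        # Static type checkers see `other` as the concrete subclass;
        # at runtime we also guard against mixing compiler kinds.
        if type(self) is not type(other):
            raise TypeError("can only diff compilers of the same type")
        return {}


class CountCompiler(SketchCompiler["CountCompiler"]):
    """Concrete compiler that binds the type variable to itself."""

    def report(self, remove_disabled_flag: bool = False) -> dict:
        return dict(self._profiles)
```

Instantiating SketchCompiler directly raises TypeError because report() is abstract, while CountCompiler() can be created and diffed against another CountCompiler.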

abstract report(remove_disabled_flag: bool = False) → dict

Return the report.

Parameters: remove_disabled_flag (bool) – flag that determines whether disabled options are excluded from the report.
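A minimal sketch of what remove_disabled_flag controls, under the assumption that a statistic disabled through options stays in the report with no value unless the flag is set, in which case its key is dropped. The helper build_report and the statistic names are hypothetical.

```python
from __future__ import annotations


def build_report(stats: dict, disabled: set[str],
                 remove_disabled_flag: bool = False) -> dict:
    """Hypothetical helper: build a report, handling disabled statistics."""
    report = {}
    for name, value in stats.items():
        if name in disabled:
            if remove_disabled_flag:
                continue  # assumption: disabled entries are omitted entirely
            value = None  # assumption: otherwise the key stays, with no value
        report[name] = value
    return report
```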

property profile: dict

Return the profile of the column.

diff(other: BaseCompilerT, options: dict | None = None) → dict

Find the difference between two compilers and return the report.

Parameters: other (BaseCompiler) – the profile compiler to compare with this one.
Returns: difference of the profiles
Return type: dict
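The comparison performed by diff can be sketched as a recursive walk over two report dictionaries. diff_reports is a hypothetical helper, not the library's implementation; the conventions it uses ("unchanged" for equal values, a numeric difference for numbers, and a [self, other] pair otherwise) are assumptions chosen for the example.

```python
from __future__ import annotations

from numbers import Number


def _is_number(x) -> bool:
    # bool is a subclass of int; treat it as a non-numeric flag here.
    return isinstance(x, Number) and not isinstance(x, bool)


def diff_reports(report1: dict, report2: dict) -> dict:
    """Hypothetical recursive diff of two report dictionaries."""
    result = {}
    for key in report1.keys() | report2.keys():
        a, b = report1.get(key), report2.get(key)
        if isinstance(a, dict) and isinstance(b, dict):
            result[key] = diff_reports(a, b)  # recurse into nested sections
        elif a == b:
            result[key] = "unchanged"
        elif _is_number(a) and _is_number(b):
            result[key] = a - b  # numeric difference: self minus other
        else:
            result[key] = [a, b]  # otherwise report both values side by side
    return result
```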

update_profile(df_series: Series, pool: Pool = None) → BaseCompiler | None

Update the profiles with the data in df_series.

Parameters:

df_series (pandas.core.series.Series) – a given column; its values are assumed to be str
pool (multiprocessing.Pool) – pool to use for multiprocessing

Returns:

Self

Return type:

BaseCompiler
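The update pattern documented above (fold a new column of str values into existing profiles, optionally using a multiprocessing pool, and return self) can be sketched with a hypothetical RunningNumericProfile class. Its statistics and the _parse_float helper are assumptions for illustration only.

```python
from __future__ import annotations

from multiprocessing.pool import Pool
from typing import Iterable


def _parse_float(value: str) -> float | None:
    """Return the value as a float, or None if it does not parse."""
    try:
        return float(value)
    except ValueError:
        return None


class RunningNumericProfile:
    """Hypothetical profile that is updated batch by batch."""

    def __init__(self) -> None:
        self.count = 0
        self.invalid_count = 0
        self.min: float | None = None
        self.max: float | None = None

    def update_profile(self, values: Iterable[str],
                       pool: Pool | None = None) -> RunningNumericProfile:
        values = list(values)
        # Parse in worker processes when a pool is given, otherwise serially.
        mapper = pool.map if pool is not None else map
        parsed = list(mapper(_parse_float, values))
        numbers = [v for v in parsed if v is not None]
        self.count += len(values)
        self.invalid_count += len(values) - len(numbers)
        if numbers:
            lo, hi = min(numbers), max(numbers)
            self.min = lo if self.min is None else min(self.min, lo)
            self.max = hi if self.max is None else max(self.max, hi)
        return self  # returning self allows chained updates
```

Because _parse_float is a module-level function it can be pickled, so the same call should work with a pool, for example `with Pool(2) as pool: profile.update_profile(batch, pool=pool)` inside an `if __name__ == "__main__":` guard.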

classmethod load_from_dict(data, config: dict | None = None) → BaseCompiler

Parse attributes from a JSON dictionary into a new compiler.

Parameters:

data (dict[str, Any]) – dictionary with attributes and values.
config (dict | None) – config for loading column profiler params from the dictionary

Returns:

Compiler with attributes populated.

Return type:

BaseCompiler
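A hedged sketch of the load_from_dict round trip. It assumes a serialized form of {"class": ..., "data": {...}} with the instance attributes stored under "data"; the class SketchProfile and its to_dict method are hypothetical, and the config argument is accepted only to mirror the documented signature.

```python
from __future__ import annotations

import json


class SketchProfile:
    """Hypothetical profile supporting a JSON round trip."""

    def __init__(self, name: str | None = None) -> None:
        self.name = name
        self.sample_size = 0

    def to_dict(self) -> dict:
        # Assumed layout: class name plus a copy of the instance attributes.
        return {"class": type(self).__name__, "data": vars(self).copy()}

    @classmethod
    def load_from_dict(cls, data: dict,
                       config: dict | None = None) -> SketchProfile:
        if data.get("class") != cls.__name__:
            raise ValueError(f"expected {cls.__name__}, got {data.get('class')}")
        obj = cls()  # start from an empty instance, then populate it
        for attr, value in data["data"].items():
            setattr(obj, attr, value)
        # `config` mirrors the documented signature; this sketch ignores it.
        return obj
```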

class dataprofiler.profilers.column_profile_compilers.ColumnPrimitiveTypeProfileCompiler(df_series: Series | None = None, options: StructuredOptions | None = None, pool: Pool | None = None)

Bases: BaseCompiler[ColumnPrimitiveTypeProfileCompiler]

For generating ordered column profile reports.

Initialize BaseCompiler object.

report(remove_disabled_flag: bool = False) → dict

Return the report.

Parameters: remove_disabled_flag (bool) – flag that determines whether disabled options are excluded from the report.

property profile: dict

Return the profile of the column.

property selected_data_type: str | None

Find the selected data type in a primitive compiler.

Returns: name of the selected data type
Return type: str | None
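One plausible way to pick a single type from several primitive profiles is to check the candidates from most to least specific and return the first that matches every value, or None when none does (consistent with the str | None return type). The priority order, the threshold, and the function select_data_type are assumptions, not the library's documented rule.

```python
from __future__ import annotations

# Hypothetical priority order: more specific types first, because values
# that parse as int also parse as float and text.
PRIORITY = ("datetime", "int", "float", "text")


def select_data_type(match_ratios: dict[str, float],
                     threshold: float = 1.0) -> str | None:
    """Return the first type, in priority order, whose match ratio meets threshold."""
    for data_type in PRIORITY:
        if match_ratios.get(data_type, 0.0) >= threshold:
            return data_type
    return None  # no profiled type qualified
```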

diff(other: ColumnPrimitiveTypeProfileCompiler, options: dict | None = None) → dict

Find the difference between two compilers and return the report.

Parameters: other (ColumnPrimitiveTypeProfileCompiler) – the profile compiler to compare with this one.
Returns: difference of the profiles
Return type: dict

classmethod load_from_dict(data, config: dict | None = None) → BaseCompiler

Parse attributes from a JSON dictionary into a new compiler.

Parameters:

data (dict[str, Any]) – dictionary with attributes and values.
config (dict | None) – config for loading column profiler params from the dictionary

Returns:

Compiler with attributes populated.

Return type:

BaseCompiler

update_profile(df_series: Series, pool: Pool = None) → BaseCompiler | None

Update the profiles with the data in df_series.

Parameters:

df_series (pandas.core.series.Series) – a given column; its values are assumed to be str
pool (multiprocessing.Pool) – pool to use for multiprocessing

Returns:

Self

Return type:

BaseCompiler

class dataprofiler.profilers.column_profile_compilers.ColumnStatsProfileCompiler(df_series: Series | None = None, options: StructuredOptions | None = None, pool: Pool | None = None)

Bases: BaseCompiler[ColumnStatsProfileCompiler]

For generating OrderColumn and CategoricalColumn reports.

Initialize BaseCompiler object.
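For intuition about the OrderColumn and CategoricalColumn reports this compiler combines, here is a hedged sketch of the two underlying ideas: classifying the order of a column's values and counting category occurrences. The function names and the order labels ("ascending", "descending", "constant value", "random") are assumptions for illustration.

```python
from __future__ import annotations

from collections import Counter
from typing import Hashable, Sequence


def detect_order(values: Sequence[float]) -> str:
    """Classify a sequence by comparing each value with its successor."""
    pairs = list(zip(values, values[1:]))
    # Note: empty and single-value sequences count as "constant value" here.
    if all(a == b for a, b in pairs):
        return "constant value"
    if all(a <= b for a, b in pairs):
        return "ascending"
    if all(a >= b for a, b in pairs):
        return "descending"
    return "random"


def category_counts(values: Sequence[Hashable]) -> dict:
    """Count how often each distinct value (category) occurs."""
    return dict(Counter(values))
```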

report(remove_disabled_flag: bool = False) → dict

Return the report.

Parameters: remove_disabled_flag (bool) – flag that determines whether disabled options are excluded from the report.

diff(other: ColumnStatsProfileCompiler, options: dict | None = None) → dict

Find the difference between two compilers and return the report.

Parameters: other (ColumnStatsProfileCompiler) – the profile compiler to compare with this one.
Returns: difference of the profiles
Return type: dict

classmethod load_from_dict(data, config: dict | None = None) → BaseCompiler

Parse attributes from a JSON dictionary into a new compiler.

Parameters:

data (dict[str, Any]) – dictionary with attributes and values.
config (dict | None) – config for loading column profiler params from the dictionary

Returns:

Compiler with attributes populated.

Return type:

BaseCompiler

property profile: dict

Return the profile of the column.

update_profile(df_series: Series, pool: Pool = None) → BaseCompiler | None

Update the profiles with the data in df_series.

Parameters:

df_series (pandas.core.series.Series) – a given column; its values are assumed to be str
pool (multiprocessing.Pool) – pool to use for multiprocessing

Returns:

Self

Return type:

BaseCompiler

class dataprofiler.profilers.column_profile_compilers.ColumnDataLabelerCompiler(df_series: Series | None = None, options: StructuredOptions | None = None, pool: Pool | None = None)

Bases: BaseCompiler[ColumnDataLabelerCompiler]

For generating DataLabelerColumn report.

Initialize BaseCompiler object.
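A sketch of how a column-level label could be derived from per-value label predictions: average the probabilities across values and take the label with the highest mean. The label names, the averaging rule, and both helper functions are hypothetical; the actual DataLabelerColumn relies on a trained data labeler model.

```python
from __future__ import annotations


def average_predictions(cell_probs: list[dict[str, float]]) -> dict[str, float]:
    """Average per-value label probabilities into one column-level distribution."""
    totals: dict[str, float] = {}
    for probs in cell_probs:
        for label, p in probs.items():
            totals[label] = totals.get(label, 0.0) + p
    n = len(cell_probs)
    if n == 0:
        return {}
    # A label missing from some values implicitly contributes 0 for them.
    return {label: total / n for label, total in totals.items()}


def choose_label(avg: dict[str, float]) -> str | None:
    """Pick the label with the highest average probability, if any."""
    return max(avg, key=avg.get) if avg else None
```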

report(remove_disabled_flag: bool = False) → dict

Return the report.

Parameters: remove_disabled_flag (bool) – flag that determines whether disabled options are excluded from the report.

diff(other: ColumnDataLabelerCompiler, options: dict | None = None) → dict

Find the difference between two compilers and return the report.

Parameters:

other (ColumnDataLabelerCompiler) – the profile compiler to compare with this one.
options (dict) – options that change the results of the difference

Returns:

difference of the profiles

Return type:

dict

classmethod load_from_dict(data, config: dict | None = None) → BaseCompiler

Parse attributes from a JSON dictionary into a new compiler.

Parameters:

data (dict[str, Any]) – dictionary with attributes and values.
config (dict | None) – config for loading column profiler params from the dictionary

Returns:

Compiler with attributes populated.

Return type:

BaseCompiler

property profile: dict

Return the profile of the column.

update_profile(df_series: Series, pool: Pool = None) → BaseCompiler | None

Update the profiles with the data in df_series.

Parameters:

df_series (pandas.core.series.Series) – a given column; its values are assumed to be str
pool (multiprocessing.Pool) – pool to use for multiprocessing

Returns:

Self

Return type:

BaseCompiler

class dataprofiler.profilers.column_profile_compilers.UnstructuredCompiler(df_series: Series | None = None, options: StructuredOptions | None = None, pool: Pool | None = None)

Bases: BaseCompiler[UnstructuredCompiler]

For generating TextProfiler and UnstructuredLabelerProfile reports.

Initialize BaseCompiler object.
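The TextProfiler side of this compiler works on free text rather than typed columns. A minimal sketch of the kind of statistics involved, character (vocabulary) counts and word counts with optional stop-word removal, assuming a simple regex tokenizer; text_stats and its output keys are hypothetical.

```python
from __future__ import annotations

import re
from collections import Counter


def text_stats(samples: list[str], stop_words: set[str] | None = None) -> dict:
    """Count characters (vocabulary) and lowercased words across text samples."""
    stop_words = stop_words or set()
    vocab = Counter()
    words = Counter()
    for text in samples:
        vocab.update(text)  # character counts, case-sensitive
        # Assumed tokenizer: runs of letters and apostrophes, lowercased.
        for word in re.findall(r"[A-Za-z']+", text.lower()):
            if word not in stop_words:
                words[word] += 1
    return {"vocab_count": dict(vocab), "word_count": dict(words)}
```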

report(remove_disabled_flag: bool = False) → dict

Report the profile attributes of the class, potentially popping values from self.profile.

diff(other: UnstructuredCompiler, options: dict | None = None) → dict

Find the difference between two compilers and return the report.

Parameters:

other (UnstructuredCompiler) – the profile compiler to compare with this one.
options (dict) – options that affect the results of the diff

Returns:

difference of the profiles

Return type:

dict

classmethod load_from_dict(data, config: dict | None = None) → BaseCompiler

Parse attributes from a JSON dictionary into a new compiler.

Parameters:

data (dict[str, Any]) – dictionary with attributes and values.
config (dict | None) – config for loading column profiler params from the dictionary

Returns:

Compiler with attributes populated.

Return type:

BaseCompiler

property profile: dict

Return the profile of the column.

update_profile(df_series: Series, pool: Pool = None) → BaseCompiler | None

Update the profiles with the data in df_series.

Parameters:

df_series (pandas.core.series.Series) – a given column; its values are assumed to be str
pool (multiprocessing.Pool) – pool to use for multiprocessing

Returns:

Self

Return type:

BaseCompiler