Column Profile Compilers

For generating a report.

class dataprofiler.profilers.column_profile_compilers.BaseCompiler(df_series: Optional[pandas.core.series.Series] = None, options: Optional[dataprofiler.profilers.profiler_options.StructuredOptions] = None, pool: Optional[multiprocessing.pool.Pool] = None)

Bases: object

Abstract class for generating a report.

Initialize BaseCompiler object.

abstract report(remove_disabled_flag: bool = False) → dict

Return report.

Parameters

remove_disabled_flag (boolean) – flag to determine whether disabled options should be excluded from the report.
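The effect of remove_disabled_flag can be pictured with plain dictionaries. The sketch below is a hypothetical model, not DataProfiler's implementation: it assumes each statistic is paired with an enabled/disabled option, and that disabled statistics are dropped only when the flag is set. The helper name build_report is illustrative.

```python
from typing import Any, Dict


def build_report(
    stats: Dict[str, Any],
    enabled: Dict[str, bool],
    remove_disabled_flag: bool = False,
) -> Dict[str, Any]:
    """Hypothetical helper: return a report, optionally omitting disabled stats."""
    report = {}
    for name, value in stats.items():
        # Statistics with no recorded option are treated as enabled.
        if remove_disabled_flag and not enabled.get(name, True):
            continue
        report[name] = value
    return report


stats = {"min": 1, "max": 9, "histogram": [3, 4, 2]}
enabled = {"min": True, "max": True, "histogram": False}
print(build_report(stats, enabled))                             # all three keys
print(build_report(stats, enabled, remove_disabled_flag=True))  # {'min': 1, 'max': 9}
```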

property profile: dict

Return the profile of the column.

diff(other: dataprofiler.profilers.column_profile_compilers.BaseCompiler, options: Optional[dict] = None) → dict

Find the difference between 2 compilers and return the report.

Parameters

other (BaseCompiler) – profile compiler finding the difference with this one.

Returns

difference of the profiles

Return type

dict
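As a rough picture of what a compiler-level diff returns, here is a hypothetical sketch over two flat profile dictionaries. It reports the signed difference self minus other for numbers, the string "unchanged" for equal values, and the pair of values otherwise. These conventions and the names diff_values and diff_profiles are assumptions for illustration; check the dict returned by the library's diff for its actual layout.

```python
from typing import Any, Dict


def diff_values(a: Any, b: Any) -> Any:
    """Hypothetical per-statistic diff."""
    if a == b:
        return "unchanged"
    # Numbers (excluding bools) are reported as a signed difference a - b.
    numeric = (int, float)
    if isinstance(a, numeric) and isinstance(b, numeric) \
            and not isinstance(a, bool) and not isinstance(b, bool):
        return a - b
    # Anything else is reported as the pair of differing values.
    return [a, b]


def diff_profiles(self_profile: Dict[str, Any], other_profile: Dict[str, Any]) -> Dict[str, Any]:
    """Compare two flat profiles key by key; a missing key compares as None."""
    return {
        key: diff_values(self_profile.get(key), other_profile.get(key))
        for key in sorted(set(self_profile) | set(other_profile))
    }


a = {"min": 1, "max": 9, "data_type": "int"}
b = {"min": 1, "max": 7, "data_type": "float"}
print(diff_profiles(a, b))
# {'data_type': ['int', 'float'], 'max': 2, 'min': 'unchanged'}
```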

update_profile(df_series: Series, pool: Pool = None) → BaseCompiler | None

Update the profiles with the data in df_series.

Parameters
  • df_series (pandas.core.series.Series) – a given column, assumed to contain str values

  • pool (multiprocessing.Pool) – pool to utilize for multiprocessing

Returns

Self

Return type

BaseCompiler
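Because update_profile returns the compiler itself, successive batches can be chained. The following sketch mirrors the documented interface (abstract report, a profile property, update_profile returning self) with a hypothetical, dependency-free class that accepts an iterable of strings instead of a pandas Series; SketchCompiler and LengthCompiler are illustrative names, not part of DataProfiler.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, Optional


class SketchCompiler(ABC):
    """Hypothetical stand-in that mirrors the documented BaseCompiler interface."""

    def __init__(self, values: Optional[Iterable[str]] = None) -> None:
        self._profile: Dict[str, int] = {"count": 0, "total_length": 0}
        # Like the documented constructor, initial data is optional.
        if values is not None:
            self.update_profile(values)

    @property
    def profile(self) -> Dict[str, int]:
        return self._profile

    @abstractmethod
    def report(self, remove_disabled_flag: bool = False) -> dict:
        """Subclasses decide how the profile is presented."""

    def update_profile(self, values: Iterable[str]) -> "SketchCompiler":
        # Accumulate statistics from a new batch of string values.
        for value in values:
            self._profile["count"] += 1
            self._profile["total_length"] += len(value)
        # Returning self allows calls to be chained.
        return self


class LengthCompiler(SketchCompiler):
    def report(self, remove_disabled_flag: bool = False) -> dict:
        count = self._profile["count"]
        mean = self._profile["total_length"] / count if count else None
        return {"count": count, "mean_length": mean}


compiler = LengthCompiler(["a", "bb"]).update_profile(["ccc"])
print(compiler.report())  # {'count': 3, 'mean_length': 2.0}
```

Keeping report abstract means SketchCompiler itself cannot be instantiated, which matches the role of BaseCompiler as an abstract class.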

class dataprofiler.profilers.column_profile_compilers.ColumnPrimitiveTypeProfileCompiler(df_series: Optional[pandas.core.series.Series] = None, options: Optional[dataprofiler.profilers.profiler_options.StructuredOptions] = None, pool: Optional[multiprocessing.pool.Pool] = None)

Bases: dataprofiler.profilers.column_profile_compilers.BaseCompiler

For generating ordered column profile reports.

Initialize BaseCompiler object.

report(remove_disabled_flag: bool = False) → dict

Return report.

Parameters

remove_disabled_flag (boolean) – flag to determine whether disabled options should be excluded from the report.

property profile: dict

Return the profile of the column.

property selected_data_type: str | None

Find the selected data_type in a primitive compiler.

Returns

name of the selected data type

Return type

str
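One plausible way to reduce several primitive-type profiles to a single selected_data_type is sketched below. The priority order, the requirement that a type match every value (ratio 1.0), and the None fallback are assumptions made for illustration, not the library's documented rule; select_data_type is a hypothetical name.

```python
from typing import Dict, Optional, Sequence

# Hypothetical priority order: more specific types are checked first.
PRIORITY: Sequence[str] = ("datetime", "int", "float", "text")


def select_data_type(match_ratios: Dict[str, float]) -> Optional[str]:
    """Return the first type, in priority order, that matched every value."""
    for data_type in PRIORITY:
        if match_ratios.get(data_type) == 1.0:
            return data_type
    # No candidate matched the whole column.
    return None


print(select_data_type({"datetime": 0.0, "int": 0.8, "float": 1.0, "text": 1.0}))  # float
print(select_data_type({"int": 0.5}))  # None
```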

diff(other: dataprofiler.profilers.column_profile_compilers.ColumnPrimitiveTypeProfileCompiler, options: Optional[dict] = None) → dict

Find the difference between 2 compilers and return the report.

Parameters

other (ColumnPrimitiveTypeProfileCompiler) – profile compiler finding the difference with this one.

Returns

difference of the profiles

Return type

dict

update_profile(df_series: Series, pool: Pool = None) → BaseCompiler | None

Update the profiles with the data in df_series.

Parameters
  • df_series (pandas.core.series.Series) – a given column, assumed to contain str values

  • pool (multiprocessing.Pool) – pool to utilize for multiprocessing

Returns

Self

Return type

BaseCompiler

class dataprofiler.profilers.column_profile_compilers.ColumnStatsProfileCompiler(df_series: Optional[pandas.core.series.Series] = None, options: Optional[dataprofiler.profilers.profiler_options.StructuredOptions] = None, pool: Optional[multiprocessing.pool.Pool] = None)

Bases: dataprofiler.profilers.column_profile_compilers.BaseCompiler

For generating OrderColumn and CategoricalColumn reports.

Initialize BaseCompiler object.
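To illustrate what compiling several column profilers into one report means, here is a hypothetical sketch in plain Python. The order labels, the unique-value threshold of 10, and the helper names are assumptions; they are not the library's code and may differ from what OrderColumn and CategoricalColumn actually compute.

```python
from collections import Counter
from typing import Dict, List


def order_report(values: List[float]) -> Dict[str, str]:
    """Hypothetical order detector over consecutive pairs of values."""
    pairs = list(zip(values, values[1:]))
    if all(a == b for a, b in pairs):
        order = "constant value"
    elif all(a <= b for a, b in pairs):
        order = "ascending"
    elif all(a >= b for a, b in pairs):
        order = "descending"
    else:
        order = "random"
    return {"order": order}


def categorical_report(values: List[float], max_unique: int = 10) -> Dict[str, object]:
    """Hypothetical categorical check based on a unique-value threshold."""
    counts = Counter(values)
    return {"categorical": len(counts) <= max_unique, "unique_count": len(counts)}


def compile_stats(values: List[float]) -> Dict[str, object]:
    # The compiler merges each sub-profiler's report into a single dict.
    report: Dict[str, object] = {}
    report.update(order_report(values))
    report.update(categorical_report(values))
    return report


print(compile_stats([1, 2, 2, 5]))
# {'order': 'ascending', 'categorical': True, 'unique_count': 3}
```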

report(remove_disabled_flag: bool = False) → dict

Return report.

Parameters

remove_disabled_flag (boolean) – flag to determine whether disabled options should be excluded from the report.

diff(other: dataprofiler.profilers.column_profile_compilers.ColumnStatsProfileCompiler, options: Optional[dict] = None) → dict

Find the difference between 2 compilers and return the report.

Parameters

other (ColumnStatsProfileCompiler) – profile compiler finding the difference with this one.

Returns

difference of the profiles

Return type

dict

property profile: dict

Return the profile of the column.

update_profile(df_series: Series, pool: Pool = None) → BaseCompiler | None

Update the profiles with the data in df_series.

Parameters
  • df_series (pandas.core.series.Series) – a given column, assumed to contain str values

  • pool (multiprocessing.Pool) – pool to utilize for multiprocessing

Returns

Self

Return type

BaseCompiler

class dataprofiler.profilers.column_profile_compilers.ColumnDataLabelerCompiler(df_series: Optional[pandas.core.series.Series] = None, options: Optional[dataprofiler.profilers.profiler_options.StructuredOptions] = None, pool: Optional[multiprocessing.pool.Pool] = None)

Bases: dataprofiler.profilers.column_profile_compilers.BaseCompiler

For generating DataLabelerColumn report.

Initialize BaseCompiler object.

report(remove_disabled_flag: bool = False) → dict

Return report.

Parameters

remove_disabled_flag (boolean) – flag to determine whether disabled options should be excluded from the report.

diff(other: dataprofiler.profilers.column_profile_compilers.ColumnDataLabelerCompiler, options: Optional[dict] = None) → dict

Find the difference between 2 compilers and return the report.

Parameters
  • other (ColumnDataLabelerCompiler) – profile compiler finding the difference with this one.

  • options (dict) – options to change results of the difference

Returns

difference of the profiles

Return type

dict

property profile: dict

Return the profile of the column.

update_profile(df_series: Series, pool: Pool = None) → BaseCompiler | None

Update the profiles with the data in df_series.

Parameters
  • df_series (pandas.core.series.Series) – a given column, assumed to contain str values

  • pool (multiprocessing.Pool) – pool to utilize for multiprocessing

Returns

Self

Return type

BaseCompiler

class dataprofiler.profilers.column_profile_compilers.UnstructuredCompiler(df_series: Optional[pandas.core.series.Series] = None, options: Optional[dataprofiler.profilers.profiler_options.StructuredOptions] = None, pool: Optional[multiprocessing.pool.Pool] = None)

Bases: dataprofiler.profilers.column_profile_compilers.BaseCompiler

For generating TextProfiler and UnstructuredLabelerProfile reports.

Initialize BaseCompiler object.

report(remove_disabled_flag: bool = False) → dict

Report the profile attributes of the class, potentially popping values from self.profile.
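If a report pops values, doing so on a copy keeps the stored profile intact for later updates and diffs. The sketch below assumes that design; it is hypothetical, and the nested keys such as data_label and statistics are illustrative rather than the exact layout UnstructuredCompiler produces.

```python
import copy
from typing import Any, Dict, Iterable


def report_from_profile(
    profile: Dict[str, Dict[str, Any]],
    disabled: Iterable[str],
    remove_disabled_flag: bool = False,
) -> Dict[str, Dict[str, Any]]:
    """Hypothetical report builder that pops disabled keys from a copy."""
    keys = tuple(disabled)  # materialize so it can be reused for every section
    # Work on a deep copy so popping keys never mutates the stored profile.
    report = copy.deepcopy(profile)
    if remove_disabled_flag:
        for section in report.values():
            for key in keys:
                section.pop(key, None)
    return report


profile = {
    "data_label": {"entity_counts": {"ADDRESS": 2}, "times": {"predict": 0.1}},
    "statistics": {"word_count": 40, "vocab": ["a", "b"]},
}
slim = report_from_profile(profile, disabled=["times", "vocab"], remove_disabled_flag=True)
print(slim)  # {'data_label': {'entity_counts': {'ADDRESS': 2}}, 'statistics': {'word_count': 40}}
print("vocab" in profile["statistics"])  # True: the original is untouched
```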

diff(other: dataprofiler.profilers.column_profile_compilers.UnstructuredCompiler, options: Optional[dict] = None) → dict

Find the difference between 2 compilers and return the report.

Parameters
  • other (UnstructuredCompiler) – profile compiler finding the difference with this one.

  • options (dict) – options to impact the results of the diff

Returns

difference of the profiles

Return type

dict

property profile: dict

Return the profile of the column.

update_profile(df_series: Series, pool: Pool = None) → BaseCompiler | None

Update the profiles with the data in df_series.

Parameters
  • df_series (pandas.core.series.Series) – a given column, assumed to contain str values

  • pool (multiprocessing.Pool) – pool to utilize for multiprocessing

Returns

Self

Return type

BaseCompiler