dataprofiler.profilers.profile_builder module
Build a model for a dataset by identifying the type of each column along with its respective parameters.
-
class dataprofiler.profilers.profile_builder.StructuredDataProfile(df_series, sample_size=None, min_sample_size=500, sampling_ratio=0.2, min_true_samples=None, options=None)
Bases: object
-
property profile
-
update_profile(df_series, sample_size=None, min_true_samples=None)
-
get_base_props_and_clean_null_params(df_series, sample_size, min_true_samples=None)
Identify null characters and return them in a dictionary, and remove any nulls from the column.
- Parameters
df_series (pandas.core.series.Series) – a given column
sample_size (int) – Number of samples to use in generating the profile
min_true_samples (int) – Minimum number of samples required for the profiler
- Returns
updated column with nulls removed and a dictionary of null parameters
- Return type
pd.Series, dict
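The null-cleaning step described above can be illustrated with a minimal, standard-library sketch. This is not the library's implementation: the function name `clean_nulls`, the `NULL_VALUES` set, and the shape of the returned dictionary (null value mapped to row indices) are all assumptions made for illustration, and a plain list stands in for the `pandas` Series that the real method takes and returns.

```python
from collections import defaultdict

# Hypothetical set of strings treated as null; the library's real list may differ.
NULL_VALUES = {"", "nan", "none", "null"}


def clean_nulls(column):
    """Return the non-null values and a map of null value -> row indices."""
    null_rows = defaultdict(list)
    cleaned = []
    for index, value in enumerate(column):
        # Normalise each cell to a stripped string; None counts as empty.
        text = "" if value is None else str(value).strip()
        if text.lower() in NULL_VALUES:
            null_rows[text].append(index)
        else:
            cleaned.append(value)
    return cleaned, dict(null_rows)


cleaned, null_params = clean_nulls(["1", "", "3", "NaN", None, "7"])
print(cleaned)       # ['1', '3', '7']
print(null_params)   # {'': [1, 4], 'NaN': [3]}
```

Recording the row indices alongside each null value, rather than only a count, is one possible design; it keeps enough information to report where the missing entries were.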
-
property
-
class dataprofiler.profilers.profile_builder.Profiler(data, samples_per_update=None, min_true_samples=None, profiler_options=None)
Bases: object
Instantiate the Profiler class
- Parameters
data (Data class object) – Data to be profiled
samples_per_update (int) – Number of samples to use in generating profile
min_true_samples (int) – Minimum number of samples required for the profiler
profiler_options (ProfilerOptions Object) – Options for the profiler.
- Returns
Profiler
-
property profile
-
report(report_options=None)
-
update_profile(data, sample_size=None, min_true_samples=None)
Update the profile for the data provided. The user can specify the sample size used to profile the data, as well as the minimum number of non-null samples to profile.
- Parameters
data (Union[data_readers.base_data.BaseData, pandas.DataFrame]) – data to be profiled
sample_size (int) – number of samples to profile from the data
min_true_samples (int) – minimum number of non-null samples to profile
- Returns
None
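As a rough usage sketch, the `Profiler` constructor, `update_profile`, and `report` signatures listed above might be combined as follows. The helper `profile_file` and its arguments are hypothetical, the top-level `dataprofiler.Data` reader is assumed to accept a file path, and the `sample_size` and `min_true_samples` values are arbitrary illustrations rather than recommended settings.

```python
def profile_file(path, extra_path=None):
    """Profile one file and optionally fold a second file into the same profile."""
    # Imported lazily so this sketch can be loaded without the package installed.
    from dataprofiler import Data  # assumption: Data reads a file path
    from dataprofiler.profilers.profile_builder import Profiler

    data = Data(path)

    # Signature as documented above; None leaves the library defaults in place.
    profiler = Profiler(data, samples_per_update=None, min_true_samples=None)

    if extra_path is not None:
        # update_profile returns None and modifies the profiler in place.
        profiler.update_profile(
            Data(extra_path),
            sample_size=1000,     # illustrative value
            min_true_samples=10,  # illustrative value
        )

    # report_options defaults to None per the signature above.
    return profiler.report()
```

Calling `profile_file("your_file.csv")` with a real path would return whatever `report()` produces for that dataset; the structure of that report is not described in this section.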