Data Labelers
-
dataprofiler.labelers.data_labelers.train_structured_labeler(data, default_label=None, save_dirpath=None, epochs=2)
Uses the provided data to create and save a structured data labeler.
- Parameters
data (Union[None, pd.DataFrame]) – data to train on
save_dirpath (Union[None, str]) – path at which to save the data labeler
epochs (int) – number of epochs to train for
- Returns
-
class dataprofiler.labelers.data_labelers.UnstructuredDataLabeler(dirpath=None, load_options=None)
Bases: dataprofiler.labelers.base_data_labeler.BaseDataLabeler
Initialize DataLabeler class.
- Parameters
dirpath – path to data labeler
load_options – optional arguments to include when loading, e.g. the class to use for the model or processors
-
add_label(label, same_as=None)
Adds a label to the data labeler.
- Parameters
label (str) – new label being added to the data labeler
same_as (str) – existing label whose encoding index the new label should share, so that multiple labels map to a single encoding index
- Returns
None
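The effect of same_as can be pictured with a plain dictionary. The following is a minimal sketch of the idea only, not the library's implementation: the helper add_label_to_mapping and the example labels (PAD, UNKNOWN, ADDRESS, PHONE, STREET) are hypothetical, and it assumes a new label without same_as receives the next unused index.

```python
def add_label_to_mapping(label_mapping, label, same_as=None):
    """Return a copy of label_mapping with `label` added.

    Hypothetical helper for illustration; not the library's implementation.
    """
    if label in label_mapping:
        raise ValueError(f"label {label!r} already exists")
    updated = dict(label_mapping)
    if same_as is None:
        # Assumption: a new label gets the next unused encoding index.
        updated[label] = max(updated.values(), default=-1) + 1
    else:
        if same_as not in updated:
            raise ValueError(f"same_as label {same_as!r} does not exist")
        # The new label shares the encoding index of an existing label.
        updated[label] = updated[same_as]
    return updated


# Example labels are made up for this sketch.
mapping = {"PAD": 0, "UNKNOWN": 1, "ADDRESS": 2}
mapping = add_label_to_mapping(mapping, "PHONE")
mapping = add_label_to_mapping(mapping, "STREET", same_as="ADDRESS")
print(mapping)
```

Here PHONE receives its own index while STREET reuses the index of ADDRESS, which is the multi-label to single-index case the parameter describes.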
-
check_pipeline(skip_postprocessor=False, error_on_mismatch=False)
Checks whether the processors and models connect together without error.
- Parameters
skip_postprocessor (bool) – if true, skips checking that the postprocessor is valid in the pipeline
error_on_mismatch (bool) – if true, raises an error instead of a warning on parameter mismatches in the pipeline
- Returns
bool indicating whether the pipeline is valid
-
help()
Help function describing alterable parameters, input data formats for preprocessors, and output data formats for postprocessors.
- Returns
None
-
property label_mapping
Retrieves the label encodings.
- Returns
dictionary for associating labels to indexes
-
property labels
Retrieves the labels.
- Returns
list of labels
-
classmethod load_from_disk(dirpath, load_options=None)
Loads the data labeler from a saved location on disk.
- Parameters
dirpath (str) – path to data labeler files.
load_options (dict) – optional arguments to include when loading, e.g. the class to use for the model or processors
- Returns
DataLabeler class
-
classmethod load_from_library(name)
Loads the data labeler from the data labeler zoo in the library.
- Parameters
name (str) – name of the data labeler.
- Returns
DataLabeler class
-
classmethod load_with_components(preprocessor, model, postprocessor)
Loads the data labeler from its set of components.
- Parameters
preprocessor (data_processing.BaseDataPreprocessor) – processor to set as the preprocessor
model (base_model.BaseModel) – model to use within the data labeler
postprocessor (data_processing.BaseDataPostprocessor) – processor to set as the postprocessor
- Returns
-
property model
Retrieves the data labeler model.
- Returns
the model instance
-
property postprocessor
Retrieves the data postprocessor.
- Returns
the postprocessor instance
-
predict(data, batch_size=32, predict_options=None, error_on_mismatch=False, verbose=1)
Predicts labels for the input data using the data labeler model.
- Parameters
data – data to predict on
batch_size – number of samples per prediction batch
predict_options – optional parameters for prediction, given as a dict, e.g. dict(show_confidences=True)
error_on_mismatch – if true, raises an error instead of a warning on parameter mismatches in the pipeline
verbose – flag determining whether to print status
- Returns
predictions
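To make the role of batch_size concrete, the sketch below shows the general pattern of predicting in fixed-size batches and concatenating the results. It is an illustration under stated assumptions, not the library's prediction code: predict_in_batches and toy_predict_fn are hypothetical names, and the toy model simply checks for digits.

```python
def predict_in_batches(data, predict_fn, batch_size=32):
    """Apply predict_fn to consecutive slices of `data` of length batch_size.

    Hypothetical helper for illustration; not the library's implementation.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    predictions = []
    for start in range(0, len(data), batch_size):
        batch = data[start:start + batch_size]
        # The final batch may be shorter than batch_size.
        predictions.extend(predict_fn(batch))
    return predictions


def toy_predict_fn(batch):
    # Stand-in model: labels a sample by whether it contains a digit.
    return [
        "HAS_DIGIT" if any(ch.isdigit() for ch in sample) else "NO_DIGIT"
        for sample in batch
    ]


samples = ["call 555", "hello", "room 12", "world", "a1"]
result = predict_in_batches(samples, toy_predict_fn, batch_size=2)
print(result)
```

With five samples and a batch size of 2, the stand-in model is called three times, and the output order matches the input order.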
-
property preprocessor
Retrieves the data preprocessor.
- Returns
the preprocessor instance
-
property reverse_label_mapping
Retrieves the index-to-label encoding.
- Returns
dictionary for associating indexes to labels
-
save_to_disk(dirpath)
Saves the data labeler to the specified location.
- Parameters
dirpath (str) – location to save the data labeler.
- Returns
None
-
set_labels(labels)
Sets the labels for the data labeler.
- Parameters
labels (list or dict) – new labels in either encoding list or dict
- Returns
None
-
set_model(model)
Sets the model for the data labeler.
- Parameters
model (base_model.BaseModel) – model to use within the data labeler
- Returns
None
-
set_params(params)
Allows the user to set parameters of pipeline components in the following format:
params = dict(preprocessor=dict(…), model=dict(…), postprocessor=dict(…))
where the key-value pairs for each pipeline component must match parameters that exist in that component.
- Parameters
params (dict) – dictionary with a key for each pipeline component being configured, whose value is a dict of that component's parameters, in the form dict(preprocessor=dict(…), model=dict(…), postprocessor=dict(…))
- Returns
None
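The nested shape of params can be checked before it is applied. The sketch below is one possible validation routine, assuming each component exposes a known set of parameter names; validate_params and the parameter names max_length, dropout and default_label are hypothetical and chosen only for illustration.

```python
ALLOWED_COMPONENTS = ("preprocessor", "model", "postprocessor")


def validate_params(params, accepted):
    """Check that params has the nested shape described above.

    `accepted` maps a component name to the set of parameter names it
    accepts. Hypothetical helper; not the library's implementation.
    """
    if not isinstance(params, dict):
        raise TypeError("params must be a dict")
    for component, values in params.items():
        if component not in ALLOWED_COMPONENTS:
            raise ValueError(f"unknown pipeline component: {component!r}")
        if not isinstance(values, dict):
            raise TypeError(f"parameters for {component!r} must be a dict")
        unknown = set(values) - accepted.get(component, set())
        if unknown:
            raise ValueError(
                f"{component!r} has no parameters named {sorted(unknown)}"
            )
    return True


# Parameter names below are made up for this sketch.
accepted = {
    "preprocessor": {"max_length"},
    "model": {"dropout"},
    "postprocessor": {"default_label"},
}
params = dict(
    preprocessor=dict(max_length=100),
    model=dict(dropout=0.2),
)
print(validate_params(params, accepted))
```

A component may be omitted from params entirely in this sketch; only the keys that are present are checked against the component's accepted names.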
-
set_postprocessor(data_processor)
Sets the data postprocessor for the data labeler.
- Parameters
data_processor (data_processing.BaseDataPostprocessor) – processor to set as the postprocessor
- Returns
None
-
set_preprocessor(data_processor)
Sets the data preprocessor for the data labeler.
- Parameters
data_processor (data_processing.BaseDataPreprocessor) – processor to set as the preprocessor
- Returns
None
-
class dataprofiler.labelers.data_labelers.StructuredDataLabeler(dirpath=None, load_options=None)
Bases: dataprofiler.labelers.base_data_labeler.BaseDataLabeler
Initialize DataLabeler class.
- Parameters
dirpath – path to data labeler
load_options – optional arguments to include when loading, e.g. the class to use for the model or processors
-
add_label(label, same_as=None)
Adds a label to the data labeler.
- Parameters
label (str) – new label being added to the data labeler
same_as (str) – existing label whose encoding index the new label should share, so that multiple labels map to a single encoding index
- Returns
None
-
check_pipeline(skip_postprocessor=False, error_on_mismatch=False)
Checks whether the processors and models connect together without error.
- Parameters
skip_postprocessor (bool) – if true, skips checking that the postprocessor is valid in the pipeline
error_on_mismatch (bool) – if true, raises an error instead of a warning on parameter mismatches in the pipeline
- Returns
bool indicating whether the pipeline is valid
-
help()
Help function describing alterable parameters, input data formats for preprocessors, and output data formats for postprocessors.
- Returns
None
-
property label_mapping
Retrieves the label encodings.
- Returns
dictionary for associating labels to indexes
-
property labels
Retrieves the labels.
- Returns
list of labels
-
classmethod load_from_disk(dirpath, load_options=None)
Loads the data labeler from a saved location on disk.
- Parameters
dirpath (str) – path to data labeler files.
load_options (dict) – optional arguments to include when loading, e.g. the class to use for the model or processors
- Returns
DataLabeler class
-
classmethod load_from_library(name)
Loads the data labeler from the data labeler zoo in the library.
- Parameters
name (str) – name of the data labeler.
- Returns
DataLabeler class
-
classmethod load_with_components(preprocessor, model, postprocessor)
Loads the data labeler from its set of components.
- Parameters
preprocessor (data_processing.BaseDataPreprocessor) – processor to set as the preprocessor
model (base_model.BaseModel) – model to use within the data labeler
postprocessor (data_processing.BaseDataPostprocessor) – processor to set as the postprocessor
- Returns
-
property model
Retrieves the data labeler model.
- Returns
the model instance
-
property postprocessor
Retrieves the data postprocessor.
- Returns
the postprocessor instance
-
predict(data, batch_size=32, predict_options=None, error_on_mismatch=False, verbose=1)
Predicts labels for the input data using the data labeler model.
- Parameters
data – data to predict on
batch_size – number of samples per prediction batch
predict_options – optional parameters for prediction, given as a dict, e.g. dict(show_confidences=True)
error_on_mismatch – if true, raises an error instead of a warning on parameter mismatches in the pipeline
verbose – flag determining whether to print status
- Returns
predictions
-
property preprocessor
Retrieves the data preprocessor.
- Returns
the preprocessor instance
-
property reverse_label_mapping
Retrieves the index-to-label encoding.
- Returns
dictionary for associating indexes to labels
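An index-to-label dictionary is the inverse of the label-to-index dictionary, which raises a question when several labels share one index. The sketch below inverts a mapping and keeps the first label seen for each index; that tie-breaking rule is an assumption of this example, and the library may resolve it differently. The function reverse_mapping and the example labels are hypothetical.

```python
def reverse_mapping(label_mapping):
    """Invert a label-to-index dict into an index-to-label dict.

    Assumption for this sketch: when several labels share an index,
    the first label encountered is kept.
    """
    reverse = {}
    for label, index in label_mapping.items():
        reverse.setdefault(index, label)
    return reverse


# Example labels are made up; STREET shares the index of ADDRESS.
label_mapping = {"PAD": 0, "UNKNOWN": 1, "ADDRESS": 2, "STREET": 2}
index_to_label = reverse_mapping(label_mapping)

# Decode a sequence of predicted indexes back into label names.
decoded = [index_to_label[i] for i in [2, 1, 2]]
print(index_to_label)
print(decoded)
```

Because Python dictionaries preserve insertion order, ADDRESS is kept for index 2 here since it was inserted before STREET.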
-
save_to_disk(dirpath)
Saves the data labeler to the specified location.
- Parameters
dirpath (str) – location to save the data labeler.
- Returns
None
-
set_labels(labels)
Sets the labels for the data labeler.
- Parameters
labels (list or dict) – new labels in either encoding list or dict
- Returns
None
-
set_model(model)
Sets the model for the data labeler.
- Parameters
model (base_model.BaseModel) – model to use within the data labeler
- Returns
None
-
set_params(params)
Allows the user to set parameters of pipeline components in the following format:
params = dict(preprocessor=dict(…), model=dict(…), postprocessor=dict(…))
where the key-value pairs for each pipeline component must match parameters that exist in that component.
- Parameters
params (dict) – dictionary with a key for each pipeline component being configured, whose value is a dict of that component's parameters, in the form dict(preprocessor=dict(…), model=dict(…), postprocessor=dict(…))
- Returns
None
-
set_postprocessor(data_processor)
Sets the data postprocessor for the data labeler.
- Parameters
data_processor (data_processing.BaseDataPostprocessor) – processor to set as the postprocessor
- Returns
None
-
set_preprocessor(data_processor)
Sets the data preprocessor for the data labeler.
- Parameters
data_processor (data_processing.BaseDataPreprocessor) – processor to set as the preprocessor
- Returns
None
-
class dataprofiler.labelers.data_labelers.DataLabeler(labeler_type, dirpath=None, load_options=None, trainable=False)
Bases: object
-
labeler_classes = {'structured': <class 'dataprofiler.labelers.data_labelers.StructuredDataLabeler'>, 'unstructured': <class 'dataprofiler.labelers.data_labelers.UnstructuredDataLabeler'>}
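A class-level dictionary like labeler_classes is a common way to dispatch on a string such as labeler_type. The sketch below shows one way such a registry can drive construction, using stand-in classes; LabelerFactory, StructuredStub and UnstructuredStub are hypothetical names, and nothing here should be read as the library's actual construction logic.

```python
class StructuredStub:
    """Stand-in for a structured labeler class (hypothetical)."""

    def __init__(self, dirpath=None, load_options=None):
        self.dirpath = dirpath
        self.load_options = load_options


class UnstructuredStub:
    """Stand-in for an unstructured labeler class (hypothetical)."""

    def __init__(self, dirpath=None, load_options=None):
        self.dirpath = dirpath
        self.load_options = load_options


class LabelerFactory:
    # Registry mapping a labeler_type string to the class to build.
    labeler_classes = {
        "structured": StructuredStub,
        "unstructured": UnstructuredStub,
    }

    def __new__(cls, labeler_type, dirpath=None, load_options=None):
        try:
            labeler_class = cls.labeler_classes[labeler_type]
        except KeyError:
            valid = ", ".join(sorted(cls.labeler_classes))
            raise ValueError(
                f"unknown labeler_type {labeler_type!r}; expected one of: {valid}"
            ) from None
        # Return an instance of the looked-up class, not of LabelerFactory.
        return labeler_class(dirpath, load_options)


labeler = LabelerFactory("structured", dirpath="some/dir")
print(type(labeler).__name__)
```

Because `__new__` returns an object that is not an instance of LabelerFactory, Python skips `LabelerFactory.__init__`, so the caller receives the concrete class directly.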
-