Base Data Labeler

class dataprofiler.labelers.base_data_labeler.BaseDataLabeler(dirpath=None, load_options=None)

Bases: object

Initialize DataLabeler class.

Parameters
  • dirpath – path to data labeler

  • load_options – optional arguments to use when loading, e.g. the class for the model or processors

help()

Help function describing alterable parameters, input data formats for preprocessors, and output data formats for postprocessors.

Returns

None

property label_mapping

Retrieves the label encodings

Returns

dictionary for associating labels to indexes

property reverse_label_mapping

Retrieves the index to label encoding

Returns

dictionary for associating indexes to labels

property labels

Retrieves the labels

Returns

list of labels
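The three label properties above describe one encoding viewed in different ways. A minimal sketch with plain dictionaries, assuming hypothetical labels (the names `PAD`, `UNKNOWN`, and `ADDRESS` and their indexes are illustrative and not taken from any shipped labeler):

```python
# Hypothetical label encoding; names and indexes are illustrative only.
label_mapping = {"PAD": 0, "UNKNOWN": 1, "ADDRESS": 2}

# Inverting the dictionary gives an index-to-label lookup.
reverse_label_mapping = {index: label for label, index in label_mapping.items()}

# The list of labels corresponds to the keys of the encoding.
labels = list(label_mapping)
```

If several labels share one index, a plain inversion like this keeps only one label per index, so the real property may behave differently in that case.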

property preprocessor

Retrieves the data preprocessor

Returns

returns the preprocessor instance

property model

Retrieves the data labeler model

Returns

returns the model instance

property postprocessor

Retrieves the data postprocessor

Returns

returns the postprocessor instance

set_params(params)

Allows the user to set parameters of pipeline components in the following format:

params = dict(
    preprocessor=dict(…), model=dict(…), postprocessor=dict(…)
)

where the key-value pairs for each pipeline component must match parameters that exist in that component.

Parameters

params (dict) – dictionary containing a key for each pipeline component to configure, whose value is a dictionary of that component's parameters, in the form:

dict(preprocessor=dict(…), model=dict(…), postprocessor=dict(…))

Returns

None
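The nested structure expected by set_params can be sketched as follows. The parameter names (`max_length`, `dropout`, `threshold`) are hypothetical placeholders, and the `unknown_components` helper is an illustration of checking the top-level keys, not the library's own validation:

```python
# Hypothetical parameter names; real components define their own.
params = dict(
    preprocessor=dict(max_length=128),
    model=dict(dropout=0.1),
    postprocessor=dict(threshold=0.5),
)

# The three pipeline components named in the documented format.
PIPELINE_COMPONENTS = {"preprocessor", "model", "postprocessor"}


def unknown_components(params):
    """Return top-level keys that are not pipeline components (illustrative)."""
    return sorted(set(params) - PIPELINE_COMPONENTS)


problems = unknown_components(params)
```

A dictionary built this way would then be passed as the single argument to set_params.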

add_label(label, same_as=None)

Adds a label to the data labeler.

Parameters
  • label (str) – new label being added to the data labeler

  • same_as (str) – existing label whose encoding index the new label should share, so that multiple labels map to a single encoding index

Returns

None
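The effect of same_as can be illustrated on a plain dictionary. This is a stand-in written for this example, not the library's implementation; in particular, giving a new label the next unused index is an assumption, and the label names are hypothetical:

```python
def add_label(label_mapping, label, same_as=None):
    """Illustrative stand-in for adding a label to an encoding dictionary."""
    if label in label_mapping:
        return  # nothing to do, the label already exists
    if same_as is not None:
        if same_as not in label_mapping:
            raise ValueError(f"same_as label {same_as!r} does not exist")
        # Share the index of the existing label.
        label_mapping[label] = label_mapping[same_as]
    else:
        # Assumption: a new label takes the next unused index.
        label_mapping[label] = max(label_mapping.values(), default=-1) + 1


mapping = {"PAD": 0, "UNKNOWN": 1, "ADDRESS": 2}
add_label(mapping, "STREET", same_as="ADDRESS")  # shares index 2
add_label(mapping, "PHONE")                      # gets a new index
```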

set_labels(labels)

Sets the labels for the data labeler.

Parameters

labels (list or dict) – new labels in either encoding list or dict

Returns

None

predict(data, batch_size=32, predict_options=None, error_on_mismatch=False, verbose=1)

Predicts labels of input data with the data labeler model.

Parameters
  • data – data to be predicted upon

  • batch_size – batch size of prediction

  • predict_options – optional parameters for prediction, passed as a dict, e.g. dict(show_confidences=True)

  • error_on_mismatch – if True, raises an error instead of a warning on parameter mismatches in the pipeline

  • verbose – Flag to determine whether to print status or not

Returns

predictions
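To make the batch_size parameter concrete, the sketch below splits input samples into consecutive batches. How the labeler batches data internally is not stated here, so treat `iter_batches` as a hypothetical helper that only shows what a batch size of 32 means for 70 samples:

```python
def iter_batches(data, batch_size=32):
    """Yield consecutive slices of at most batch_size items (illustrative)."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]


samples = [f"sample {i}" for i in range(70)]
batch_sizes = [len(batch) for batch in iter_batches(samples, batch_size=32)]
```

The last batch is smaller than batch_size whenever the number of samples is not an exact multiple of it.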

set_preprocessor(data_processor)

Set the data preprocessor for the data labeler

Parameters

data_processor (data_processing.BaseDataPreprocessor) – processor to set as the preprocessor

Returns

None

set_model(model)

Set the model for the data labeler

Parameters

model (base_model.BaseModel) – model to use within the data labeler

Returns

None

set_postprocessor(data_processor)

Set the data postprocessor for the data labeler

Parameters

data_processor (data_processing.BaseDataPostprocessor) – processor to set as the postprocessor

Returns

None

check_pipeline(skip_postprocessor=False, error_on_mismatch=False)

Checks whether the processors and models connect together without error.

Parameters
  • skip_postprocessor (bool) – skip checking that the postprocessor is valid in the pipeline

  • error_on_mismatch (bool) – if True, raises an error instead of a warning on parameter mismatches in the pipeline

Returns

bool indicating valid pipeline

classmethod load_from_library(name)

Loads the data labeler from the data labeler zoo in the library.

Parameters

name (str) – name of the data labeler.

Returns

DataLabeler class

classmethod load_from_disk(dirpath, load_options=None)

Loads the data labeler from a saved location on disk.

Parameters
  • dirpath (str) – path to data labeler files.

  • load_options (dict) – optional arguments to use when loading, e.g. the class for the model or processors

Returns

DataLabeler class

classmethod load_with_components(preprocessor, model, postprocessor)

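Loads the data labeler from its set of components.

Parameters
  • preprocessor (data_processing.BaseDataPreprocessor) – processor to set as the preprocessor

  • model (base_model.BaseModel) – model to use within the data labeler

  • postprocessor (data_processing.BaseDataPostprocessor) – processor to set as the postprocessor

Returns

DataLabeler class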

save_to_disk(dirpath)

Saves the data labeler to the specified location

Parameters

dirpath (str) – location to save the data labeler.

Returns

None

class dataprofiler.labelers.base_data_labeler.TrainableDataLabeler(dirpath=None, load_options=None)

Bases: dataprofiler.labelers.base_data_labeler.BaseDataLabeler

Initialize DataLabeler class.

Parameters
  • dirpath – path to data labeler

  • load_options – optional arguments to use when loading, e.g. the class for the model or processors

fit(x, y, validation_split=0.2, labels=None, reset_weights=False, batch_size=32, epochs=1, error_on_mismatch=False)

Fits the data labeler model for the dataset.

Parameters
  • x (Union[pd.DataFrame, pd.Series, np.ndarray]) – samples to fit model

  • y (Union[pd.DataFrame, pd.Series, np.ndarray]) – labels associated with the samples to fit model

  • validation_split (float) – fraction of the data to hold out as validation data

  • labels (Union[list, dict]) – list of labels or label encoding to use if the model needs to be refit to new labels

  • reset_weights (bool) – Flag to determine whether or not to reset the weights

  • batch_size (int) – Size of each batch sent to data labeler model

  • epochs (int) – number of epochs to iterate over the dataset and send to the model

  • error_on_mismatch (bool) – if True, raises an error instead of a warning on parameter mismatches in the pipeline

Returns

model output
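The validation_split parameter can be read as the fraction of samples held out from training. A minimal sketch under stated assumptions: the helper `split_for_validation` is hypothetical, it takes the held-out samples from the end without shuffling, and it rounds the validation size down; the library may split differently:

```python
def split_for_validation(x, y, validation_split=0.2):
    """Split samples and labels into train and validation parts (illustrative)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n_val = int(len(x) * validation_split)  # assumption: round down
    cut = len(x) - n_val
    return (x[:cut], y[:cut]), (x[cut:], y[cut:])


x = [f"text {i}" for i in range(10)]
y = ["UNKNOWN"] * 10
(train_x, train_y), (val_x, val_y) = split_for_validation(x, y, validation_split=0.2)
```

With the default of 0.2 and ten samples, eight are used for training and two for validation.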

set_model(model)

Set the model for a trainable data labeler. Model must have a train function to be able to be set.

Parameters

model (base_model.BaseModel) – model to use within the data labeler

Returns

None

classmethod load_with_components(preprocessor, model, postprocessor)

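Loads the data labeler from its set of components.

Parameters
  • preprocessor (data_processing.BaseDataPreprocessor) – processor to set as the preprocessor

  • model (base_model.BaseModel) – model to use within the data labeler

  • postprocessor (data_processing.BaseDataPostprocessor) – processor to set as the postprocessor

Returns

DataLabeler class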

add_label(label, same_as=None)

Adds a label to the data labeler.

Parameters
  • label (str) – new label being added to the data labeler

  • same_as (str) – existing label whose encoding index the new label should share, so that multiple labels map to a single encoding index

Returns

None

check_pipeline(skip_postprocessor=False, error_on_mismatch=False)

Checks whether the processors and models connect together without error.

Parameters
  • skip_postprocessor (bool) – skip checking that the postprocessor is valid in the pipeline

  • error_on_mismatch (bool) – if True, raises an error instead of a warning on parameter mismatches in the pipeline

Returns

bool indicating valid pipeline

help()

Help function describing alterable parameters, input data formats for preprocessors, and output data formats for postprocessors.

Returns

None

property label_mapping

Retrieves the label encodings

Returns

dictionary for associating labels to indexes

property labels

Retrieves the labels

Returns

list of labels

classmethod load_from_disk(dirpath, load_options=None)

Loads the data labeler from a saved location on disk.

Parameters
  • dirpath (str) – path to data labeler files.

  • load_options (dict) – optional arguments to use when loading, e.g. the class for the model or processors

Returns

DataLabeler class

classmethod load_from_library(name)

Loads the data labeler from the data labeler zoo in the library.

Parameters

name (str) – name of the data labeler.

Returns

DataLabeler class

property model

Retrieves the data labeler model

Returns

returns the model instance

property postprocessor

Retrieves the data postprocessor

Returns

returns the postprocessor instance

predict(data, batch_size=32, predict_options=None, error_on_mismatch=False, verbose=1)

Predicts labels of input data with the data labeler model.

Parameters
  • data – data to be predicted upon

  • batch_size – batch size of prediction

  • predict_options – optional parameters for prediction, passed as a dict, e.g. dict(show_confidences=True)

  • error_on_mismatch – if True, raises an error instead of a warning on parameter mismatches in the pipeline

  • verbose – Flag to determine whether to print status or not

Returns

predictions

property preprocessor

Retrieves the data preprocessor

Returns

returns the preprocessor instance

property reverse_label_mapping

Retrieves the index to label encoding

Returns

dictionary for associating indexes to labels

save_to_disk(dirpath)

Saves the data labeler to the specified location

Parameters

dirpath (str) – location to save the data labeler.

Returns

None

set_labels(labels)

Sets the labels for the data labeler.

Parameters

labels (list or dict) – new labels in either encoding list or dict

Returns

None

set_params(params)

Allows the user to set parameters of pipeline components in the following format:

params = dict(
    preprocessor=dict(…), model=dict(…), postprocessor=dict(…)
)

where the key-value pairs for each pipeline component must match parameters that exist in that component.

Parameters

params (dict) – dictionary containing a key for each pipeline component to configure, whose value is a dictionary of that component's parameters, in the form:

dict(preprocessor=dict(…), model=dict(…), postprocessor=dict(…))

Returns

None

set_postprocessor(data_processor)

Set the data postprocessor for the data labeler

Parameters

data_processor (data_processing.BaseDataPostprocessor) – processor to set as the postprocessor

Returns

None

set_preprocessor(data_processor)

Set the data preprocessor for the data labeler

Parameters

data_processor (data_processing.BaseDataPreprocessor) – processor to set as the preprocessor

Returns

None