Base Model

Contains abstract classes for labeling data.

class dataprofiler.labelers.base_model.AutoSubRegistrationMeta(clsname, bases, attrs)

Bases: abc.ABCMeta

For registering subclasses.

Create auto registration object and return new class.

mro()

Return a type’s method resolution order.

register(subclass)

Register a virtual subclass of an ABC.

Returns the subclass, to allow usage as a class decorator.
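The automatic registration described above can be sketched in plain Python. This is a minimal illustration of a metaclass that records each class as it is created, not the library's actual implementation; the names `RegisteringMeta`, `_registry`, `Base`, and `RegexLike`, and the lowercase-name keying, are assumptions made for the example.

```python
import abc


class RegisteringMeta(abc.ABCMeta):
    """Hypothetical metaclass that records every class it creates."""

    # Assumption: one shared dict, keyed by lowercase class name.
    _registry = {}

    def __new__(mcls, clsname, bases, attrs):
        new_class = super().__new__(mcls, clsname, bases, attrs)
        mcls._registry[clsname.lower()] = new_class
        return new_class


class Base(metaclass=RegisteringMeta):
    @classmethod
    def get_class(cls, class_name):
        # Look up a previously registered class by name (case-insensitive).
        return RegisteringMeta._registry.get(class_name.lower())


class RegexLike(Base):
    """Registered automatically when the class statement executes."""


found = Base.get_class("RegexLike")
```

Because the registration happens in `__new__`, subclasses need no decorator or explicit call; defining the class is enough for a later lookup by name to find it.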

class dataprofiler.labelers.base_model.BaseModel(label_mapping, parameters)

Bases: object

For labeling data.

Initialize Base Model.

Only the model and model parameters are stored here.

Parameters

parameters – Contains all the appropriate parameters for the model. Must contain num_labels.

Returns

None

requires_zero_mapping = False

property label_mapping

Return mapping of labels to their encoded values.

property reverse_label_mapping

Return the label mapping reversed, from encoded values to labels.

Useful for looking up labels by index.

property labels

Retrieve the labels.

Returns

list of labels

property num_labels

Return the number of labels, derived from the maximum value in the label mapping.
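The relationship between these four properties can be sketched with plain dictionaries. This is an illustration only: the label names are made up, and the rule that the count equals the largest encoded value plus one (with indices starting at 0) is an assumption, not something this page states.

```python
# Hypothetical label mapping: label name -> encoded integer value.
label_mapping = {"PAD": 0, "UNKNOWN": 1, "ADDRESS": 2, "PERSON": 3}

# Inverted mapping: encoded value -> label name, for lookups by index.
reverse_label_mapping = {index: label for label, index in label_mapping.items()}

# Labels listed in order of their encoded values.
labels = sorted(label_mapping, key=label_mapping.get)

# Assumption: the label count is the largest encoded value plus one.
num_labels = max(label_mapping.values()) + 1

# Typical use: turn predicted indices back into label names.
predicted_indices = [3, 1, 2]
predicted_labels = [reverse_label_mapping[i] for i in predicted_indices]
```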

classmethod get_class(class_name)

Get the registered subclass that matches class_name.

get_parameters(param_list=None)

Return a dict of parameters from the model given a list.

Parameters

param_list (list) – list of parameters to retrieve from the model.

Returns

dict of parameters

set_params(**kwargs)

Set the parameters given in kwargs, if they exist.
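A minimal sketch of the get/set pattern, using a module-level dict in place of a model. The parameter names, the copy-on-read behaviour, and the choice to raise `ValueError` for unknown names are assumptions for illustration; this page does not say how the library treats parameters that do not exist.

```python
import copy

# Hypothetical parameter store; real models define their own parameter names.
parameters = {"num_labels": 4, "max_length": 3400, "dropout": 0.1}


def get_parameters(param_list=None):
    """Return a copy of the requested parameters (all of them if None)."""
    if param_list is None:
        return copy.deepcopy(parameters)
    return {name: copy.deepcopy(parameters[name]) for name in param_list}


def set_params(**kwargs):
    """Update only parameters that already exist; reject unknown names."""
    unknown = [name for name in kwargs if name not in parameters]
    if unknown:
        # Assumption: unknown names are an error rather than silently ignored.
        raise ValueError(f"Unknown parameters: {unknown}")
    parameters.update(kwargs)


set_params(max_length=512)
subset = get_parameters(["max_length", "num_labels"])
```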

add_label(label, same_as=None)

Add a label to the data labeler.

Parameters
  • label (str) – new label being added to the data labeler

  • same_as (str) – existing label whose encoding index the new label should share, used to map multiple labels to a single encoding index.

Returns

None
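The effect of `same_as` can be sketched on a plain dict. The helper below is hypothetical and is not the library's method; in particular, assigning the next unused index to a new label and raising on duplicates are assumptions.

```python
def add_label(label_mapping, label, same_as=None):
    """Return a new mapping with `label` added (hypothetical helper)."""
    if label in label_mapping:
        raise ValueError(f"Label {label!r} already exists")
    if same_as is not None:
        if same_as not in label_mapping:
            raise ValueError(f"same_as label {same_as!r} does not exist")
        # Share the encoding index of the existing label.
        index = label_mapping[same_as]
    else:
        # Assumption: a new label takes the next unused index.
        index = max(label_mapping.values()) + 1
    return {**label_mapping, label: index}


mapping = {"UNKNOWN": 0, "ADDRESS": 1}
mapping = add_label(mapping, "PERSON")
mapping = add_label(mapping, "STREET", same_as="ADDRESS")
```

After these calls, `STREET` and `ADDRESS` encode to the same index, so a model with a single output per index treats them as one class.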

set_label_mapping(label_mapping)

Set the labels for the model.

Parameters

label_mapping (Union[list, dict]) – label mapping of the model or list of labels to be converted into the label mapping

Returns

None
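Since `label_mapping` accepts either a list or a dict, some conversion from a list of labels to a mapping is implied. A minimal sketch of one such conversion follows; the helper name `to_label_mapping` and the `start_index` parameter are hypothetical, and the starting index the library actually uses is not stated on this page.

```python
def to_label_mapping(labels, start_index=0):
    """Convert a list of labels into a label -> index dict.

    A dict is returned as a shallow copy. The consecutive numbering from
    `start_index` is an assumption made for this sketch.
    """
    if isinstance(labels, dict):
        return dict(labels)
    return {label: start_index + i for i, label in enumerate(labels)}


from_list = to_label_mapping(["UNKNOWN", "ADDRESS", "PERSON"])
from_dict = to_label_mapping({"UNKNOWN": 0, "ADDRESS": 1})
```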

classmethod help()

Help describe alterable parameters.

Returns

None

abstract reset_weights()

Reset the weights of the model.

Returns

None

abstract predict(data, batch_size, show_confidences, verbose)

Predict the data with the current model.

Parameters
  • data (iterator of data to process) – model input data to predict on

  • batch_size (int) – number of samples in the batch of data

  • show_confidences (bool) – whether user wants prediction confidences

  • verbose (bool) – Flag to determine whether to print status or not

Returns

character-level predictions and confidences

Return type

dict
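To make the parameters concrete, here is a toy, self-contained sketch of a batched predict function that consumes an iterator and returns a dict of character-level results. It is not a real model: the digit rule stands in for inference, and the dict keys `"pred"` and `"conf"` are assumptions, since this page does not specify the keys of the returned dict.

```python
def batches(data, batch_size):
    """Yield successive lists of at most `batch_size` samples."""
    batch = []
    for sample in data:
        batch.append(sample)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def predict(data, batch_size=2, show_confidences=False, verbose=False):
    """Toy predictor: labels digits as 1 and all other characters as 0."""
    predictions, confidences = [], []
    for number, batch in enumerate(batches(data, batch_size), start=1):
        if verbose:
            print(f"Processing batch {number}")
        for text in batch:
            predictions.append([1 if ch.isdigit() else 0 for ch in text])
            # Toy rule, not a real model: full confidence for every character.
            confidences.append([1.0 for _ in text])
    result = {"pred": predictions}  # key name is an assumption
    if show_confidences:
        result["conf"] = confidences  # key name is an assumption
    return result


output = predict(iter(["ab1", "42", "x"]), batch_size=2, show_confidences=True)
```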

abstract classmethod load_from_disk(dirpath)

Load whole model from disk with weights.

Parameters

dirpath (str) – directory path where you want to load the model from

Returns

None

abstract save_to_disk(dirpath)

Save whole model to disk with weights.

Parameters

dirpath (str) – directory path where you want to save the model to

Returns

None
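A save/load round trip for a directory-based model might look like the sketch below. `ToyModel` is a hypothetical stand-in that does not inherit from `BaseModel`, the JSON file names are illustrative only, and returning the restored instance from `load_from_disk` is a choice made for this example.

```python
import json
import os
import tempfile


class ToyModel:
    """Hypothetical stand-in; it does not inherit from the library's BaseModel."""

    def __init__(self, label_mapping, parameters):
        self.label_mapping = label_mapping
        self.parameters = parameters

    def save_to_disk(self, dirpath):
        os.makedirs(dirpath, exist_ok=True)
        # Assumption: file names are illustrative only.
        with open(os.path.join(dirpath, "label_mapping.json"), "w") as fp:
            json.dump(self.label_mapping, fp)
        with open(os.path.join(dirpath, "model_parameters.json"), "w") as fp:
            json.dump(self.parameters, fp)

    @classmethod
    def load_from_disk(cls, dirpath):
        with open(os.path.join(dirpath, "label_mapping.json")) as fp:
            label_mapping = json.load(fp)
        with open(os.path.join(dirpath, "model_parameters.json")) as fp:
            parameters = json.load(fp)
        return cls(label_mapping, parameters)


with tempfile.TemporaryDirectory() as tmpdir:
    original = ToyModel({"UNKNOWN": 0, "ADDRESS": 1}, {"num_labels": 2})
    original.save_to_disk(tmpdir)
    restored = ToyModel.load_from_disk(tmpdir)
```

A real model would also persist its weights in the same directory; the toy class has none, so only the mapping and parameters are written.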

class dataprofiler.labelers.base_model.BaseTrainableModel(label_mapping, parameters)

Bases: dataprofiler.labelers.base_model.BaseModel

Contains abstract method for training models.

Initialize Base Model.

Only the model and model parameters are stored here.

Parameters

parameters – Contains all the appropriate parameters for the model. Must contain num_labels.

Returns

None

abstract fit(train_data, val_data, batch_size=32, epochs=1, label_mapping=None, reset_weights=False)

Train the current model with the training data and validation data.

Parameters
  • train_data (Union[pd.DataFrame, pd.Series, np.ndarray]) – Training data used to train model

  • val_data (Union[pd.DataFrame, pd.Series, np.ndarray]) – Validation data used to validate the training

  • batch_size (int) – Used to determine number of samples in each batch

  • epochs (int) – Used to determine how many epochs to run

  • label_mapping (dict) – Mapping of the labels

  • reset_weights (bool) – Flag to determine whether or not to reset the model’s weights

Returns

None
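How `batch_size`, `epochs`, and `reset_weights` interact can be sketched with a toy class. `ToyTrainableModel` is hypothetical and does not inherit from `BaseTrainableModel`; its "weight" update is a placeholder rather than a learning rule, and `label_mapping` is omitted to keep the sketch short.

```python
class ToyTrainableModel:
    """Hypothetical stand-in that only tracks batches and a single weight."""

    def __init__(self):
        self.weights = 0.0
        self.steps = 0
        self.val_samples_seen = 0

    def reset_weights(self):
        self.weights = 0.0

    def fit(self, train_data, val_data, batch_size=32, epochs=1,
            reset_weights=False):
        if reset_weights:
            self.reset_weights()
        for _ in range(epochs):
            # One pass over the training data, batch_size samples at a time.
            for start in range(0, len(train_data), batch_size):
                batch = train_data[start:start + batch_size]
                # Placeholder update: accumulate the batch mean.
                self.weights += sum(batch) / len(batch)
                self.steps += 1
            # Placeholder validation: count samples evaluated after each epoch.
            self.val_samples_seen += len(val_data)


model = ToyTrainableModel()
train = list(range(100))
val = list(range(10))
# 100 samples at batch_size=32 gives 4 batches per epoch, so 8 steps in total.
model.fit(train, val, batch_size=32, epochs=2)
```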

add_label(label, same_as=None)

Add a label to the data labeler.

Parameters
  • label (str) – new label being added to the data labeler

  • same_as (str) – existing label whose encoding index the new label should share, used to map multiple labels to a single encoding index.

Returns

None

classmethod get_class(class_name)

Get the registered subclass that matches class_name.

get_parameters(param_list=None)

Return a dict of parameters from the model given a list.

Parameters

param_list (list) – list of parameters to retrieve from the model.

Returns

dict of parameters

classmethod help()

Help describe alterable parameters.

Returns

None

property label_mapping

Return mapping of labels to their encoded values.

property labels

Retrieve the labels.

Returns

list of labels

abstract classmethod load_from_disk(dirpath)

Load whole model from disk with weights.

Parameters

dirpath (str) – directory path where you want to load the model from

Returns

None

property num_labels

Return the number of labels, derived from the maximum value in the label mapping.

abstract predict(data, batch_size, show_confidences, verbose)

Predict the data with the current model.

Parameters
  • data (iterator of data to process) – model input data to predict on

  • batch_size (int) – number of samples in the batch of data

  • show_confidences (bool) – whether user wants prediction confidences

  • verbose (bool) – Flag to determine whether to print status or not

Returns

character-level predictions and confidences

Return type

dict

requires_zero_mapping = False

abstract reset_weights()

Reset the weights of the model.

Returns

None

property reverse_label_mapping

Return the label mapping reversed, from encoded values to labels.

Useful for looking up labels by index.

abstract save_to_disk(dirpath)

Save whole model to disk with weights.

Parameters

dirpath (str) – directory path where you want to save the model to

Returns

None

set_label_mapping(label_mapping)

Set the labels for the model.

Parameters

label_mapping (Union[list, dict]) – label mapping of the model or list of labels to be converted into the label mapping

Returns

None

set_params(**kwargs)

Set the parameters given in kwargs, if they exist.