Base Model

Contains abstract classes for labeling data.

class dataprofiler.labelers.base_model.AutoSubRegistrationMeta(clsname: str, bases: tuple, attrs: dict)

Bases: abc.ABCMeta

For registering subclasses.

Create auto registration object and return new class.

mro()

Return a type’s method resolution order.

register(subclass)

Register a virtual subclass of an ABC.

Returns the subclass, to allow usage as a class decorator.
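The registration behaviour described above can be illustrated with a small, self-contained sketch. It is not DataProfiler's implementation: `AutoRegisterSketchMeta`, `SketchBase`, the `registry` attribute, and the lowercase-name keying are all assumptions made for illustration. Only `abc.ABCMeta.register` is standard-library behaviour.

```python
import abc


class AutoRegisterSketchMeta(abc.ABCMeta):
    """Hypothetical metaclass that records every subclass it creates."""

    def __new__(mcs, clsname, bases, attrs):
        new_class = super().__new__(mcs, clsname, bases, attrs)
        if not hasattr(new_class, "registry"):
            # First class built with this metaclass: it owns the registry.
            new_class.registry = {}
        else:
            # Assumption: subclasses are keyed by their lowercase class name.
            new_class.registry[clsname.lower()] = new_class
        return new_class


class SketchBase(metaclass=AutoRegisterSketchMeta):
    @classmethod
    def get_class(cls, class_name):
        # Look up a registered subclass by name; None if it is unknown.
        return cls.registry.get(class_name.lower())


class MyModel(SketchBase):
    pass


# ABCMeta.register returns its argument, so it works as a class decorator.
@SketchBase.register
class VirtualModel:
    pass


found = SketchBase.get_class("MyModel")
missing = SketchBase.get_class("NoSuchModel")
```

In this sketch a virtual subclass added through `register` passes `issubclass` checks but does not go through the metaclass constructor, so it never appears in `registry`.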

class dataprofiler.labelers.base_model.BaseModel(label_mapping: list | dict, parameters: dict)

Bases: object

For labeling data.

Initialize Base Model.

Only model and model parameters are stored here.

Parameters

label_mapping (Union[list, dict]) – label mapping of the model or list of labels to be converted into the label mapping
parameters (dict) – Contains all the appropriate parameters for the model. Must contain num_labels.

Returns

None

requires_zero_mapping: bool = False

property label_mapping: dict

Return mapping of labels to their encoded values.

property reverse_label_mapping: dict

Return the label mapping reversed, so that encoded values map back to labels.

Useful when labels need to be extracted via indices.

property labels: list

Retrieve the labels.

Returns: list of labels

property num_labels: int

Return the number of labels, computed from the maximum encoded value in the label mapping.
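How these four properties relate to one another can be sketched with plain dictionaries. The label names and values below are made up, and each derivation (inverting the dict, taking the keys, using the maximum encoded value plus one) is an assumption about the behaviour rather than a statement of the library's code.

```python
# Hypothetical label mapping: label -> encoded integer value.
label_mapping = {"PAD": 0, "UNKNOWN": 1, "ADDRESS": 2, "PERSON": 3}

# Assumption: the reverse mapping is the inverted dict (encoded value -> label).
reverse_label_mapping = {index: label for label, index in label_mapping.items()}

# Assumption: the labels are the keys of the mapping, in insertion order.
labels = list(label_mapping)

# Assumption: the label count is the largest encoded value plus one.
num_labels = max(label_mapping.values()) + 1

# Decoding model output: turn predicted indices back into label names.
predicted_indices = [2, 3, 1]
decoded = [reverse_label_mapping[index] for index in predicted_indices]
```

Note that if two labels share an encoded value, an inverted dict like this one keeps only one label per value.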

classmethod get_class(class_name: str) → type[BaseModel] | None

Get the registered subclass with the given name, or None if there is no such subclass.

get_parameters(param_list: list[str] | None = None) → dict

Return a dict of parameters from the model given a list.

Parameters: param_list (List[str]) – list of parameters to retrieve from the model.
Returns: dict of parameters

set_params(**kwargs: Any) → None

Set model parameters from the given keyword arguments, if those parameters exist.

add_label(label: str, same_as: str | None = None) → None

Add a label to the data labeler.

Parameters

label (str) – new label being added to the data labeler
same_as (str) – existing label whose encoding index the new label should share, so that multiple labels map to a single encoding index.

Returns

None
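The effect of `same_as` can be sketched on a plain dictionary. `add_label_sketch` is a hypothetical helper, not the library method, and its handling of duplicates and of the next free index are assumptions.

```python
def add_label_sketch(label_mapping, label, same_as=None):
    """Return a copy of label_mapping with `label` added (hypothetical helper)."""
    updated = dict(label_mapping)
    if label in updated:
        # Assumption: adding an existing label leaves the mapping unchanged.
        return updated
    if same_as is not None:
        if same_as not in updated:
            raise ValueError(f"same_as label {same_as!r} is not in the mapping")
        # Share the encoding index of the existing label.
        updated[label] = updated[same_as]
    else:
        # Assumption: a brand-new label takes the next unused index.
        updated[label] = max(updated.values(), default=-1) + 1
    return updated


mapping = {"PAD": 0, "UNKNOWN": 1, "ADDRESS": 2}
with_alias = add_label_sketch(mapping, "STREET", same_as="ADDRESS")
with_new = add_label_sketch(with_alias, "PERSON")
```

Here `STREET` reuses the index of `ADDRESS`, while `PERSON` receives an index of its own.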

set_label_mapping(label_mapping: list[str] | dict[str, int]) → None

Set the labels for the model.

Parameters: label_mapping (Union[list, dict]) – label mapping of the model or list of labels to be converted into the label mapping
Returns: None
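The conversion of a list of labels into a label mapping might look like the following sketch. `to_label_mapping` and its `start_index` parameter are hypothetical; which index the first label actually receives is an assumption here and may depend on `requires_zero_mapping`.

```python
def to_label_mapping(label_mapping, start_index=0):
    """Normalize a list of labels or a dict into a label -> index dict."""
    if isinstance(label_mapping, dict):
        # A dict is already a mapping; copy it so the caller's dict is untouched.
        return dict(label_mapping)
    # Assumption: list entries are numbered consecutively from start_index.
    return {label: start_index + offset for offset, label in enumerate(label_mapping)}


from_list = to_label_mapping(["PAD", "UNKNOWN", "ADDRESS"])
from_dict = to_label_mapping({"UNKNOWN": 1, "ADDRESS": 2})
shifted = to_label_mapping(["A", "B"], start_index=1)
```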

classmethod help() → None

Describe the parameters that can be altered.

Returns: None

abstract reset_weights() → None

Reset the weights of the model.

Returns: None

abstract predict(data: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray], batch_size: int, show_confidences: bool, verbose: bool) → dict

Predict the data with the current model.

Parameters

data (iterator of data to process) – model input data to predict on
batch_size (int) – number of samples in the batch of data
show_confidences (bool) – whether user wants prediction confidences
verbose (bool) – Flag to determine whether to print status or not

Returns

char level predictions and confidences

Return type

dict

abstract classmethod load_from_disk(dirpath: str) → dataprofiler.labelers.base_model.BaseModel

Load whole model from disk with weights.

Parameters: dirpath (str) – directory path where you want to load the model from
Returns: loaded model
Return type: BaseModel

abstract save_to_disk(dirpath: str) → None

Save whole model to disk with weights.

Parameters: dirpath (str) – directory path where you want to save the model to
Returns: None
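A concrete model has to implement all four abstract methods listed above. The sketch below shows the shape of such a subclass against a stand-in base class so that it runs without DataProfiler installed. `SketchBaseModel`, `ConstantModel`, the `model.json` file name, the `default_index` parameter, and the `pred`/`conf` result keys are all assumptions for illustration.

```python
import abc
import json
import os
import tempfile


class SketchBaseModel(abc.ABC):
    """Hypothetical stand-in exposing the abstract methods documented above."""

    def __init__(self, label_mapping, parameters):
        self.label_mapping = dict(label_mapping)
        self.parameters = dict(parameters)

    @abc.abstractmethod
    def reset_weights(self): ...

    @abc.abstractmethod
    def predict(self, data, batch_size, show_confidences, verbose): ...

    @classmethod
    @abc.abstractmethod
    def load_from_disk(cls, dirpath): ...

    @abc.abstractmethod
    def save_to_disk(self, dirpath): ...


class ConstantModel(SketchBaseModel):
    """Toy model that predicts the same label index for every sample."""

    def reset_weights(self):
        pass  # this toy model has no weights to reset

    def predict(self, data, batch_size=32, show_confidences=False, verbose=True):
        index = self.parameters["default_index"]
        result = {"pred": [index for _ in data]}
        if show_confidences:
            result["conf"] = [1.0 for _ in data]
        return result

    @classmethod
    def load_from_disk(cls, dirpath):
        with open(os.path.join(dirpath, "model.json")) as handle:
            state = json.load(handle)
        return cls(state["label_mapping"], state["parameters"])

    def save_to_disk(self, dirpath):
        os.makedirs(dirpath, exist_ok=True)
        state = {"label_mapping": self.label_mapping, "parameters": self.parameters}
        with open(os.path.join(dirpath, "model.json"), "w") as handle:
            json.dump(state, handle)


model = ConstantModel({"UNKNOWN": 1}, {"default_index": 1})
with tempfile.TemporaryDirectory() as tmp:
    model.save_to_disk(tmp)
    restored = ConstantModel.load_from_disk(tmp)
predictions = restored.predict(["a", "b"], batch_size=2, show_confidences=True, verbose=False)
```

Because the stand-in base declares the methods with `abc.abstractmethod`, instantiating it directly, or a subclass that omits one of them, raises `TypeError`.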

class dataprofiler.labelers.base_model.BaseTrainableModel(label_mapping: list | dict, parameters: dict)

Bases: dataprofiler.labelers.base_model.BaseModel

Contains abstract method for training models.

Initialize Base Model.

Only model and model parameters are stored here.

Parameters

label_mapping (Union[list, dict]) – label mapping of the model or list of labels to be converted into the label mapping
parameters (dict) – Contains all the appropriate parameters for the model. Must contain num_labels.

Returns

None

abstract fit(train_data: DataArray, val_data: DataArray, batch_size: int | None = None, epochs: int | None = None, label_mapping: dict[str, int] | None = None, reset_weights: bool = False, verbose: bool = True) → tuple[dict, float | None, dict]

Train the current model with the training data and validation data.

Parameters

train_data (Union[pd.DataFrame, pd.Series, np.ndarray]) – Training data used to train model
val_data (Union[pd.DataFrame, pd.Series, np.ndarray]) – Validation data used to validate the training
batch_size (int) – Used to determine number of samples in each batch
epochs (int) – Used to determine how many epochs to run
label_mapping (dict) – Mapping of the labels
reset_weights (bool) – Flag to determine whether or not to reset the model’s weights
verbose (bool) – Flag to determine whether to print status or not

Returns

history, f1, f1_report

Return type

tuple[dict, float | None, dict]

add_label(label: str, same_as: str | None = None) → None

Add a label to the data labeler.

Parameters

label (str) – new label being added to the data labeler
same_as (str) – existing label whose encoding index the new label should share, so that multiple labels map to a single encoding index.

Returns

None

classmethod get_class(class_name: str) → type[BaseModel] | None

Get the registered subclass with the given name, or None if there is no such subclass.

get_parameters(param_list: list[str] | None = None) → dict

Return a dict of parameters from the model given a list.

Parameters: param_list (List[str]) – list of parameters to retrieve from the model.
Returns: dict of parameters

classmethod help() → None

Describe the parameters that can be altered.

Returns: None

property label_mapping: dict

Return mapping of labels to their encoded values.

property labels: list

Retrieve the labels.

Returns: list of labels

abstract classmethod load_from_disk(dirpath: str) → dataprofiler.labelers.base_model.BaseModel

Load whole model from disk with weights.

Parameters: dirpath (str) – directory path where you want to load the model from
Returns: loaded model
Return type: BaseModel

property num_labels: int

Return the number of labels, computed from the maximum encoded value in the label mapping.

abstract predict(data: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray], batch_size: int, show_confidences: bool, verbose: bool) → dict

Predict the data with the current model.

Parameters

data (iterator of data to process) – model input data to predict on
batch_size (int) – number of samples in the batch of data
show_confidences (bool) – whether user wants prediction confidences
verbose (bool) – Flag to determine whether to print status or not

Returns

char level predictions and confidences

Return type

dict

requires_zero_mapping: bool = False

abstract reset_weights() → None

Reset the weights of the model.

Returns: None

property reverse_label_mapping: dict

Return the label mapping reversed, so that encoded values map back to labels.

Useful when labels need to be extracted via indices.

abstract save_to_disk(dirpath: str) → None

Save whole model to disk with weights.

Parameters: dirpath (str) – directory path where you want to save the model to
Returns: None

set_label_mapping(label_mapping: list[str] | dict[str, int]) → None

Set the labels for the model.

Parameters: label_mapping (Union[list, dict]) – label mapping of the model or list of labels to be converted into the label mapping
Returns: None

set_params(**kwargs: Any) → None

Set model parameters from the given keyword arguments, if those parameters exist.