dataprofiler.labelers.char_load_tf_model module

Contains a class for training a data labeler model.

class dataprofiler.labelers.char_load_tf_model.CharLoadTFModel(model_path: str, label_mapping: dict[str, int], parameters: dict | None = None)

Bases: BaseTrainableModel

For training a data labeler model.

Initialize Loadable TF Model.

Parameters:

model_path (str) – path to model to load
label_mapping (dict) – maps labels to their encoded integers
parameters (dict) –
Contains all the appropriate parameters for the model. Must contain num_labels. Other possible parameters are:

max_length, max_char_encoding_id, dim_embed, size_fc, dropout, size_conv, num_fil, optimizer, default_label

Returns:

None
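To illustrate the parameters contract described above, the following sketch checks a parameters dict against the names listed on this page. The helper check_parameters and the set KNOWN_OPTIONAL_PARAMS are hypothetical and not part of DataProfiler; the model's own validation may accept or reject other keys.

```python
# Hypothetical sketch: not DataProfiler's own validation logic.
KNOWN_OPTIONAL_PARAMS = {
    "max_length", "max_char_encoding_id", "dim_embed", "size_fc",
    "dropout", "size_conv", "num_fil", "optimizer", "default_label",
}


def check_parameters(parameters: dict) -> list:
    """Return a list of problems found in a parameters dict (empty if none)."""
    problems = []
    # num_labels is the only parameter this page marks as required.
    if "num_labels" not in parameters:
        problems.append("missing required parameter: num_labels")
    # Flag any key that is neither required nor in the documented optional list.
    for key in parameters:
        if key != "num_labels" and key not in KNOWN_OPTIONAL_PARAMS:
            problems.append(f"unrecognized parameter: {key}")
    return problems


print(check_parameters({"num_labels": 3, "dropout": 0.5}))  # []
print(check_parameters({"dropout": 0.5, "lr": 0.1}))
# ['missing required parameter: num_labels', 'unrecognized parameter: lr']
```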

requires_zero_mapping: bool = False

set_label_mapping(label_mapping: list[str] | dict[str, int]) → None

Set the labels for the model.

Parameters:

label_mapping (Union[list, dict]) – label mapping of the model, given either as a list of labels or as a dict mapping labels to their encoded integers

Returns:

None
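Because set_label_mapping accepts either a list or a dict, a list has to be converted into a mapping at some point. The sketch below assumes labels receive consecutive integers in list order. The helper to_label_mapping and its start_index argument are hypothetical; how the library picks the first index (for example, in relation to requires_zero_mapping) is an assumption not confirmed here.

```python
from __future__ import annotations


def to_label_mapping(labels: list[str] | dict[str, int], start_index: int = 0) -> dict[str, int]:
    """Return a label -> encoded-integer mapping from a list or a dict.

    Hypothetical helper; start_index stands in for however the model
    chooses its first index.
    """
    if isinstance(labels, dict):
        return dict(labels)  # already a mapping: return a copy unchanged
    # Assign consecutive integers in list order, beginning at start_index.
    return {label: i for i, label in enumerate(labels, start=start_index)}


print(to_label_mapping(["UNKNOWN", "ADDRESS"]))                 # {'UNKNOWN': 0, 'ADDRESS': 1}
print(to_label_mapping(["UNKNOWN", "ADDRESS"], start_index=1))  # {'UNKNOWN': 1, 'ADDRESS': 2}
```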

save_to_disk(dirpath: str) → None

Save the whole model, including its weights, to disk.

Parameters:

dirpath (str) – directory path to save the model to

Returns:

None

classmethod load_from_disk(dirpath: str) → CharLoadTFModel

Load the whole model, including its weights, from disk.

Parameters:

dirpath (str) – directory path to load the model from

Returns:

the loaded CharLoadTFModel

Return type:

CharLoadTFModel

reset_weights() → None

Reset the weights of the model.

Returns:

None

fit(train_data: DataArray, val_data: DataArray = None, batch_size: int = None, epochs: int = None, label_mapping: dict[str, int] = None, reset_weights: bool = False, verbose: bool = True) → tuple[dict, float | None, dict]

Train the current model with the training data and validation data.

Parameters:

train_data (Union[list, np.ndarray]) – training data used to train the model
val_data (Union[list, np.ndarray]) – validation data used to validate the training
batch_size (int) – number of samples in each batch
epochs (int) – number of epochs to train the model
label_mapping (Union[dict, None]) – maps labels to their encoded integers
reset_weights (bool) – whether to reset the weights before training
verbose (bool) – whether to print status

Returns:

history, f1, f1_report

Return type:

Tuple[dict, Union[float, None], dict]
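The f1 and f1_report values returned by fit summarize validation performance. The sketch below shows one common way to compute a per-label F1 report and a macro-averaged F1 from flat lists of true and predicted labels. The function name, the report keys, and the choice of macro averaging are assumptions; this page does not specify how fit averages or formats its scores.

```python
from collections import Counter


def f1_report(y_true: list, y_pred: list) -> tuple:
    """Return (macro F1, per-label report) for flat label sequences."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # p was predicted where it was wrong
            fn[t] += 1  # the true label t was missed
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pred_pos = tp[label] + fp[label]
        true_pos = tp[label] + fn[label]
        precision = tp[label] / pred_pos if pred_pos else 0.0
        recall = tp[label] / true_pos if true_pos else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1-score": f1}
    # Macro average: unweighted mean of the per-label F1 scores.
    macro_f1 = sum(r["f1-score"] for r in report.values()) / len(report) if report else 0.0
    return macro_f1, report


macro, report = f1_report(["A", "A", "B", "B"], ["A", "B", "B", "B"])
print(round(macro, 4))  # 0.7333
```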

predict(data: DataFrame | Series | ndarray, batch_size: int = 32, show_confidences: bool = False, verbose: bool = True) → dict

Run model and get predictions.

Parameters:

data (Union[pandas.DataFrame, pandas.Series, numpy.ndarray]) – text input
batch_size (int) – number of samples in each batch of data
show_confidences (bool) – whether to include prediction confidences in the output
verbose (bool) – whether to print status

Returns:

character-level predictions and, if show_confidences is True, their confidences

Return type:

dict
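Assuming the returned dict stores, under a "pred" key, one sequence of per-character label indices per sample (an assumption; the keys are not documented on this page), those indices can be decoded into label names with a reverse label mapping. decode_predictions is a hypothetical helper.

```python
from __future__ import annotations


def decode_predictions(output: dict, reverse_label_mapping: dict[int, str]) -> list[list[str]]:
    """Map each per-character label index in output["pred"] to its label name."""
    return [
        # int() also accepts NumPy integer types, should the indices be arrays.
        [reverse_label_mapping[int(index)] for index in sample]
        for sample in output["pred"]
    ]


# Hypothetical predict() output for one 3-character sample.
output = {"pred": [[1, 1, 2]]}
print(decode_predictions(output, {1: "UNKNOWN", 2: "ADDRESS"}))
# [['UNKNOWN', 'UNKNOWN', 'ADDRESS']]
```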

details() → None

Print the relevant details of the model.

Details include the model summary, parameters, and label mapping.

add_label(label: str, same_as: str | None = None) → None

Add a label to the data labeler.

Parameters:

label (str) – new label being added to the data labeler
same_as (str) – existing label whose encoding index the new label should share, so that multiple labels map to a single encoding index

Returns:

None
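A minimal sketch of the add_label semantics on a plain dict: a new label gets the next unused index, or shares the index of same_as when it is given. The function is hypothetical, assumes a non-empty mapping, and raises ValueError on conflicts; the library's own error handling may differ.

```python
from __future__ import annotations


def add_label(label_mapping: dict[str, int], label: str, same_as: str | None = None) -> dict[str, int]:
    """Return a copy of label_mapping with `label` added (hypothetical helper)."""
    if label in label_mapping:
        raise ValueError(f"label {label!r} already exists")
    if same_as is not None:
        if same_as not in label_mapping:
            raise ValueError(f"same_as label {same_as!r} does not exist")
        new_index = label_mapping[same_as]  # share the existing encoding index
    else:
        new_index = max(label_mapping.values()) + 1  # next unused index
    return {**label_mapping, label: new_index}


mapping = {"UNKNOWN": 1, "ADDRESS": 2}
print(add_label(mapping, "PHONE"))                       # {'UNKNOWN': 1, 'ADDRESS': 2, 'PHONE': 3}
print(add_label(mapping, "STREET", same_as="ADDRESS"))   # {'UNKNOWN': 1, 'ADDRESS': 2, 'STREET': 2}
```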

classmethod get_class(class_name: str) → type[BaseModel] | None

Get the model subclass matching class_name, or None if no subclass matches.
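One common way to implement a name-based get_class lookup is a registry that records each subclass when it is defined. The classes below are hypothetical stand-ins, and registering by lowercased class name is an assumption, not confirmed behavior of BaseModel.

```python
from __future__ import annotations


class BaseModelSketch:
    """Hypothetical stand-in for a base class that registers its subclasses."""

    _subclasses: dict[str, type] = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Register every subclass under its lowercased class name.
        BaseModelSketch._subclasses[cls.__name__.lower()] = cls

    @classmethod
    def get_class(cls, class_name: str) -> type | None:
        """Return the registered subclass called class_name, or None."""
        return cls._subclasses.get(class_name.lower())


class CharLoadTFModelSketch(BaseModelSketch):
    pass


print(BaseModelSketch.get_class("CharLoadTFModelSketch"))  # the subclass object
print(BaseModelSketch.get_class("missing"))                # None
```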

get_parameters(param_list: list[str] | None = None) → dict

Return a dict of the model's parameters, optionally limited to those named in param_list.

Parameters:

param_list (List[str]) – list of parameters to retrieve from the model

Returns:

dict of parameters
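A sketch of get_parameters over a plain dict, assuming that param_list=None returns every parameter and that requesting an unknown name raises ValueError. Both behaviors, and the helper itself, are assumptions for illustration.

```python
from __future__ import annotations

import copy


def get_parameters(parameters: dict, param_list: list[str] | None = None) -> dict:
    """Return a deep copy of the requested parameters (all of them if param_list is None)."""
    if param_list is None:
        param_list = list(parameters)
    missing = [name for name in param_list if name not in parameters]
    if missing:
        raise ValueError(f"unknown parameters: {missing}")
    # Deep-copy so callers cannot mutate nested values in the original dict.
    return copy.deepcopy({name: parameters[name] for name in param_list})


params = {"num_labels": 3, "dropout": 0.5}
print(get_parameters(params, ["dropout"]))  # {'dropout': 0.5}
```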

classmethod help() → None

Describe the alterable parameters.

Returns:

None

property label_mapping: dict[str, int]

Return the mapping of labels to their encoded values.

property labels: list[str]

Retrieve the labels.

Returns:

list of labels

property num_labels: int

Return the number of encoding indices, i.e. the maximum encoded value in the label mapping plus one.

property reverse_label_mapping: dict[int, str]

Return the reversed label mapping, from encoded integers to labels.

Useful for looking up labels by their indices.
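The reverse mapping and the label count can both be derived from label_mapping. This sketch assumes num_labels is the largest encoded value plus one; both helpers are hypothetical. When several labels share an index (see the same_as argument of add_label), a plain dict inversion keeps only one label per index, here the last one encountered.

```python
from __future__ import annotations


def reverse_label_mapping(label_mapping: dict[str, int]) -> dict[int, str]:
    """Invert label -> index into index -> label (later labels win on shared indices)."""
    return {index: label for label, index in label_mapping.items()}


def num_labels(label_mapping: dict[str, int]) -> int:
    """Assumed definition: the largest encoded value plus one."""
    return max(label_mapping.values()) + 1


mapping = {"PAD": 0, "UNKNOWN": 1, "ADDRESS": 2, "STREET": 2}
print(reverse_label_mapping(mapping))  # {0: 'PAD', 1: 'UNKNOWN', 2: 'STREET'}
print(num_labels(mapping))             # 3
```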

set_params(**kwargs: Any) → None

Set the parameters given in kwargs, provided they exist.
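A sketch of set_params over a plain dict that validates every keyword before changing anything, so a call with an unknown name leaves the parameters untouched. The allowed argument and the ValueError are assumptions; the model's real validation rules are not described on this page.

```python
from __future__ import annotations


def set_params(parameters: dict, allowed: set[str], **kwargs) -> None:
    """Update parameters in place, but only if every keyword name is allowed."""
    unknown = sorted(set(kwargs) - allowed)
    if unknown:
        # Validate everything first so a bad call changes nothing.
        raise ValueError(f"unknown parameters: {unknown}")
    parameters.update(kwargs)


params = {"num_labels": 3, "dropout": 0.5}
set_params(params, {"num_labels", "dropout"}, dropout=0.2)
print(params)  # {'num_labels': 3, 'dropout': 0.2}
```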