Character Level CNN Model

Contains classes for char data labeling.

dataprofiler.labelers.character_level_cnn_model.build_embd_dictionary(filename: str) → dict[str, np.ndarray]

Return a numpy embedding dictionary from embed file with GloVe-like format.

Parameters

filename (str) – Path to the embed file for loading
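
As a rough illustration of the GloVe-like format mentioned above, the sketch below parses a file in which each line holds a token followed by space-separated floats. The helper name `parse_glove_like` and the sample values are hypothetical, and plain lists stand in for the `np.ndarray` values the library returns; this is not the library's implementation.

```python
import os
import tempfile


def parse_glove_like(filename: str) -> dict[str, list[float]]:
    """Parse a GloVe-like text file into a token -> vector dictionary.

    Hypothetical helper for illustration only; the documented function
    returns numpy arrays rather than lists.
    """
    embeddings: dict[str, list[float]] = {}
    with open(filename, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split(" ")
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            token, values = parts[0], parts[1:]
            embeddings[token] = [float(v) for v in values]
    return embeddings


# Write a tiny sample file (made-up values) and parse it back.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("a 0.1 0.2 0.3\nb 0.4 0.5 0.6\n")
    sample_path = tmp.name

sample_embeddings = parse_glove_like(sample_path)
os.remove(sample_path)
```

Each key in `sample_embeddings` is a single token, and each value has one entry per embedding dimension.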

dataprofiler.labelers.character_level_cnn_model.create_glove_char(n_dims: int, source_file: Optional[str] = None) → None

Reduce the GloVe character embeddings from the source file to n_dims principal components.

The reduced embeddings are written to a new file.

Parameters
  • n_dims (int) – Final number of principal component dims of the embeddings

  • source_file (Optional[str]) – Location of original embeddings to factor down

class dataprofiler.labelers.character_level_cnn_model.CharacterLevelCnnModel(label_mapping: dict[str, int], parameters: dict = None)

Bases: dataprofiler.labelers.base_model.BaseTrainableModel

Class for training char data labeler.

Initialize CNN Model.

Initialize epoch_id.

Parameters
  • label_mapping (dict) – maps labels to their encoded integers

  • parameters (dict) –

    Contains all the appropriate parameters for the model. Must contain num_labels. Other possible parameters are:

    max_length, max_char_encoding_id, dim_embed, size_fc, dropout, size_conv, num_fil, optimizer, default_label

Returns

None

requires_zero_mapping: bool = True

set_label_mapping(label_mapping: list[str] | dict[str, int]) → None

Set the labels for the model.

Parameters

label_mapping (Union[list, dict]) – label mapping of the model

Returns

None
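
The signature accepts either a list of labels or a label-to-index dict. One plausible way to normalize the two forms is sketched below. The helper `to_label_mapping` is hypothetical, and assigning consecutive indices starting at zero is an assumption suggested by `requires_zero_mapping = True`, not a confirmed detail of the library.

```python
from typing import Union


def to_label_mapping(
    labels: Union[list[str], dict[str, int]], start_index: int = 0
) -> dict[str, int]:
    """Hypothetical helper: normalize labels to a label -> index dict."""
    if isinstance(labels, dict):
        return dict(labels)  # already a mapping; copy to avoid aliasing
    # Assumption: list entries receive consecutive indices in list order.
    return {label: start_index + i for i, label in enumerate(labels)}


# Example labels are made up for illustration.
mapping_from_list = to_label_mapping(["PAD", "UNKNOWN", "ADDRESS"])
mapping_from_dict = to_label_mapping({"PAD": 0, "UNKNOWN": 1})
```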

save_to_disk(dirpath: str) → None

Save whole model to disk with weights.

Parameters

dirpath (str) – directory path where you want to save the model to

Returns

None

classmethod load_from_disk(dirpath: str) → dataprofiler.labelers.character_level_cnn_model.CharacterLevelCnnModel

Load whole model from disk with weights.

Parameters

dirpath (str) – directory path where you want to load the model from

Returns

the loaded CharacterLevelCnnModel instance

reset_weights() → None

Reset the weights of the model.

Returns

None

fit(train_data: DataArray, val_data: DataArray | None = None, batch_size: int = None, epochs: int = None, label_mapping: dict[str, int] = None, reset_weights: bool = False, verbose: bool = True) → tuple[dict, float | None, dict]

Train the current model with the training data and validation data.

Parameters
  • train_data (Union[list, np.ndarray]) – Training data used to train model

  • val_data (Union[list, np.ndarray]) – Validation data used to validate the training

  • batch_size (int) – Used to determine number of samples in each batch

  • label_mapping (Union[dict, None]) – maps labels to their encoded integers

  • reset_weights (bool) – Flag to determine whether to reset the weights or not

  • verbose (bool) – Flag to determine whether to print status or not

Returns

history, f1, f1_report

Return type

Tuple[dict, Optional[float], dict]

predict(data: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray], batch_size: int = 32, show_confidences: bool = False, verbose: bool = True) → dict

Run model and get predictions.

Parameters
  • data (Union[pandas.DataFrame, pandas.Series, numpy.ndarray]) – text input

  • batch_size (int) – number of samples in the batch of data

  • show_confidences (bool) – whether to return prediction confidences

  • verbose (bool) – Flag to determine whether to print status or not

Returns

char level predictions and confidences

Return type

dict

details() → None

Print the relevant details of the model.

Details include summary, parameters, and label mapping.

add_label(label: str, same_as: str | None = None) → None

Add a label to the data labeler.

Parameters
  • label (str) – new label being added to the data labeler

  • same_as (str) – existing label whose encoding index the new label should share, so that multiple labels map to a single encoding index

Returns

None
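
To make the role of same_as concrete, the sketch below models the label mapping as a plain dict. The helper `add_label_sketch` is hypothetical; in particular, giving a new label the next unused index when same_as is omitted is an assumption for illustration, not a statement about the library's internals.

```python
from typing import Optional


def add_label_sketch(
    label_mapping: dict[str, int], label: str, same_as: Optional[str] = None
) -> dict[str, int]:
    """Hypothetical helper: return a copy of label_mapping with label added."""
    if label in label_mapping:
        raise ValueError(f"label {label!r} already exists")
    updated = dict(label_mapping)
    if same_as is not None:
        if same_as not in label_mapping:
            raise ValueError(f"same_as label {same_as!r} does not exist")
        # The new label shares the encoding index of the existing label.
        updated[label] = label_mapping[same_as]
    else:
        # Assumption: a brand-new label takes the next unused index.
        updated[label] = max(label_mapping.values(), default=-1) + 1
    return updated


# Example labels are made up for illustration.
base_mapping = {"PAD": 0, "UNKNOWN": 1, "ADDRESS": 2}
with_new_label = add_label_sketch(base_mapping, "PHONE")
with_shared_label = add_label_sketch(base_mapping, "STREET", same_as="ADDRESS")
```

In this sketch, "STREET" and "ADDRESS" end up with the same index, so both labels map to a single encoding index.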

classmethod get_class(class_name: str) → type[BaseModel] | None

Get the model subclass matching class_name, or None if there is no match.

get_parameters(param_list: list[str] | None = None) → dict

Return a dict of parameters from the model given a list.

Parameters

param_list (List[str]) – list of parameters to retrieve from the model.

Returns

dict of parameters
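
The selection described above can be pictured as filtering a parameter dict by name. In the sketch below, `select_parameters` is a hypothetical helper, the parameter values are made up, and returning every parameter when param_list is None is an assumption rather than documented behavior.

```python
from typing import Any, Optional


def select_parameters(
    parameters: dict[str, Any], param_list: Optional[list[str]] = None
) -> dict[str, Any]:
    """Hypothetical helper: pick the requested entries from parameters."""
    if param_list is None:
        # Assumption: no list means "return everything".
        return dict(parameters)
    # An unknown name raises KeyError in this sketch.
    return {name: parameters[name] for name in param_list}


# Made-up values, for illustration only.
example_parameters = {"max_length": 3400, "dim_embed": 64, "dropout": 0.073}
selected = select_parameters(example_parameters, ["max_length", "dropout"])
everything = select_parameters(example_parameters)
```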

classmethod help() → None

Help describe alterable parameters.

Returns

None

property label_mapping: dict[str, int]

Return mapping of labels to their encoded values.

property labels: list[str]

Retrieve the labels.

Returns

list of labels

property num_labels: int

Return the number of label indices, derived from the maximum value in the label mapping.

property reverse_label_mapping: dict[int, str]

Return the reverse of the current label mapping, from encoded values to labels.

Useful for looking up labels by index.
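
A minimal sketch of how such a reverse mapping can be built and used to turn predicted indices back into labels. The labels and indices are made up, and the dict comprehension is one common way to invert a mapping, not necessarily what the library does internally.

```python
# Made-up label mapping, for illustration only.
label_mapping = {"PAD": 0, "UNKNOWN": 1, "ADDRESS": 2}

# Invert label -> index into index -> label.
# If several labels shared one index, the last one seen would win here.
reverse_label_mapping = {index: label for label, index in label_mapping.items()}

# Translate a sequence of predicted indices back into label names.
predicted_indices = [2, 0, 1, 2]
predicted_labels = [reverse_label_mapping[i] for i in predicted_indices]
```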

set_params(**kwargs: Any) → None

Set the parameters if they exist given kwargs.