Character Level Cnn Model
Contains classes for char data labeling.
- dataprofiler.labelers.character_level_cnn_model.build_embd_dictionary(filename: str) → dict[str, np.ndarray]
Return a dictionary of NumPy embedding vectors loaded from an embedding file in GloVe-like format.
- Parameters
filename (str) – Path to the embedding file to load
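A minimal sketch of reading a GloVe-like file, assuming each line holds a token followed by space-separated float components and that tokens contain no whitespace. The helper `parse_glove_lines` is hypothetical and is not the library's implementation; it returns plain lists to stay dependency-free, whereas `build_embd_dictionary` is documented to return NumPy arrays.

```python
import io


def parse_glove_lines(lines):
    """Hypothetical helper: parse GloVe-like text lines into {token: vector}."""
    embeddings = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        token, values = parts[0], parts[1:]
        embeddings[token] = [float(v) for v in values]
    return embeddings


# Illustrative two-dimensional embeddings for three characters.
sample = io.StringIO("a 0.1 0.2\nb -0.3 0.4\nc 0.5 -0.6\n")
embd = parse_glove_lines(sample)
print(embd["b"])
```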
- dataprofiler.labelers.character_level_cnn_model.create_glove_char(n_dims: int, source_file: Optional[str] = None) → None
Reduce the GloVe character embeddings in the source file to n_dims principal components.
Write the reduced embeddings to a new file.
- Parameters
n_dims (int) – Number of principal-component dimensions to keep in the reduced embeddings
source_file (Optional[str]) – Location of the original embeddings to reduce
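The reduction to principal components can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the helper `reduce_to_principal_components` is hypothetical, it uses an SVD-based PCA, and the input is random data standing in for real embeddings.

```python
import numpy as np


def reduce_to_principal_components(vectors, n_dims):
    """Hypothetical helper: project row vectors onto their top n_dims principal components."""
    centered = vectors - vectors.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_dims].T


rng = np.random.default_rng(0)
original = rng.normal(size=(10, 6))  # 10 illustrative embeddings of 6 dims each
reduced = reduce_to_principal_components(original, n_dims=3)
print(reduced.shape)
```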
- class dataprofiler.labelers.character_level_cnn_model.CharacterLevelCnnModel(label_mapping: dict[str, int], parameters: dict = None)
Bases: dataprofiler.labelers.base_model.BaseTrainableModel
Class for training a character-level data labeler.
Initialize CNN Model.
Initialize epoch_id.
- Parameters
label_mapping (dict) – maps labels to their encoded integers
parameters (dict) –
Contains all the appropriate parameters for the model. Must contain num_labels. Other possible parameters are:
max_length, max_char_encoding_id, dim_embed, size_fc, dropout, size_conv, num_fil, optimizer, default_label
- Returns
None
- requires_zero_mapping: bool = True
- set_label_mapping(label_mapping: list[str] | dict[str, int]) → None
Set the labels for the model.
- Parameters
label_mapping (Union[list, dict]) – label mapping of the model
- Returns
None
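Because the signature accepts either a list of labels or a dict, a small sketch may help show how the two forms relate. The helper `normalize_label_mapping` is hypothetical, the label names are illustrative, and the rule that list positions become the encoded integers is an assumption rather than documented behavior.

```python
def normalize_label_mapping(label_mapping):
    """Hypothetical helper: accept a list of labels or a dict and return a dict."""
    if isinstance(label_mapping, dict):
        return dict(label_mapping)
    # Assumption: list positions become the encoded integers.
    return {label: index for index, label in enumerate(label_mapping)}


mapping = normalize_label_mapping(["PAD", "UNKNOWN", "ADDRESS"])
print(mapping)
```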
- save_to_disk(dirpath: str) → None
Save whole model to disk with weights.
- Parameters
dirpath (str) – directory path where you want to save the model to
- Returns
None
- classmethod load_from_disk(dirpath: str) → dataprofiler.labelers.character_level_cnn_model.CharacterLevelCnnModel
Load whole model from disk with weights.
- Parameters
dirpath (str) – directory path where you want to load the model from
- Returns
loaded CharacterLevelCnnModel instance
- reset_weights() → None
Reset the weights of the model.
- Returns
None
- fit(train_data: DataArray, val_data: DataArray | None = None, batch_size: int = None, epochs: int = None, label_mapping: dict[str, int] = None, reset_weights: bool = False, verbose: bool = True) → tuple[dict, float | None, dict]
Train the current model with the training data and validation data.
- Parameters
train_data (Union[list, np.ndarray]) – Training data used to train model
val_data (Union[list, np.ndarray]) – Validation data used to validate the training
batch_size (int) – Used to determine number of samples in each batch
epochs (int) – Number of epochs to train the model
label_mapping (Union[dict, None]) – maps labels to their encoded integers
reset_weights (bool) – Flag to determine whether to reset the weights or not
verbose (bool) – Flag to determine whether to print status or not
- Returns
history, f1, f1_report
- Return type
Tuple[dict, Optional[float], dict]
- predict(data: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray], batch_size: int = 32, show_confidences: bool = False, verbose: bool = True) → dict
Run model and get predictions.
- Parameters
data (Union[list, numpy.ndarray]) – text input
batch_size (int) – number of samples in the batch of data
show_confidences (bool) – whether to include prediction confidences in the output
verbose (bool) – Flag to determine whether to print status or not
- Returns
char level predictions and confidences
- Return type
dict
- details() → None
Print the relevant details of the model.
Details include summary, parameters, and label mapping.
- add_label(label: str, same_as: Optional[str] = None) → None
Add a label to the data labeler.
- Parameters
label (str) – new label being added to the data labeler
same_as (str) – existing label whose encoding index the new label should share, so that multiple labels map to a single encoding index.
- Returns
None
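The effect of same_as can be sketched on a plain dict. The function `add_label_sketch` is a hypothetical stand-in, not the model method; the label names are illustrative, and the rule that a brand-new label takes the next unused index is an assumption.

```python
def add_label_sketch(label_mapping, label, same_as=None):
    """Hypothetical stand-in operating on a plain dict, not the model method."""
    if label in label_mapping:
        raise ValueError(f"label {label!r} already exists")
    if same_as is not None:
        if same_as not in label_mapping:
            raise ValueError(f"same_as label {same_as!r} does not exist")
        # The new label shares the encoding index of the existing label.
        label_mapping[label] = label_mapping[same_as]
    else:
        # Assumption: a brand-new label takes the next unused index.
        label_mapping[label] = max(label_mapping.values(), default=-1) + 1
    return label_mapping


labels = {"PAD": 0, "UNKNOWN": 1, "ADDRESS": 2}
add_label_sketch(labels, "PHONE")
add_label_sketch(labels, "STREET", same_as="ADDRESS")
print(labels)
```

Here "STREET" and "ADDRESS" end up with the same index, which is the multi-label to single-index case the parameter describes.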
- classmethod get_class(class_name: str) → type[BaseModel] | None
Get subclasses.
- get_parameters(param_list: list[str] = None) → dict
Return a dict of parameters from the model given a list.
- Parameters
param_list (List[str]) – list of parameters to retrieve from the model.
- Returns
dict of parameters
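As a rough illustration of retrieving parameters by name, the sketch below filters a plain dict. `get_parameters_sketch` is hypothetical; returning every parameter when no list is given and returning copies are assumptions, and the parameter values shown are illustrative only.

```python
import copy


def get_parameters_sketch(parameters, param_list=None):
    """Hypothetical helper: return the requested subset of a parameter dict."""
    if param_list is None:
        # Assumption: no list means all parameters are returned.
        return copy.deepcopy(parameters)
    return {name: copy.deepcopy(parameters[name]) for name in param_list}


params = {"max_length": 3400, "dropout": 0.073, "num_fil": [48, 48]}
subset = get_parameters_sketch(params, ["max_length", "dropout"])
print(subset)
```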
- classmethod help() → None
Help describe alterable parameters.
- Returns
None
- property label_mapping: dict[str, int]
Return mapping of labels to their encoded values.
- property labels: list[str]
Retrieve the labels.
- Returns
list of labels
- property num_labels: int
Return max label mapping.
- property reverse_label_mapping: dict[int, str]
Return the reverse of the current label mapping, from encoded index to label.
Useful when labels need to be looked up by index.
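A minimal sketch of inverting a label mapping follows. The helper `reverse_mapping` is hypothetical, the labels are illustrative, and keeping the first label seen when several labels share an index is an assumption made for this sketch, not documented library behavior.

```python
def reverse_mapping(label_mapping):
    """Hypothetical helper: invert {label: index} into {index: label}."""
    reversed_mapping = {}
    for label, index in label_mapping.items():
        # Assumption: when labels share an index, the first one seen is kept.
        reversed_mapping.setdefault(index, label)
    return reversed_mapping


labels = {"PAD": 0, "UNKNOWN": 1, "ADDRESS": 2, "STREET": 2}
index_to_label = reverse_mapping(labels)
print(index_to_label[2])
```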
- set_params(**kwargs: Any) → None
Set the parameters if they exist given kwargs.