dataprofiler.labelers.character_level_cnn_model module

dataprofiler.labelers.character_level_cnn_model.build_embd_dictionary(filename)

Returns a dictionary of NumPy embedding vectors loaded from an embedding file in GloVe-like format.

Parameters

filename (str) – Path to the embedding file to load
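To illustrate what a GloVe-like file looks like when loaded, here is a minimal sketch. It assumes each line holds a token followed by space-separated floats; `parse_glove_like` is a hypothetical helper, not the library's implementation, and the sample data is invented.

```python
import io

import numpy as np


def parse_glove_like(lines):
    """Parse GloVe-like lines into {token: vector}. Hypothetical helper."""
    embeddings = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Assumption: the token comes first, then space-separated floats.
        token, *values = line.split(" ")
        embeddings[token] = np.asarray([float(v) for v in values])
    return embeddings


# An in-memory stand-in for an embedding file with two characters.
sample = io.StringIO("a 0.1 0.2 0.3\nb 0.4 0.5 0.6\n")
vectors = parse_glove_like(sample)
```

Each key of `vectors` is a token and each value is a one-dimensional NumPy array of that token's embedding.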

dataprofiler.labelers.character_level_cnn_model.create_glove_char(n_dims, source_file=None)

Reduces the GloVe character embeddings in the source file to n_dims principal components and writes them to a new file.

Parameters
  • n_dims (int) – Number of principal-component dimensions to keep in the output embeddings

  • source_file (str) – Location of the original embeddings to reduce
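The reduction to n_dims principal components can be sketched with a plain SVD. This is only an illustration of the technique: the source does not say which PCA routine the library uses, `reduce_to_principal_components` is a hypothetical name, and the input matrix is random data rather than real embeddings.

```python
import numpy as np


def reduce_to_principal_components(vectors, n_dims):
    """Project row vectors onto their first n_dims principal components."""
    centered = vectors - vectors.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_dims].T


rng = np.random.default_rng(0)
original = rng.normal(size=(20, 8))  # 20 hypothetical characters, 8 dims each
reduced = reduce_to_principal_components(original, n_dims=3)
```

Here `reduced` keeps one row per character but only three columns, one per retained component.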

class dataprofiler.labelers.character_level_cnn_model.NoV1ResourceMessageFilter(name='')

Bases: logging.Filter

Filters out the TensorFlow 2 warning emitted when using a TF1 model that has resources.

Initialize a filter.

Initialize with the name of the logger which, together with its children, will have its events allowed through the filter. If no name is specified, allow every event.

filter(record)

Determine if the specified record is to be logged.

Returns True if the record should be logged, or False otherwise. If deemed appropriate, the record may be modified in-place.
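As a rough illustration of how a logging.Filter subclass suppresses a specific message, consider the sketch below. The class name and the matched substring are hypothetical; the exact text that NoV1ResourceMessageFilter matches is not given here.

```python
import logging


class SuppressSubstringFilter(logging.Filter):
    """Hypothetical filter: drop records whose message contains a substring."""

    def __init__(self, substring, name=""):
        super().__init__(name)
        self.substring = substring

    def filter(self, record):
        # Returning False drops the record; True lets it through.
        return self.substring not in record.getMessage()


logger = logging.getLogger("demo")
logger.addFilter(SuppressSubstringFilter("noisy"))
logger.warning("a noisy warning")   # dropped by the filter
logger.warning("a useful warning")  # passed on to the handlers
```

Attaching the filter to a logger affects only records created on that logger; attaching it to a handler would instead affect every record the handler receives.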

class dataprofiler.labelers.character_level_cnn_model.CharacterLevelCnnModel(label_mapping=None, parameters=None)

Bases: dataprofiler.labelers.base_model.BaseTrainableModel

Initializes the CNN model, including its epoch_id.

Parameters
  • label_mapping (dict) – maps labels to their encoded integers

  • parameters (dict) –

    Contains the parameters for the model. Must contain num_labels. Other possible parameters are:

    max_length, max_char_encoding_id, dim_embed, size_fc, dropout, size_conv, num_fil, optimizer, default_label

Returns

None
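The two constructor arguments might be assembled as in the sketch below. The labels and values are illustrative rather than library defaults, and it is an assumption that num_labels equals the number of distinct encoded integers and that index 0 is reserved for padding.

```python
# Hypothetical labels; index 0 is assumed to be reserved for padding.
label_mapping = {"PAD": 0, "UNKNOWN": 1, "ADDRESS": 2, "PHONE": 3}

parameters = {
    # Required per the description above. Assumption: it equals the number
    # of distinct encoded integers in label_mapping.
    "num_labels": len(set(label_mapping.values())),
    # Optional entries; the values are illustrative, not library defaults.
    "max_length": 512,
    "dropout": 0.1,
    "default_label": "UNKNOWN",
}

# With dataprofiler installed, these would be passed as:
# model = CharacterLevelCnnModel(label_mapping=label_mapping,
#                                parameters=parameters)
```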

requires_zero_mapping = True

set_label_mapping(label_mapping)

Sets the labels for the model

Parameters

label_mapping (dict) – label mapping of the model

Returns

None

save_to_disk(dirpath)

Saves whole model to disk with weights

Parameters

dirpath (str) – directory path where you want to save the model to

Returns

None

classmethod load_from_disk(dirpath)

Loads whole model from disk with weights

Parameters

dirpath (str) – directory path where you want to load the model from

Returns

the loaded CharacterLevelCnnModel instance

reset_weights()

Reset the weights of the model.

Returns

None

fit(train_data, val_data=None, batch_size=32, label_mapping=None, reset_weights=False, verbose=True)

Train the current model with the training data and validation data

Parameters
  • train_data (Union[list, np.ndarray]) – Training data used to train model

  • val_data (Union[list, np.ndarray]) – Validation data used to validate the training

  • batch_size (int) – Used to determine number of samples in each batch

  • label_mapping (Union[dict, None]) – maps labels to their encoded integers

  • reset_weights (bool) – Flag to determine whether to reset the weights or not

  • verbose (bool) – Flag to determine whether to print status or not

Returns

None

predict(data, batch_size=32, show_confidences=False, verbose=True)

Run model and get predictions

Parameters
  • data (Union[list, numpy.ndarray]) – text input

  • batch_size (int) – number of samples in the batch of data

  • show_confidences (bool) – whether to include prediction confidences in the output

  • verbose (bool) – Flag to determine whether to print status or not

Returns

char level predictions and confidences

Return type

dict

details()

Prints the relevant details of the model (summary, parameters, label mapping)

classmethod get_class(class_name)

get_parameters(param_list=None)

Returns a dict of parameters from the model given a list.

Parameters

param_list (list) – list of parameters to retrieve from the model

Returns

dict of parameters

classmethod help()

Help function describing alterable parameters.

Returns

None

property label_mapping

Returns

mapping of labels to their encoded values

property labels

Retrieves the labels.

Returns

list of labels

property num_labels

property reverse_label_mapping

Returns

reversed label mapping (encoded value to label), useful for looking up labels by index
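A reverse mapping of this kind can be sketched by inverting the label_mapping dict, which allows predicted indices to be turned back into label names. The labels and indices below are hypothetical, and the sketch assumes each encoded integer belongs to exactly one label.

```python
# Hypothetical forward mapping from label to encoded integer.
label_mapping = {"PAD": 0, "UNKNOWN": 1, "ADDRESS": 2}

# Invert it so that encoded integers can be looked up directly.
reverse_label_mapping = {index: label for label, index in label_mapping.items()}

# Translate a hypothetical sequence of predicted indices into label names.
predicted_indices = [2, 1, 0]
predicted_labels = [reverse_label_mapping[i] for i in predicted_indices]
```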

set_params(**kwargs)

Given kwargs, set the parameters if they exist.