Regex Model

class dataprofiler.labelers.regex_model.RegexModel(label_mapping=None, parameters=None)

Bases: dataprofiler.labelers.base_model.BaseModel

Regex Model Initializer.

Example regex_patterns:

    regex_patterns = {
        "LABEL_1": [
            "LABEL_1_pattern_1",
            "LABEL_1_pattern_2",
            ...
        ],
        "LABEL_2": [
            "LABEL_2_pattern_1",
            "LABEL_2_pattern_2",
            ...
        ],
    }

Example encapsulators:

    encapsulators = {
        'start': r'(?<![\w.\$\%\-])',
        'end': r'(?:(?=(\b|[ ]))|(?=[^\w\%\$]([^\w]|$))|$)',
    }
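To illustrate what the encapsulators are for, the sketch below wraps a pattern with a start and an end expression using only Python's standard `re` module. It assumes the encapsulators are simply concatenated around each pattern, and it assumes the backslashes shown (`\w`, `\b`, and so on) are the intended ones. The `wrap_pattern` helper and the phone-like pattern are hypothetical and are not part of the library.

```python
import re

# Assumed encapsulators; the escaped characters are a reconstruction,
# not confirmed library values.
ENCAPSULATORS = {
    "start": r"(?<![\w.\$\%\-])",
    "end": r"(?:(?=(\b|[ ]))|(?=[^\w\%\$]([^\w]|$))|$)",
}


def wrap_pattern(pattern, encapsulators=ENCAPSULATORS):
    """Compile `pattern` with the start and end encapsulators around it."""
    return re.compile(encapsulators["start"] + pattern + encapsulators["end"])


# Hypothetical pattern, used only for illustration.
phone = wrap_pattern(r"\d{3}-\d{4}")

# Surrounded by spaces, so both encapsulators are satisfied.
standalone = phone.search("call 555-1234 now")

# Preceded by a word character and followed by another digit,
# so the encapsulators reject it.
embedded = phone.search("id=x555-12345")
```

Here `standalone` is a match object covering `555-1234`, while `embedded` is `None`, because the candidate is glued to surrounding word characters.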

Parameters
  • label_mapping (dict) – maps labels to their encoded integers

  • parameters (dict) –

    Contains all the appropriate parameters for the model. Possible parameters are:

    max_length, max_num_chars, dim_embed

Returns

None

reset_weights()

Reset the weights of the model.

Returns

None

predict(data, batch_size=None, show_confidences=False, verbose=True)

Applies the regex patterns (within regex_model) to each input string and creates predictions for all matching patterns. Each pattern has an associated entity, and each character in the string is given a True or False identification for each entity. Characters not identified by ANY of the regex patterns in the pattern_dict are considered background characters and are assigned the default_label value.

Parameters
  • data (iterator) – list of strings to predict upon

  • batch_size (N/A) – does not affect this model; the signature should eventually be changed so that it is not required.

  • show_confidences (bool) – whether to return prediction confidences

  • verbose (bool) – Flag to determine whether to print status or not

Returns

char level predictions and confidences

Return type

dict
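The character-level labelling described above can be sketched with the standard `re` module. This is a simplified illustration and not the library's implementation: the function name `char_level_matches`, the per-character dict layout, and the `"UNKNOWN"` default label are all assumptions made for the example.

```python
import re


def char_level_matches(text, regex_patterns, default_label="UNKNOWN"):
    """Return one {label: bool} dict per character of `text`."""
    labels = [default_label] + [
        label for label in regex_patterns if label != default_label
    ]
    flags = [{label: False for label in labels} for _ in text]

    # Mark every character covered by a match of any pattern for a label.
    for label, patterns in regex_patterns.items():
        for pattern in patterns:
            for match in re.finditer(pattern, text):
                for i in range(match.start(), match.end()):
                    flags[i][label] = True

    # Characters matched by no pattern are background characters.
    for char_flags in flags:
        if not any(char_flags.values()):
            char_flags[default_label] = True
    return flags


# Hypothetical label and pattern, used only for illustration.
result = char_level_matches("id 42", {"DIGITS": [r"\d+"]})
```

In this sketch, the characters of `42` are flagged `True` for `DIGITS`, and the remaining characters fall back to the default label.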

classmethod load_from_disk(dirpath)

Loads whole model from disk with weights

Parameters

dirpath (str) – directory path where you want to load the model from

Returns

None

save_to_disk(dirpath)

Saves whole model to disk with weights.

Parameters

dirpath (str) – directory path where you want to save the model to

Returns

None
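As a rough picture of a save and load round trip, the sketch below writes a label mapping and a parameter dict to a directory as JSON and reads them back. It assumes that these two dicts are the state worth persisting for a regex-based model. The file names and the `save_sketch` and `load_sketch` helpers are hypothetical and may not match what the library writes.

```python
import json
import os
import tempfile


def save_sketch(dirpath, label_mapping, parameters):
    """Write the label mapping and parameters as JSON files in `dirpath`."""
    os.makedirs(dirpath, exist_ok=True)
    # File names are assumptions made for this example.
    with open(os.path.join(dirpath, "label_mapping.json"), "w") as fp:
        json.dump(label_mapping, fp)
    with open(os.path.join(dirpath, "model_parameters.json"), "w") as fp:
        json.dump(parameters, fp)


def load_sketch(dirpath):
    """Read back the two JSON files written by `save_sketch`."""
    with open(os.path.join(dirpath, "label_mapping.json")) as fp:
        label_mapping = json.load(fp)
    with open(os.path.join(dirpath, "model_parameters.json")) as fp:
        parameters = json.load(fp)
    return label_mapping, parameters


original_mapping = {"UNKNOWN": 0, "DIGITS": 1}
original_parameters = {"regex_patterns": {"DIGITS": [r"\d+"]}}

with tempfile.TemporaryDirectory() as tmp:
    save_sketch(tmp, original_mapping, original_parameters)
    restored_mapping, restored_parameters = load_sketch(tmp)
```

Regex patterns are plain strings, so they survive the JSON round trip unchanged.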

add_label(label, same_as=None)

Adds a label to the data labeler.

Parameters
  • label (str) – new label being added to the data labeler

  • same_as (str) – label to have the same encoding index as for multi-label to single encoding index.

Returns

None
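The effect of `same_as` can be pictured with a plain dict. In this minimal sketch, a new label either receives the next free index or shares the index of an existing label. The `add_label_sketch` helper is hypothetical. Taking the next index as the current maximum plus one, and leaving an existing label untouched, are both assumptions and not confirmed library behaviour.

```python
def add_label_sketch(label_mapping, label, same_as=None):
    """Add `label` to `label_mapping` in place, optionally sharing an index."""
    if label in label_mapping:
        # Assumption: an existing label is left untouched.
        return
    if same_as is not None:
        if same_as not in label_mapping:
            raise ValueError(f"'{same_as}' is not an existing label")
        # Multi-label to single encoding index: reuse the same integer.
        label_mapping[label] = label_mapping[same_as]
    else:
        # Assumption: the next index is one past the current maximum.
        label_mapping[label] = max(label_mapping.values(), default=-1) + 1


mapping = {"UNKNOWN": 0, "SSN": 1}
add_label_sketch(mapping, "PHONE")
add_label_sketch(mapping, "TELEPHONE", same_as="PHONE")
```

After these calls, `PHONE` and `TELEPHONE` share one encoded integer, so both labels collapse to the same class.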

classmethod get_class(class_name)

get_parameters(param_list=None)

Returns a dict of parameters from the model given a list.

Parameters

param_list (list) – list of parameters to retrieve from the model.

Returns

dict of parameters

classmethod help()

Help function describing alterable parameters.

Returns

None

property label_mapping

Returns

mapping of labels to their encoded values

property labels

Retrieves the labels.

Returns

list of labels

property num_labels

requires_zero_mapping = False

property reverse_label_mapping

Returns

the reverse of the current label mapping (encoded value to label), useful when labels need to be extracted via indices
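A reverse mapping of this kind can be sketched by inverting a plain dict and using it to turn predicted indices back into label names. The label names and indices below are hypothetical, and the inversion is an assumption about the idea, not the library's code.

```python
label_mapping = {"UNKNOWN": 0, "SSN": 1, "PHONE": 2}

# Invert label -> index into index -> label.
reverse_label_mapping = {index: label for label, index in label_mapping.items()}

# Hypothetical model output: one encoded index per character or token.
predicted_indices = [0, 2, 2, 1]
predicted_labels = [reverse_label_mapping[i] for i in predicted_indices]
```

If several labels share one index, a dict inversion like this keeps only one label per index.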

set_label_mapping(label_mapping)

Sets the labels for the model

Parameters

label_mapping (Union[list, dict]) – label mapping of the model or list of labels to be converted into the label mapping

Returns

None
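Since `label_mapping` may be given either as a dict or as a list of labels, the list form has to be converted into a mapping. The sketch below shows one plausible conversion, assuming indices are assigned in list order starting at 0. The `to_label_mapping` helper is hypothetical.

```python
def to_label_mapping(labels):
    """Return a label -> index dict from a dict or a list of labels."""
    if isinstance(labels, dict):
        # Already a mapping: return a copy so the caller's dict is not shared.
        return dict(labels)
    # Assumption: indices follow list order, starting at 0.
    return {label: index for index, label in enumerate(labels)}


from_list = to_label_mapping(["UNKNOWN", "SSN", "PHONE"])
from_dict = to_label_mapping({"UNKNOWN": 0, "SSN": 1})
```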

set_params(**kwargs)

Given kwargs, set the parameters if they exist.