Regex Model
-
class dataprofiler.labelers.regex_model.RegexModel(label_mapping=None, parameters=None)
Bases: dataprofiler.labelers.base_model.BaseModel
Regex Model Initializer.
- Example regex_patterns:
    regex_patterns = {
        "LABEL_1": [
            "LABEL_1_pattern_1", "LABEL_1_pattern_2", …
        ],
        "LABEL_2": [
            "LABEL_2_pattern_1", "LABEL_2_pattern_2", …
        ],
    }
- Example encapsulators:
    encapsulators = {
        'start': r'(?<![\w.\$\%\-])',
        'end': r'(?:(?=(\b|[ ]))|(?=[^\w\%\$]([^\w]|$))|$)',
    }
- Parameters
label_mapping (dict) – maps labels to their encoded integers
parameters (dict) –
Contains all the appropriate parameters for the model. Possible parameters are:
regex_patterns, encapsulators, ignore_case, default_label
- Returns
None
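The sketch below shows how the start and end encapsulators shown above could guard a pattern so that it only matches on token boundaries. It assumes each pattern is wrapped as start + pattern + end before matching; this page lists the encapsulators but does not say how they are combined, so that composition, the helper names `encapsulate` and `spans`, and the digit pattern are all hypothetical.

```python
import re

# Start/end guards from the encapsulators example, as raw Python strings.
encapsulators = {
    "start": r"(?<![\w.\$\%\-])",
    "end": r"(?:(?=(\b|[ ]))|(?=[^\w\%\$]([^\w]|$))|$)",
}


def encapsulate(pattern: str, enc: dict) -> str:
    """Wrap a raw pattern with the start/end guards (assumed composition)."""
    return enc["start"] + pattern + enc["end"]


# Hypothetical label pattern: a run of digits.
wrapped = re.compile(encapsulate(r"\d+", encapsulators))


def spans(text: str):
    """Return (start, end, matched_text) for every guarded match in text."""
    return [(m.start(), m.end(), m.group()) for m in wrapped.finditer(text)]


# "15" is rejected because it is preceded by "$", which the start guard forbids.
print(spans("order 42 costs $15"))  # [(6, 8, '42')]
```

The lookbehind in the start guard is what keeps a pattern from firing in the middle of a token such as "$15" or "a-5".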
-
reset_weights()
Reset the weights of the model.
- Returns
None
-
predict(data, batch_size=None, show_confidences=False, verbose=True)
Applies the regex patterns (within regex_model) to each input string and creates predictions for all matching patterns. Each pattern is associated with an entity (label), and each character in the string receives a True or False identification for every entity. Characters not matched by any of the regex patterns in the pattern_dict are considered background characters and are labeled with the default_label value.
- Parameters
data (iterator) – list of strings to predict on
batch_size (N/A) – has no effect on this model; ideally it would not be required
show_confidences (bool) – whether to also return prediction confidences
verbose (bool) – whether to print status messages
- Returns
character-level predictions and, if show_confidences is True, their confidences
- Return type
dict
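The character-level labeling described for predict can be sketched with the standard `re` module. This is a hypothetical stand-in, not the library's implementation: it returns plain label lists rather than the model's dict of predictions and confidences, the "UNKNOWN" default is only an illustrative value, and the rule for characters matched by several labels (keep the first label in dict order) is an assumption.

```python
import re
from typing import Dict, List


def predict_chars(
    data: List[str],
    regex_patterns: Dict[str, List[str]],
    default_label: str = "UNKNOWN",
    ignore_case: bool = True,
) -> List[List[str]]:
    """Hypothetical stand-in for predict: one label per character of each string."""
    re_flags = re.IGNORECASE if ignore_case else 0
    labels = list(regex_patterns)
    results = []
    for text in data:
        # hits[i][j] is True when character i matched a pattern for labels[j].
        hits = [[False] * len(labels) for _ in text]
        for j, label in enumerate(labels):
            for pattern in regex_patterns[label]:
                for m in re.finditer(pattern, text, re_flags):
                    for i in range(m.start(), m.end()):
                        hits[i][j] = True
        row = []
        for char_hits in hits:
            matched = [labels[j] for j, hit in enumerate(char_hits) if hit]
            # Unmatched (background) characters get default_label; on overlap,
            # this sketch keeps the first label in dict order (an assumption).
            row.append(matched[0] if matched else default_label)
        results.append(row)
    return results


patterns = {"DIGIT": [r"\d+"], "WORD": [r"[a-z]+"]}
print(predict_chars(["ab 12"], patterns))
# [['WORD', 'WORD', 'UNKNOWN', 'DIGIT', 'DIGIT']]
```

The True/False matrix `hits` mirrors the per-entity identification described above; collapsing it to one label per character is only one way to read it.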
-
classmethod load_from_disk(dirpath)
Loads the whole model, including its weights, from disk.
- Parameters
dirpath (str) – directory path where you want to load the model from
- Returns
None
-
save_to_disk(dirpath)
Saves the whole model, including its weights, to disk.
- Parameters
dirpath (str) – directory path where you want to save the model to
- Returns
None
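A regex model has no learned weights in the usual sense, so a save/load round trip can be pictured as persisting its configuration. The sketch below assumes the state is just the parameters dict and the label mapping, stored as JSON; the functions `save_config`/`load_config` and the file names are hypothetical and do not describe the on-disk format that save_to_disk and load_from_disk actually use.

```python
import json
import os
import tempfile


def save_config(dirpath: str, parameters: dict, label_mapping: dict) -> None:
    """Hypothetical save: write parameters and label mapping as JSON files."""
    os.makedirs(dirpath, exist_ok=True)
    with open(os.path.join(dirpath, "model_parameters.json"), "w") as f:
        json.dump(parameters, f)
    with open(os.path.join(dirpath, "label_mapping.json"), "w") as f:
        json.dump(label_mapping, f)


def load_config(dirpath: str):
    """Hypothetical load: read back the two JSON files written by save_config."""
    with open(os.path.join(dirpath, "model_parameters.json")) as f:
        parameters = json.load(f)
    with open(os.path.join(dirpath, "label_mapping.json")) as f:
        label_mapping = json.load(f)
    return parameters, label_mapping


# Round trip through a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    params = {"regex_patterns": {"DIGIT": [r"\d+"]}, "default_label": "UNKNOWN"}
    mapping = {"UNKNOWN": 0, "DIGIT": 1}
    save_config(tmp, params, mapping)
    restored = load_config(tmp)
```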
-
add_label(label, same_as=None)
Adds a label to the data labeler.
- Parameters
label (str) – new label being added to the data labeler
same_as (str) – existing label whose encoding index the new label should share, so that multiple labels map to a single encoding index
- Returns
None
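The effect of `same_as` on a label mapping can be sketched on a plain dict. This is a hypothetical helper, not the library method: the rule that a new label without `same_as` gets the next unused index, and the ValueError checks, are assumptions made for illustration.

```python
from typing import Dict, Optional


def add_label(
    label_mapping: Dict[str, int], label: str, same_as: Optional[str] = None
) -> Dict[str, int]:
    """Hypothetical sketch: return a new mapping with `label` added."""
    if label in label_mapping:
        raise ValueError(f"{label!r} already exists in the label mapping")
    new_mapping = dict(label_mapping)
    if same_as is not None:
        if same_as not in label_mapping:
            raise ValueError(f"same_as label {same_as!r} is not in the mapping")
        # Share the existing encoding index (several labels, one index).
        new_mapping[label] = label_mapping[same_as]
    else:
        # Otherwise assign the next unused index (assumed rule).
        new_mapping[label] = max(label_mapping.values(), default=-1) + 1
    return new_mapping


base = {"UNKNOWN": 0, "ADDRESS": 1}
with_zip = add_label(base, "ZIP")                          # ZIP -> 2
with_street = add_label(base, "STREET", same_as="ADDRESS")  # STREET -> 1
```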
-
classmethod get_class(class_name)
-
get_parameters(param_list=None)
Returns a dict of parameters from the model given a list.
- Parameters
param_list (list) – list of parameters to retrieve from the model
- Returns
dict of parameters
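Selecting a subset of parameters by name can be sketched as a dict comprehension. This hypothetical helper assumes that `param_list=None` means "return everything" and that unknown names are skipped; whether the library skips, raises, or includes extra entries such as the label mapping is not stated here.

```python
from typing import List, Optional


def get_parameters(parameters: dict, param_list: Optional[List[str]] = None) -> dict:
    """Hypothetical sketch: return the requested subset of a parameter dict."""
    if param_list is None:
        # No list given: return a shallow copy of every parameter.
        return dict(parameters)
    # Keep only requested names that exist (skipping unknowns is assumed).
    return {name: parameters[name] for name in param_list if name in parameters}


params = {"ignore_case": True, "default_label": "UNKNOWN"}
subset = get_parameters(params, ["default_label"])  # {'default_label': 'UNKNOWN'}
```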
-
classmethod help()
Help function describing alterable parameters.
- Returns
None
-
property label_mapping
Mapping of labels to their encoded values.
-
property labels
Retrieves the labels.
- Returns
list of labels
-
property num_labels
-
requires_zero_mapping = False
-
property reverse_label_mapping
Reversed label mapping (encoded values to labels), useful when labels need to be extracted via their indices.
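Inverting a label mapping and using it to decode predicted indices can be sketched as follows. The helper `reverse_mapping` is hypothetical; note that when several labels share one index (for example after adding a label with `same_as`), a plain dict inversion keeps only the last such label, and whether the library resolves that case the same way is an assumption.

```python
from typing import Dict


def reverse_mapping(label_mapping: Dict[str, int]) -> Dict[int, str]:
    """Invert label -> index into index -> label (last label wins on shared indices)."""
    return {index: label for label, index in label_mapping.items()}


mapping = {"UNKNOWN": 0, "ADDRESS": 1, "ZIP": 2}
rev = reverse_mapping(mapping)

# Decode a sequence of predicted indices back into labels.
decoded = [rev[i] for i in [2, 0, 1]]  # ['ZIP', 'UNKNOWN', 'ADDRESS']
```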
-
set_label_mapping(label_mapping)
Sets the labels for the model.
- Parameters
label_mapping (Union[list, dict]) – label mapping of the model or list of labels to be converted into the label mapping
- Returns
None
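Since set_label_mapping accepts either a dict or a list of labels, the list case needs converting into a mapping. The hypothetical helper below numbers list entries in order starting at 0; that numbering scheme is an assumption for illustration, not the library's documented behavior.

```python
from typing import Dict, List, Union


def to_label_mapping(labels: Union[List[str], Dict[str, int]]) -> Dict[str, int]:
    """Hypothetical: normalize a list of labels or a dict into label -> index."""
    if isinstance(labels, dict):
        # Already a mapping: copy it so the caller's dict is not shared.
        return dict(labels)
    if isinstance(labels, (list, tuple)):
        # Enumerate in order: the first label gets index 0 (assumed).
        return {label: index for index, label in enumerate(labels)}
    raise TypeError("label_mapping must be a list or a dict")


from_list = to_label_mapping(["UNKNOWN", "ADDRESS", "ZIP"])
```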
-
set_params(**kwargs)
Given kwargs, sets the parameters if they exist.
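A keyword-driven parameter update can be sketched on a plain dict. This hypothetical helper validates all names before changing anything and rejects unknown ones with a ValueError; whether the library raises, ignores, or validates values differently is an assumption.

```python
def set_params(parameters: dict, **kwargs) -> None:
    """Hypothetical sketch: update existing parameters in place."""
    unknown = [name for name in kwargs if name not in parameters]
    if unknown:
        # Validate everything first so a bad call leaves parameters unchanged.
        raise ValueError(f"unknown parameter(s): {unknown}")
    parameters.update(kwargs)


params = {"ignore_case": True, "default_label": "UNKNOWN"}
set_params(params, ignore_case=False)
```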