Regex Model
Contains class for regex data labeling model.
- class dataprofiler.labelers.regex_model.RegexModel(label_mapping: Dict[str, int], parameters: Optional[Dict] = None)
Bases:
dataprofiler.labelers.base_model.BaseModel
Class for regex data labeling model.
Initialize Regex Model.
- Example regex_patterns:
regex_patterns = {
    "LABEL_1": [
        "LABEL_1_pattern_1", "LABEL_1_pattern_2", ...
    ],
    "LABEL_2": [
        "LABEL_2_pattern_1", "LABEL_2_pattern_2", ...
    ],
}
- Example encapsulators:
encapsulators = {
    'start': r'(?<![\w.$%-])',
    'end': r'(?:(?=(\b|[ ]))|(?=[^\w%$]([^\w]|$))|$)',
}
- Parameters
label_mapping (dict) – maps labels to their encoded integers
parameters (dict) –
Contains all the appropriate parameters for the model. Possible parameters are:
regex_patterns, encapsulators, ignore_case, default_label
- Returns
None
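As a rough illustration of how regex_patterns and encapsulators might fit together, the sketch below wraps each pattern between the start and end encapsulators using Python's re module. The PHONE label, its pattern, and the compile_patterns helper are hypothetical, and the concatenation order is an assumption rather than confirmed library behavior. The encapsulators are written with \w and \b on the assumption that those are the intended escapes.

```python
import re

# Hypothetical label and pattern, for illustration only.
regex_patterns = {
    "PHONE": [r"\d{3}-\d{3}-\d{4}"],
}

# Encapsulators, assuming \w and \b are the intended escapes.
encapsulators = {
    "start": r"(?<![\w.$%-])",
    "end": r"(?:(?=(\b|[ ]))|(?=[^\w%$]([^\w]|$))|$)",
}


def compile_patterns(patterns, caps):
    """Wrap every pattern as start + pattern + end and compile it (assumed order)."""
    return {
        label: [re.compile(caps["start"] + p + caps["end"]) for p in label_patterns]
        for label, label_patterns in patterns.items()
    }


compiled = compile_patterns(regex_patterns, encapsulators)

# The second number is glued to a word character ("x"), so the start
# encapsulator rejects it and only the first number matches.
text = "call 555-123-4567 or x555-123-4567"
matches = [m.group(0) for m in compiled["PHONE"][0].finditer(text)]
```

In this sketch the encapsulators act as guards: the start lookbehind refuses matches that begin in the middle of a token, and the end lookahead requires a boundary, a space, or the end of the string.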
- reset_weights() → None
Reset weights.
- predict(data: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray], batch_size: Optional[int] = None, show_confidences: bool = False, verbose: bool = True) → Dict
Apply the regex patterns (within regex_model) to the input_string.
Create predictions for all matching patterns. Each pattern has an associated entity and the predictions of each character within the string are given a True or False identification for each entity. All characters not identified by ANY of the regex patterns in the pattern_dict are considered background characters, and are replaced with the default_label value.
- Parameters
data (iterator) – list of strings to predict upon
batch_size (int, optional) – not used by this model; the argument has no effect on the predictions
show_confidences (bool) – whether to return prediction confidences along with the predictions
verbose (bool) – Flag to determine whether to print status or not
- Returns
char level predictions and confidences
- Return type
dict
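The character-level labeling described above can be sketched with the standard re module. This is a simplified illustration under stated assumptions, not the library's implementation: the label_characters helper, the DIGITS label, and the list-of-rows output are hypothetical, and the real method returns a dict whose exact layout is not shown here.

```python
import re


def label_characters(text, patterns, label_mapping, default_label="UNKNOWN"):
    """Return one row per character, with a 0/1 flag per label index."""
    num_labels = max(label_mapping.values()) + 1
    rows = [[0] * num_labels for _ in text]

    # Flag every character covered by a match of any pattern for each label.
    for label, label_patterns in patterns.items():
        index = label_mapping[label]
        for pattern in label_patterns:
            for match in re.finditer(pattern, text):
                for position in range(match.start(), match.end()):
                    rows[position][index] = 1

    # Characters that no pattern matched are background: give them the default label.
    default_index = label_mapping[default_label]
    for row in rows:
        if not any(row):
            row[default_index] = 1
    return rows


label_mapping = {"UNKNOWN": 0, "DIGITS": 1}  # hypothetical labels
patterns = {"DIGITS": [r"\d+"]}              # hypothetical pattern
rows = label_characters("ab12", patterns, label_mapping)
```

Here "a" and "b" match no pattern and fall back to the default label, while "1" and "2" are flagged for DIGITS.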
- classmethod load_from_disk(dirpath: str) → dataprofiler.labelers.regex_model.RegexModel
Load whole model from disk with weights.
- Parameters
dirpath (str) – directory path where you want to load the model from
- Returns
the loaded RegexModel
- save_to_disk(dirpath: str) → None
Save whole model to disk with weights.
- Parameters
dirpath (str) – directory path where you want to save the model to
- Returns
None
- add_label(label: str, same_as: Optional[str] = None) → None
Add a label to the data labeler.
- Parameters
label (str) – new label being added to the data labeler
same_as (str) – existing label whose encoding index the new label should share, so that multiple labels map to a single encoding index
- Returns
None
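The effect of same_as on the label mapping can be illustrated with a plain dictionary. This is a minimal sketch, not the library's code: the free function add_label below is hypothetical, and giving a new label the next unused index is an assumption.

```python
from typing import Dict, Optional


def add_label(label_mapping: Dict[str, int], label: str,
              same_as: Optional[str] = None) -> Dict[str, int]:
    """Return a copy of label_mapping with label added (illustrative only)."""
    mapping = dict(label_mapping)
    if same_as is not None:
        if same_as not in mapping:
            raise ValueError(f"unknown label: {same_as}")
        # Share the encoding index of an existing label.
        mapping[label] = mapping[same_as]
    else:
        # Assumption: a new label receives the next unused index.
        mapping[label] = max(mapping.values(), default=-1) + 1
    return mapping


mapping = {"UNKNOWN": 0, "ADDRESS": 1}
mapping = add_label(mapping, "PHONE")
mapping = add_label(mapping, "HOME_ADDRESS", same_as="ADDRESS")
```

After these two calls, PHONE has its own index while HOME_ADDRESS and ADDRESS share one, which is the multi-label to single-index case the same_as parameter describes.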
- classmethod get_class(class_name: str) → Optional[Type[dataprofiler.labelers.base_model.BaseModel]]
Get the BaseModel subclass matching class_name, or None if no such subclass exists.
- get_parameters(param_list: Optional[List[str]] = None) → Dict
Return a dict of parameters from the model given a list.
- Parameters
param_list (List[str]) – list of parameters to retrieve from the model.
- Returns
dict of parameters
- classmethod help() → None
Help describe alterable parameters.
- Returns
None
- property label_mapping: Dict[str, int]
Return mapping of labels to their encoded values.
- property labels: List[str]
Retrieve the list of labels.
- Returns
list of labels
- property num_labels: int
Return the number of labels, computed as the maximum encoded value in the label mapping plus one.
- requires_zero_mapping: bool = False
- property reverse_label_mapping: Dict[int, str]
Return the inverse of the current label mapping, from encoded values to labels.
Useful when labels need to be looked up by index.
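The label properties above can be pictured as simple views over the label mapping. The snippet below is a hedged sketch using a plain dict with hypothetical labels. Computing num_labels as the largest index plus one is an assumption based on the "max label mapping" wording, and which label wins for a shared index is a property of this sketch, not a documented guarantee.

```python
# Hypothetical mapping in which two labels share encoding index 1.
label_mapping = {"UNKNOWN": 0, "ADDRESS": 1, "HOME_ADDRESS": 1, "PHONE": 2}

# Labels in insertion order.
labels = list(label_mapping)

# Assumption: the count is the largest encoded index plus one.
num_labels = max(label_mapping.values()) + 1

# Invert the mapping; for a shared index, the label inserted later wins here.
reverse_label_mapping = {index: label for label, index in label_mapping.items()}

# Look up a label from a predicted index.
predicted_index = 2
predicted_label = reverse_label_mapping[predicted_index]
```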
- set_label_mapping(label_mapping: Union[List[str], Dict[str, int]]) → None
Set the labels for the model.
- Parameters
label_mapping (Union[list, dict]) – label mapping of the model or list of labels to be converted into the label mapping
- Returns
None
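Since set_label_mapping accepts either a list or a dict, the list case has to be converted into a mapping. One plausible conversion is sketched below. The helper name to_label_mapping is hypothetical, and numbering labels by list position starting at 0 is an assumption, not confirmed library behavior.

```python
from typing import Dict, List, Union


def to_label_mapping(label_mapping: Union[List[str], Dict[str, int]]) -> Dict[str, int]:
    """Normalize a list of labels or an existing mapping into a dict."""
    if isinstance(label_mapping, dict):
        # Already a mapping: copy it so the caller's dict is not mutated later.
        return dict(label_mapping)
    # Assumption: labels are numbered by their position in the list, from 0.
    return {label: index for index, label in enumerate(label_mapping)}


from_list = to_label_mapping(["UNKNOWN", "ADDRESS", "PHONE"])
from_dict = to_label_mapping({"UNKNOWN": 0, "ADDRESS": 1})
```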
- set_params(**kwargs: Any) → None
Set the parameters if they exist given kwargs.