Base Model¶
Contains abstract classes for labeling data.
- class dataprofiler.labelers.base_model.AutoSubRegistrationMeta(clsname: str, bases: Tuple[type, ...], attrs: Dict[str, object])¶
Bases:
abc.ABCMeta
For registering subclasses.
Create auto registration object and return new class.
- mro()¶
Return a type’s method resolution order.
- register(subclass)¶
Register a virtual subclass of an ABC.
Returns the subclass, to allow usage as a class decorator.
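The registration idea can be illustrated with a small stand-in. This is a minimal sketch, not the library's implementation: `AutoRegisterMeta`, `Base`, `CharModel`, the `_registry` attribute, and the lowercase key format are all hypothetical names and choices made for the example.

```python
import abc
from typing import Dict, Optional, Tuple, Type


class AutoRegisterMeta(abc.ABCMeta):
    """Hypothetical metaclass that records every class it creates."""

    def __new__(mcs, clsname: str, bases: Tuple[type, ...], attrs: Dict[str, object]):
        new_class = super().__new__(mcs, clsname, bases, attrs)
        # Assumption: classes are stored under their lowercase name.
        new_class._registry[clsname.lower()] = new_class
        return new_class


class Base(metaclass=AutoRegisterMeta):
    # Shared registry; it is part of the class body, so it already exists
    # when the metaclass registers Base itself.
    _registry: Dict[str, type] = {}

    @classmethod
    def get_class(cls, class_name: str) -> Optional[Type["Base"]]:
        """Look up a registered class by name, or return None."""
        return cls._registry.get(class_name.lower())


class CharModel(Base):
    """Defining the subclass is enough to register it."""


found = Base.get_class("CharModel")
missing = Base.get_class("NoSuchModel")
```

Because registration happens in `__new__`, no explicit call is needed after a subclass is defined.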
- class dataprofiler.labelers.base_model.BaseModel(label_mapping: Union[List, Dict], parameters: Dict)¶
Bases:
object
For labeling data.
Initialize Base Model.
Only the model and its parameters are stored here.
- Parameters
label_mapping (Union[list, dict]) – label mapping of the model or list of labels to be converted into the label mapping
parameters (dict) – Contains all the appropriate parameters for the model. Must contain num_labels.
- Returns
None
- requires_zero_mapping: bool = False¶
- property label_mapping: Dict[str, int]¶
Return mapping of labels to their encoded values.
- property reverse_label_mapping: Dict[int, str]¶
Return the reverse of the label mapping, from encoded values to labels.
Useful when labels need to be looked up by index.
- property labels: List[str]¶
Retrieve the list of labels.
- Returns
list of labels
- property num_labels: int¶
Return the number of labels, derived from the maximum encoded value in the label mapping.
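How the label properties relate to one another can be sketched with plain dictionaries. This is an illustration under stated assumptions, not the library's code: the example labels are made up, `labels` is assumed to be the mapping's keys, and `num_labels` is assumed to be the largest encoded value plus one.

```python
from typing import Dict, List

# Hypothetical mapping; "ADDRESS" and "ADDR" share one encoded value.
label_mapping: Dict[str, int] = {"PAD": 0, "UNKNOWN": 1, "ADDRESS": 2, "ADDR": 2}

# Reverse mapping: encoded value -> label. When several labels share a
# value, the label seen last in iteration order is the one kept here.
reverse_label_mapping: Dict[int, str] = {v: k for k, v in label_mapping.items()}

# Assumption: the label list is the mapping's keys in insertion order.
labels: List[str] = list(label_mapping)

# Assumption: number of distinct encoded indices, as max value + 1.
num_labels: int = max(label_mapping.values()) + 1
```

Note that with shared indices the reverse mapping is lossy: four labels map to three encoded values.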
- classmethod get_class(class_name: str) Optional[Type[dataprofiler.labelers.base_model.BaseModel]] ¶
Get the registered subclass matching class_name, or None if there is no match.
- get_parameters(param_list: Optional[List[str]] = None) Dict ¶
Return a dict of parameters from the model given a list.
- Parameters
param_list (List[str]) – list of parameters to retrieve from the model.
- Returns
dict of parameters
- set_params(**kwargs: Any) None ¶
Set the parameters if they exist given kwargs.
- add_label(label: str, same_as: Optional[str] = None) None ¶
Add a label to the data labeler.
- Parameters
label (str) – new label being added to the data labeler
same_as (str) – existing label whose encoding index the new label should share, so that multiple labels map to a single encoding index.
- Returns
None
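The effect of same_as can be sketched on a plain dictionary. The function below is a hypothetical stand-in, not the method's actual body; the no-op on duplicate labels, the ValueError on an unknown same_as, and the "next free index" rule are assumptions.

```python
from typing import Dict, Optional


def add_label(label_mapping: Dict[str, int], label: str,
              same_as: Optional[str] = None) -> None:
    """Add `label` to `label_mapping` in place (illustrative stand-in)."""
    if label in label_mapping:
        return  # assumption: adding an existing label changes nothing
    if same_as is not None:
        if same_as not in label_mapping:
            raise ValueError(f"`same_as` label {same_as!r} is not in the mapping")
        # Share the encoding index of the existing label.
        label_mapping[label] = label_mapping[same_as]
    else:
        # Assumption: a new label takes the next unused index.
        label_mapping[label] = max(label_mapping.values(), default=-1) + 1


mapping = {"PAD": 0, "UNKNOWN": 1}
add_label(mapping, "ADDRESS")                  # gets a new index
add_label(mapping, "ADDR", same_as="ADDRESS")  # reuses the index of "ADDRESS"
```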
- set_label_mapping(label_mapping: Union[List[str], Dict[str, int]]) None ¶
Set the labels for the model.
- Parameters
label_mapping (Union[list, dict]) – label mapping of the model or list of labels to be converted into the label mapping
- Returns
None
- classmethod help() None ¶
Describe the alterable parameters.
- Returns
None
- abstract reset_weights() None ¶
Reset the weights of the model.
- Returns
None
- abstract predict(data: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray], batch_size: int, show_confidences: bool, verbose: bool) Dict ¶
Predict the data with the current model.
- Parameters
data (iterator of data to process) – model input data to predict on
batch_size (int) – number of samples in the batch of data
show_confidences (bool) – whether user wants prediction confidences
verbose (bool) – Flag to determine whether to print status or not
- Returns
character-level predictions and confidences
- Return type
dict
- abstract classmethod load_from_disk(dirpath: str) dataprofiler.labelers.base_model.BaseModel ¶
Load whole model from disk with weights.
- Parameters
dirpath (str) – directory path where you want to load the model from
- Returns
loaded model
- Return type
dataprofiler.labelers.base_model.BaseModel
- abstract save_to_disk(dirpath: str) None ¶
Save whole model to disk with weights.
- Parameters
dirpath (str) – directory path where you want to save the model to
- Returns
None
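The abstract save/load contract can be illustrated without the library. The sketch below uses hypothetical stand-in classes (`SketchBaseModel`, `JsonModel`) and an assumed `model.json` file name; it mirrors only two of the abstract methods listed above and is not how any shipped model persists its weights.

```python
import abc
import json
import os
import tempfile
from typing import Dict


class SketchBaseModel(abc.ABC):
    """Hypothetical stand-in mirroring two of the abstract methods above."""

    def __init__(self, label_mapping: Dict[str, int], parameters: Dict) -> None:
        self.label_mapping = label_mapping
        self.parameters = parameters

    @abc.abstractmethod
    def save_to_disk(self, dirpath: str) -> None:
        """Persist the whole model under `dirpath`."""

    @classmethod
    @abc.abstractmethod
    def load_from_disk(cls, dirpath: str) -> "SketchBaseModel":
        """Rebuild a model from `dirpath`."""


class JsonModel(SketchBaseModel):
    """Toy subclass with no weights; its state is stored as JSON."""

    def save_to_disk(self, dirpath: str) -> None:
        os.makedirs(dirpath, exist_ok=True)
        state = {"label_mapping": self.label_mapping, "parameters": self.parameters}
        with open(os.path.join(dirpath, "model.json"), "w") as fp:
            json.dump(state, fp)

    @classmethod
    def load_from_disk(cls, dirpath: str) -> "JsonModel":
        with open(os.path.join(dirpath, "model.json")) as fp:
            state = json.load(fp)
        return cls(state["label_mapping"], state["parameters"])


# Round trip through a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    JsonModel({"PAD": 0, "ADDRESS": 1}, {"num_labels": 2}).save_to_disk(tmp)
    restored = JsonModel.load_from_disk(tmp)
```

A subclass that leaves any abstract method unimplemented cannot be instantiated; Python raises TypeError at construction time.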
- class dataprofiler.labelers.base_model.BaseTrainableModel(label_mapping: Union[List, Dict], parameters: Dict)¶
Bases:
dataprofiler.labelers.base_model.BaseModel
Contains abstract method for training models.
Initialize Base Model.
Only the model and its parameters are stored here.
- Parameters
label_mapping (Union[list, dict]) – label mapping of the model or list of labels to be converted into the label mapping
parameters (dict) – Contains all the appropriate parameters for the model. Must contain num_labels.
- Returns
None
- abstract fit(train_data: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray], val_data: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray], batch_size: Optional[int] = None, epochs: Optional[int] = None, label_mapping: Optional[Dict[str, int]] = None, reset_weights: bool = False, verbose: bool = True) Tuple[Dict, Optional[float], Dict] ¶
Train the current model with the training data and validation data.
- Parameters
train_data (Union[pd.DataFrame, pd.Series, np.ndarray]) – Training data used to train model
val_data (Union[pd.DataFrame, pd.Series, np.ndarray]) – Validation data used to validate the training
batch_size (int) – Used to determine number of samples in each batch
epochs (int) – Used to determine how many epochs to run
label_mapping (dict) – Mapping of the labels
reset_weights (bool) – Flag to determine whether or not to reset the model’s weights
verbose (bool) – Flag to determine whether to print status or not
- Returns
history, f1, f1_report
- Return type
Tuple[dict, Optional[float], dict]
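The shape of the returned tuple can be sketched with a toy trainer. Everything here is hypothetical: `fit_majority` is a standalone function rather than a model method, it "trains" by picking the most frequent label, and returning None for f1 when there is no validation data is an assumption about why that element is optional.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def fit_majority(train_labels: List[str], val_labels: List[str],
                 epochs: int = 1) -> Tuple[Dict, Optional[float], Dict]:
    """Return (history, f1, f1_report) for a majority-label predictor."""
    majority = Counter(train_labels).most_common(1)[0][0]
    history = {"epochs": epochs, "majority_label": majority}
    if not val_labels:
        # Assumption: no validation data means no score to report.
        return history, None, {}

    # The predictor outputs `majority` for every validation sample.
    true_pos = sum(1 for label in val_labels if label == majority)
    false_pos = len(val_labels) - true_pos
    precision = true_pos / (true_pos + false_pos)
    recall = 1.0 if true_pos else 0.0  # no majority sample is ever missed
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    f1_report = {majority: {"precision": precision, "recall": recall, "f1-score": f1}}
    return history, f1, f1_report


history, f1, f1_report = fit_majority(["A", "A", "B"], ["A", "B"])
_, f1_without_validation, _ = fit_majority(["A"], [])
```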
- add_label(label: str, same_as: Optional[str] = None) None ¶
Add a label to the data labeler.
- Parameters
label (str) – new label being added to the data labeler
same_as (str) – existing label whose encoding index the new label should share, so that multiple labels map to a single encoding index.
- Returns
None
- classmethod get_class(class_name: str) Optional[Type[dataprofiler.labelers.base_model.BaseModel]] ¶
Get the registered subclass matching class_name, or None if there is no match.
- get_parameters(param_list: Optional[List[str]] = None) Dict ¶
Return a dict of parameters from the model given a list.
- Parameters
param_list (List[str]) – list of parameters to retrieve from the model.
- Returns
dict of parameters
- classmethod help() None ¶
Describe the alterable parameters.
- Returns
None
- property label_mapping: Dict[str, int]¶
Return mapping of labels to their encoded values.
- property labels: List[str]¶
Retrieve the list of labels.
- Returns
list of labels
- abstract classmethod load_from_disk(dirpath: str) dataprofiler.labelers.base_model.BaseModel ¶
Load whole model from disk with weights.
- Parameters
dirpath (str) – directory path where you want to load the model from
- Returns
loaded model
- Return type
dataprofiler.labelers.base_model.BaseModel
- property num_labels: int¶
Return the number of labels, derived from the maximum encoded value in the label mapping.
- abstract predict(data: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray], batch_size: int, show_confidences: bool, verbose: bool) Dict ¶
Predict the data with the current model.
- Parameters
data (iterator of data to process) – model input data to predict on
batch_size (int) – number of samples in the batch of data
show_confidences (bool) – whether user wants prediction confidences
verbose (bool) – Flag to determine whether to print status or not
- Returns
character-level predictions and confidences
- Return type
dict
- requires_zero_mapping: bool = False¶
- abstract reset_weights() None ¶
Reset the weights of the model.
- Returns
None
- property reverse_label_mapping: Dict[int, str]¶
Return the reverse of the label mapping, from encoded values to labels.
Useful when labels need to be looked up by index.
- abstract save_to_disk(dirpath: str) None ¶
Save whole model to disk with weights.
- Parameters
dirpath (str) – directory path where you want to save the model to
- Returns
None
- set_label_mapping(label_mapping: Union[List[str], Dict[str, int]]) None ¶
Set the labels for the model.
- Parameters
label_mapping (Union[list, dict]) – label mapping of the model or list of labels to be converted into the label mapping
- Returns
None
- set_params(**kwargs: Any) None ¶
Set the parameters if they exist given kwargs.