dataprofiler.labelers.labeler_utils module¶
Contains functions for the data labeler.
- dataprofiler.labelers.labeler_utils.f1_report_dict_to_str(f1_report: dict, label_names: list[str]) str ¶
Return the report string from the f1_report dict.
- Example Output:

                  precision    recall  f1-score   support

         class 0       0.00      0.00      0.00         1
         class 1       1.00      0.67      0.80         3

       micro avg       0.67      0.50      0.57         4
       macro avg       0.50      0.33      0.40         4
    weighted avg       0.75      0.50      0.60         4
Note: this is generally taken from the classification_report function inside sklearn.
- Parameters:
f1_report (dict) – f1 report dictionary from sklearn
label_names (list(str)) – names of labels included in the report
- Returns:
string representing f1_report printout
- Return type:
str
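As a rough illustration of how such a report dict (the shape produced by sklearn's classification_report with output_dict=True) maps onto the table above, here is a minimal standalone sketch; format_f1_report is a hypothetical stand-in, not the library's implementation.

```python
# Hypothetical sketch: render an sklearn-style f1-report dict as a table.
# Not the library's code; the formatting widths are illustrative.

def format_f1_report(f1_report: dict, label_names: list) -> str:
    """Render precision/recall/f1/support rows for each label and average."""
    header = "{:>12} {:>9} {:>9} {:>9} {:>9}".format(
        "", "precision", "recall", "f1-score", "support")
    lines = [header, ""]
    for name in label_names + ["micro avg", "macro avg", "weighted avg"]:
        row = f1_report.get(name)
        if row is None:
            continue  # averages may be absent depending on sklearn options
        lines.append("{:>12} {:>9.2f} {:>9.2f} {:>9.2f} {:>9d}".format(
            name, row["precision"], row["recall"],
            row["f1-score"], row["support"]))
    return "\n".join(lines)

report = {
    "class 0": {"precision": 0.0, "recall": 0.0, "f1-score": 0.0, "support": 1},
    "class 1": {"precision": 1.0, "recall": 0.67, "f1-score": 0.8, "support": 3},
    "macro avg": {"precision": 0.5, "recall": 0.33, "f1-score": 0.4, "support": 4},
}
print(format_f1_report(report, ["class 0", "class 1"]))
```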
- dataprofiler.labelers.labeler_utils.evaluate_accuracy(predicted_entities_in_index: list[list[int]], true_entities_in_index: list[list[int]], num_labels: int, entity_rev_dict: dict[int, str], verbose: bool = True, omitted_labels: tuple[str, ...] = ('PAD', 'UNKNOWN'), confusion_matrix_file: str | None = None) tuple[float, dict] ¶
Evaluate accuracy from comparing predicted labels with true labels.
- Parameters:
predicted_entities_in_index (list(array(int))) – predicted encoded labels for input sentences
true_entities_in_index (list(array(int))) – true encoded labels for input sentences
num_labels (int) – number of labels
entity_rev_dict (dict([index, entity])) – dictionary to convert indices to entities
verbose (boolean) – print additional information for debugging
omitted_labels (tuple(str)) – labels to omit from the accuracy evaluation
confusion_matrix_file (str) – File name (and dir) for confusion matrix
- Returns:
f1-score and the f1 report dictionary
- Return type:
tuple(float, dict)
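The core idea can be sketched without TensorFlow: flatten the predicted and true label sequences, drop entries whose true label is in the omitted set (e.g. 'PAD', 'UNKNOWN'), and compute a macro-averaged f1-score over the remaining labels. This is an illustrative reimplementation, not the library's code.

```python
# Illustrative sketch of evaluate_accuracy's core computation.
# macro_f1 is a hypothetical helper, not part of dataprofiler.

def macro_f1(predicted, true, entity_rev_dict, omitted=("PAD", "UNKNOWN")):
    # Flatten per-sentence label lists, dropping omitted true labels.
    flat = [(p, t) for ps, ts in zip(predicted, true)
            for p, t in zip(ps, ts)
            if entity_rev_dict[t] not in omitted]
    labels = {t for _, t in flat}
    scores = []
    for lbl in labels:
        tp = sum(1 for p, t in flat if p == lbl and t == lbl)
        fp = sum(1 for p, t in flat if p == lbl and t != lbl)
        fn = sum(1 for p, t in flat if p != lbl and t == lbl)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / len(scores) if scores else 0.0

rev = {0: "PAD", 1: "NAME", 2: "DATE"}
f1 = macro_f1([[1, 2, 0]], [[1, 1, 0]], rev)
```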
- dataprofiler.labelers.labeler_utils.get_tf_layer_index_from_name(model: tf.keras.Model, layer_name: str) int | None ¶
Return the index of the layer given the layer name within a tf model.
- Parameters:
model – tf keras model to search
layer_name – name of the layer to find
- Returns:
layer index if it exists or None
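The lookup amounts to a linear scan over model.layers. A hedged sketch with stub classes standing in for tf.keras objects:

```python
# Stub stand-ins for tf.keras Layer/Model, just to make the sketch runnable.
class Layer:
    def __init__(self, name):
        self.name = name

class Model:
    def __init__(self, layers):
        self.layers = layers

def layer_index_from_name(model, layer_name):
    """Return the index of the first layer with the given name, else None."""
    for idx, layer in enumerate(model.layers):
        if layer.name == layer_name:
            return idx
    return None  # mirrors the documented behavior when no layer matches

model = Model([Layer("embedding"), Layer("dense"), Layer("softmax")])
```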
- dataprofiler.labelers.labeler_utils.hide_tf_logger_warnings() None ¶
Filter out a set of warnings from the tf logger.
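The standard-library mechanism such a helper can rely on is a logging.Filter attached to the target logger; records whose message matches a suppressed phrase are dropped. This is a generic sketch of that pattern, not the function's actual filter list.

```python
import logging

class SuppressWarnings(logging.Filter):
    """Drop log records whose message contains any suppressed phrase."""

    def __init__(self, phrases):
        super().__init__()
        self.phrases = phrases

    def filter(self, record):
        # Returning False suppresses the record.
        return not any(p in record.getMessage() for p in self.phrases)

logger = logging.getLogger("demo_tf")
logger.addFilter(SuppressWarnings(["Layer count mismatch"]))
```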
- dataprofiler.labelers.labeler_utils.protected_register_keras_serializable(package: str = 'Custom', name: str | None = None) Callable ¶
Protect against already registered keras serializable layers.
Ensures that if it was already registered, it will not try to register it again.
- class dataprofiler.labelers.labeler_utils.FBetaScore(num_classes: int, average: str | None = None, beta: float = 1.0, threshold: float | None = None, name: str = 'fbeta_score', dtype: str | None = None, **kwargs: Any)¶
Bases:
Metric
Computes F-Beta score.
Adapted and slightly modified from https://github.com/tensorflow/addons/blob/v0.12.0/tensorflow_addons/metrics/f_scores.py#L211-L283
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#     https://github.com/tensorflow/addons/blob/v0.12.0/LICENSE
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
It is the weighted harmonic mean of precision and recall. Output range is [0, 1]. Works for both multi-class and multi-label classification.

$$F_{\beta} = (1 + \beta^2) \cdot \frac{\textrm{precision} \cdot \textrm{recall}}{(\beta^2 \cdot \textrm{precision}) + \textrm{recall}}$$

- Parameters:
num_classes – Number of unique classes in the dataset.
average – Type of averaging to be performed on data. Acceptable values are None, micro, macro and weighted. Default value is None.
beta – Determines the weight given to precision and recall in the harmonic mean. Default value is 1.
threshold – Elements of y_pred greater than threshold are converted to 1, and the rest to 0. If threshold is None, the argmax is converted to 1, and the rest to 0.
name – (Optional) String name of the metric instance.
dtype – (Optional) Data type of the metric result.
- Returns:
F-Beta score
- Return type:
float
Initialize FBetaScore class.
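A quick arithmetic check of the formula above, independent of the TensorFlow implementation (fbeta here is a plain hypothetical function, not the metric class):

```python
def fbeta(precision, recall, beta=1.0):
    """Compute F-beta from scalar precision and recall."""
    if precision + recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# With beta=1 this is the plain harmonic mean:
# 2 * (0.75 * 0.5) / (0.75 + 0.5) = 0.6
score = fbeta(0.75, 0.5, beta=1.0)
```

Larger beta weights recall more heavily; with recall below precision, as here, raising beta therefore lowers the score.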
- update_state(y_true: tf.Tensor, y_pred: tf.Tensor, sample_weight: tf.Tensor | None = None) None ¶
Update state.
- result() Tensor ¶
Return f1 score.
- get_config() dict ¶
Return the serializable config of the metric.
- add_variable(shape, initializer, dtype=None, aggregation='sum', name=None)¶
- add_weight(shape=(), initializer=None, dtype=None, name=None)¶
- property dtype¶
- classmethod from_config(config)¶
- reset_state()¶
Reset all of the metric state variables.
This function is called between epochs/steps, when a metric is evaluated during training.
- stateless_reset_state()¶
- stateless_result(metric_variables)¶
- stateless_update_state(metric_variables, *args, **kwargs)¶
- property variables¶
- class dataprofiler.labelers.labeler_utils.F1Score(num_classes: int, average: str | None = None, threshold: float | None = None, name: str = 'f1_score', dtype: str | None = None)¶
Bases:
FBetaScore
Computes F-1 Score.
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#     https://github.com/tensorflow/addons/blob/v0.12.0/LICENSE
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
It is the harmonic mean of precision and recall. Output range is [0, 1]. Works for both multi-class and multi-label classification.

$$F_1 = 2 \cdot \frac{\textrm{precision} \cdot \textrm{recall}}{\textrm{precision} + \textrm{recall}}$$

- Parameters:
num_classes – Number of unique classes in the dataset.
average – Type of averaging to be performed on data. Acceptable values are None, micro, macro and weighted. Default value is None.
threshold – Elements of y_pred above threshold are considered to be 1, and the rest 0. If threshold is None, the argmax is converted to 1, and the rest 0.
name – (Optional) String name of the metric instance.
dtype – (Optional) Data type of the metric result.
- Returns:
F-1 score
- Return type:
float
Initialize F1Score object.
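The documented thresholding rule for y_pred can be sketched with plain lists standing in for tensors; binarize is a hypothetical helper, not part of the metric class.

```python
def binarize(row, threshold=None):
    """Apply the documented y_pred rule: above-threshold values become 1;
    with threshold=None, only the argmax position becomes 1."""
    if threshold is None:
        top = max(range(len(row)), key=row.__getitem__)
        return [1 if i == top else 0 for i in range(len(row))]
    return [1 if v > threshold else 0 for v in row]
```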
- add_variable(shape, initializer, dtype=None, aggregation='sum', name=None)¶
- add_weight(shape=(), initializer=None, dtype=None, name=None)¶
- property dtype¶
- classmethod from_config(config)¶
- reset_state()¶
Reset all of the metric state variables.
This function is called between epochs/steps, when a metric is evaluated during training.
- result() Tensor ¶
Return f1 score.
- stateless_reset_state()¶
- stateless_result(metric_variables)¶
- stateless_update_state(metric_variables, *args, **kwargs)¶
- update_state(y_true: tf.Tensor, y_pred: tf.Tensor, sample_weight: tf.Tensor | None = None) None ¶
Update state.
- property variables¶
- get_config() dict ¶
Get configuration.