evalmate.evaluator¶
This module implements the top-level functionality for performing the evaluation for the different tasks.
For every task there is an Evaluator (extends Evaluator) and an Evaluation (extends Evaluation).
The Evaluator is the class responsible for performing the evaluation, and the Evaluation is the output,
which contains the aligned labels/segments and, depending on the task, further data such as word confusions.
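The relationship between the two classes can be sketched as follows. This is a schematic illustration of the pattern only, not the library's actual source; the WordEvaluator subclass is invented for the example:

```python
# Schematic sketch of the Evaluator/Evaluation pattern described above.
# Illustrative only; see the class documentation below for the real API.

class Evaluation:
    """Holds the result of an evaluation: the reference and
    hypothesis outcomes plus task-specific data."""

    def __init__(self, ref_outcome, hyp_outcome):
        self.ref_outcome = ref_outcome
        self.hyp_outcome = hyp_outcome


class Evaluator:
    """Performs an evaluation. Subclasses implement ``do_evaluate``."""

    def evaluate(self, ref, hyp):
        # In the real library this method also accepts corpora and
        # plain label-lists; here we only pass outcomes through.
        return self.do_evaluate(ref, hyp)

    def do_evaluate(self, ref, hyp):
        raise NotImplementedError


class WordEvaluator(Evaluator):
    # A made-up task-specific evaluator for illustration.
    def do_evaluate(self, ref, hyp):
        return Evaluation(ref, hyp)
```

A task-specific evaluator is then used by calling `evaluate` on a reference and a hypothesis, which yields the corresponding Evaluation object.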
Base¶
class evalmate.evaluator.Evaluation(ref_outcome, hyp_outcome)[source]¶
Base class for evaluation results.
Variables:
- ref_outcome (Outcome) – The outcome of the ground-truth/reference.
- hyp_outcome (Outcome) – The outcome of the system-output/hypothesis.
- get_report(template=None)[source]¶ Generate and return a report.
Parameters: template (str) – Name of the Jinja2 template to use. If None, the default_template() is used. All available templates are in the report_templates folder.
Returns: The rendered report.
Return type: str
- template_data¶ Return a dictionary that contains objects/values to use in the rendering template.
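The interplay of template_data and get_report can be illustrated with the stdlib string.Template. The library itself renders Jinja2 templates from the report_templates folder, and the data keys here are invented; only the flow is the same:

```python
from string import Template

# Hypothetical keys, standing in for whatever template_data returns.
data = {'task': 'KWS', 'num_utterances': 3}

# get_report picks a template and fills it with the template_data
# dictionary; Jinja2 syntax differs, but the idea is identical.
template = Template('Report for $task: $num_utterances utterances')
report = template.substitute(data)  # 'Report for KWS: 3 utterances'
```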
class evalmate.evaluator.Evaluator[source]¶
Base class for an evaluator.
Provides methods for reading outcomes in different ways. The evaluator for a specific task then has to implement do_evaluate, which performs the evaluation on the ref and hyp outcome.
- classmethod default_label_list_idx()[source]¶ Define the default label-list index that is used when reading a corpus.
- do_evaluate(ref, hyp)[source]¶ Create the evaluation result of the given hypothesis compared to the given reference (ground truth).
Parameters:
- ref (Outcome) – The reference (ground-truth) outcome.
- hyp (Outcome) – The hypothesis (system-output) outcome.
Returns: The evaluation results.
Return type: Evaluation
- evaluate(ref, hyp, label_list_idx=None)[source]¶ Create the evaluation result of the given hypothesis compared to the given reference (ground truth). There are different possibilities of input:
- ref = Outcome / hyp = Outcome: Both ref and hyp are Outcome instances. See do_evaluate.
- ref = Corpus / hyp = dict: The dict contains label-lists which are compared against the corpus. See evaluate_label_lists_against_corpus.
- ref = LabelList / hyp = LabelList: The ref label-list is compared against the other. See evaluate_label_lists.
Parameters:
- ref (Outcome, Corpus, LabelList) – An outcome, a corpus or a label-list.
- hyp (Outcome, dict, LabelList) – An outcome, a dict or a label-list.
- label_list_idx (str) – The label-list to use when reading from a corpus.
Returns: The evaluation results.
Return type: Evaluation
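The three input combinations amount to a type-based dispatch, which can be sketched as follows. The empty classes stand in for the real evalmate/audiomate types, and the returned strings merely name the method that would be delegated to:

```python
# Minimal stand-in types to illustrate the dispatch in ``evaluate``.
class Outcome: pass
class Corpus: pass
class LabelList: pass

def evaluate(ref, hyp):
    # Dispatch on the input types, mirroring the three cases above.
    if isinstance(ref, Outcome) and isinstance(hyp, Outcome):
        return 'do_evaluate'
    if isinstance(ref, Corpus) and isinstance(hyp, dict):
        return 'evaluate_label_lists_against_corpus'
    if isinstance(ref, LabelList) and isinstance(hyp, LabelList):
        return 'evaluate_label_lists'
    raise ValueError('Unsupported input combination')
```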
- evaluate_label_lists(ll_ref, ll_hyp, duration=None)[source]¶ Create an Evaluation for a ref and hyp label-list. If the duration is not provided, some metrics cannot be used.
Parameters:
- ll_ref (LabelList) – The reference label-list.
- ll_hyp (LabelList) – The hypothesis label-list.
- duration (float) – The duration of the utterance that the label-lists belong to.
Returns: The evaluation results.
Return type: Evaluation
- evaluate_label_lists_against_corpus(corpus, label_lists, label_list_idx=None)[source]¶ Create an Evaluation for the given corpus.
Parameters:
- corpus (Corpus) – A corpus containing the reference label-lists.
- label_lists (dict) – A dictionary containing label-lists with the utterance-idx as key. The utterance-idx is used to find the corresponding reference label-list in the corpus.
- label_list_idx (str) – The idx of the label-lists to use as reference from the corpus. If None, cls.default_label_list_idx is used.
Returns: The evaluation results.
Return type: Evaluation
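The matching of hypothesis label-lists to reference label-lists by utterance-idx can be sketched with plain dicts. The utterance ids and label values here are invented, and lists of strings stand in for label-lists:

```python
# Reference label-lists per utterance, as a corpus would provide them.
ref_label_lists = {'utt-1': ['hello', 'world'], 'utt-2': ['music']}

# Hypothesis label-lists, keyed by the same utterance-idx.
hyp_label_lists = {'utt-1': ['hello', 'word'], 'utt-2': ['music']}

# Pair every hypothesis label-list with the corresponding reference,
# looked up via the shared utterance-idx.
pairs = {
    utt_idx: (ref_label_lists[utt_idx], hyp)
    for utt_idx, hyp in hyp_label_lists.items()
}
```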
Outcome¶
class evalmate.evaluator.Outcome(label_lists=None, utterance_durations=None)[source]¶
An outcome represents the annotations/labels/transcriptions of a dataset/corpus for a given task. This can be either the ground truth/reference or the system output/hypothesis.
If no durations are provided, or the durations for some utterances are missing, some methods may not work or may throw exceptions.
Variables:
- label_lists (dict) – Dictionary containing all label-lists with the utterance-idx/sample-idx as key.
- utterance_durations (dict) – Dictionary (utterance-idx/duration) containing the durations of all utterances.
- all_values¶ Return a set of all values occurring in the outcome.
- label_set_for_value(value)[source]¶ Return a label-set containing all labels whose value equals value.
Parameters: value (str) – The value to filter for.
Returns: Label-set containing all labels with the given value.
Return type: LabelSet
- total_duration¶ Return the duration of all utterances together.
Notes
Only works if the durations are provided for all utterances.
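The behaviour of total_duration, including the failure when a duration is missing, can be sketched as follows (the durations and utterance ids are invented):

```python
def total_duration(label_lists, utterance_durations):
    # Sum the durations of all utterances that have label-lists.
    # Raises KeyError if a duration is missing, mirroring the note
    # above that durations must be provided for all utterances.
    return sum(utterance_durations[utt_idx] for utt_idx in label_lists)

durations = {'utt-1': 2.5, 'utt-2': 4.0}
label_lists = {'utt-1': ['a'], 'utt-2': ['b']}
total = total_duration(label_lists, durations)  # 6.5
```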
class evalmate.evaluator.LabelSet(labels=None)[source]¶
Class to collect a set of labels. This is used to compute statistics over a defined set of labels.
For example, to compute the average length of all labels with the value ‘music’, we can collect them in a label-set and perform the computation.
- count¶ Return the number of labels.
- label_lengths¶ Return a list containing all label lengths.
- length_max¶ Return the length of the longest label.
- length_mean¶ Return the mean length of all labels.
- length_median¶ Return the median of all label lengths.
- length_min¶ Return the length of the shortest label.
- length_variance¶ Return the variance of all label lengths.
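The length statistics can be reproduced with the stdlib statistics module. The label lengths below are invented, and whether the library uses population or sample variance is an assumption here; the population variance is shown:

```python
import statistics

# Hypothetical lengths (in seconds) of all labels with value 'music'.
label_lengths = [1.0, 2.0, 2.0, 3.0]

count = len(label_lengths)                        # 4
length_min = min(label_lengths)                   # 1.0
length_max = max(label_lengths)                   # 3.0
length_mean = statistics.mean(label_lengths)      # 2.0
length_median = statistics.median(label_lengths)  # 2.0
# Population variance; the library may use sample variance instead.
length_variance = statistics.pvariance(label_lengths)  # 0.5
```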
Segment¶
class evalmate.evaluator.SegmentEvaluation(ref_outcome, hyp_outcome, utt_to_segments)[source]¶
Result of an evaluation of a segment-based alignment.
Parameters: utt_to_segments (dict) – Dict of lists of evalmate.alignment.Segment. Key is the utterance-idx.
Variables:
- ref_outcome (Outcome) – The outcome of the ground-truth/reference.
- hyp_outcome (Outcome) – The outcome of the system-output/hypothesis.
- confusion (AggregatedConfusion) – Confusion result
- segments¶ Return a list of all segments (from all utterances together).
- template_data¶ Return a dictionary that contains objects/values to use in the rendering template.
class evalmate.evaluator.SegmentEvaluator(aligner=None)[source]¶
Evaluation of an alignment based on segments.
Parameters: aligner (SegmentAligner) – An instance of a segment-aligner to use. If not given, the alignment.InvariantSegmentAligner is used.
- classmethod default_label_list_idx()[source]¶ Define the default label-list index that is used when reading a corpus.
- do_evaluate(ref, hyp)[source]¶ Create the evaluation result of the given hypothesis compared to the given reference (ground truth).
Parameters:
- ref (Outcome) – The reference (ground-truth) outcome.
- hyp (Outcome) – The hypothesis (system-output) outcome.
Returns: The evaluation results.
Return type: SegmentEvaluation
- static flatten_overlapping_labels(aligned_segments)[source]¶ Check all segments for overlapping labels. Overlapping means there are multiple reference or multiple hypothesis labels in a segment.
Parameters: aligned_segments (list) – List of segments.
Returns: List of segments where ref and hyp are single labels.
Return type: list
Raises: ValueError – A segment contains overlapping labels.
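The overlap check can be sketched with tuples of (ref_labels, hyp_labels) per segment. This segment representation is a simplification of the real Segment type:

```python
def flatten_overlapping_labels(aligned_segments):
    # Each segment carries a list of reference and hypothesis labels.
    # Reduce them to single labels, failing on overlaps as described.
    flat = []
    for ref_labels, hyp_labels in aligned_segments:
        if len(ref_labels) > 1 or len(hyp_labels) > 1:
            raise ValueError('A segment contains overlapping labels.')
        ref = ref_labels[0] if ref_labels else None
        hyp = hyp_labels[0] if hyp_labels else None
        flat.append((ref, hyp))
    return flat
```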
Event¶
class evalmate.evaluator.EventEvaluation(ref_outcome, hyp_outcome, utt_to_label_pairs)[source]¶
Result of an evaluation of any event-based alignment.
Parameters: utt_to_label_pairs (dict) – Key is the utterance-idx, value is a list of evalmate.alignment.LabelPair.
Variables:
- ref_outcome (Outcome) – The outcome of the ground-truth/reference.
- hyp_outcome (Outcome) – The outcome of the system-output/hypothesis.
- confusion (AggregatedConfusion) – Confusion statistics
- label_pairs¶ Return a list of all label-pairs (from all utterances together).
- template_data¶ Return a dictionary that contains objects/values to use in the rendering template.
class evalmate.evaluator.EventEvaluator(aligner)[source]¶
Class to compute evaluation results for any event-based alignment.
Parameters: aligner (EventAligner) – An instance of an event-aligner to use.
- classmethod default_label_list_idx()[source]¶ Define the default label-list index that is used when reading a corpus.
KWS¶
class evalmate.evaluator.KWSEvaluation(ref_outcome, hyp_outcome, utt_to_label_pairs)[source]¶
Result of an evaluation of a keyword spotting task.
Parameters: utt_to_label_pairs (dict) – Key is the utterance-idx, value is a list of evalmate.alignment.LabelPair.
Variables:
- ref_outcome (Outcome) – The outcome of the ground-truth/reference.
- hyp_outcome (Outcome) – The outcome of the system-output/hypothesis.
- confusion (AggregatedConfusion) – Confusion statistics
- false_alarm_rate(keyword=None)[source]¶ The False Alarm Rate (FAR) is the percentage of detections for which there is no keyword according to the ground truth. If no keyword is given, the mean FAR is calculated over all keywords. This rate is relative to the duration of all utterances.
To calculate this, we need to know the number of times a keyword could be wrongly inserted. To approximate this value, we assume that every keyword takes one second.
Parameters: keyword (str) – If not None, only the FAR for this keyword is returned.
Returns: A rate between 0 and 1
Return type: float
- false_rejection_rate(keyword=None)[source]¶ The False Rejection Rate (FRR) is the percentage of misses out of all occurrences in the ground truth. If no keyword is given, the mean FRR is calculated over all keywords.
Parameters: keyword (str) – If not None, only the FRR for this keyword is returned.
Returns: A rate between 0 and 1
Return type: float
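Under the one-second-per-keyword approximation described above, the two rates can be sketched as follows. The counts and durations are invented, and this is a sketch of the metric definitions rather than the library's implementation:

```python
def false_rejection_rate(misses, occurrences):
    # Fraction of ground-truth occurrences that were missed.
    return misses / occurrences

def false_alarm_rate(false_alarms, total_duration, occurrences):
    # Approximate the number of possible wrong insertions by assuming
    # every keyword takes one second of the total duration.
    possible_insertions = total_duration - occurrences
    return false_alarms / possible_insertions

frr = false_rejection_rate(misses=2, occurrences=10)  # 0.2
far = false_alarm_rate(false_alarms=3, total_duration=110.0,
                       occurrences=10)                # 0.03
```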
- term_weighted_value(keyword=None)[source]¶ Computes the Term-Weighted Value (TWV).
Note
The TWV is implemented according to the OpenKWS 2016 Evaluation Plan.
Parameters: keyword (str) – If None, computes the TWV over all keywords, otherwise only for the given keyword.
Returns: The TWV in the range 1 to -inf
Return type: float
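Per the OpenKWS evaluation plan, the TWV combines the miss and false-alarm probabilities with a weight β. The value β = 999.9 is the standard OpenKWS cost/prior setting and is an assumption here, since the document does not state it:

```python
def term_weighted_value(p_miss, p_false_alarm, beta=999.9):
    # TWV = 1 - (P_miss + beta * P_fa): 1 is a perfect system,
    # 0 corresponds to returning nothing, and it is unbounded below.
    return 1.0 - (p_miss + beta * p_false_alarm)

twv = term_weighted_value(p_miss=0.2, p_false_alarm=0.0005)  # 0.30005
```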
class evalmate.evaluator.KWSEvaluator(aligner=None)[source]¶
Class to retrieve evaluation results for a keyword spotting task.
Parameters: aligner (EventAligner) – An instance of an event-aligner to use. If not given, the evalmate.alignment.BipartiteMatchingAligner is used.
- classmethod default_label_list_idx()[source]¶ Define the default label-list index that is used when reading a corpus.
ASR¶
class evalmate.evaluator.ASREvaluation(ref_outcome, hyp_outcome, utt_to_label_pairs)[source]¶
Result of an evaluation of an automatic speech recognition task.
Parameters: utt_to_label_pairs (dict) – Key is the utterance-idx, value is a list of evalmate.alignment.LabelPair.
Variables:
- ref_outcome (Outcome) – The outcome of the ground-truth/reference.
- hyp_outcome (Outcome) – The outcome of the system-output/hypothesis.
- confusion (AggregatedConfusion) – Confusion statistics
class evalmate.evaluator.ASREvaluator(aligner=None)[source]¶
Class to retrieve evaluation results for an automatic speech recognition task.
Parameters: aligner (EventAligner) – An instance of an event-aligner to use. If not given, the alignment.LevenshteinAligner is used.
- classmethod default_label_list_idx()[source]¶ Define the default label-list index that is used when reading a corpus.