libai.evaluation
class libai.evaluation.BLEUEvaluator
    Evaluate the BLEU (Bilingual Evaluation Understudy) score.

    BLEU is a score for comparing a candidate translation of text to one or more reference translations.
    evaluate()
        Evaluate/summarize the performance after processing all input/output pairs.

        Returns:
            A new evaluator class can return a dict of arbitrary format as long as the user can process the results. In our train_net.py, we expect the following format:

            - key: the name of the task (e.g., Classification)
            - value: a dict of {metric name: score}, e.g. {"Acc@1": 75.0}

        Return type:
            dict
    process(inputs, outputs)
        Process the pair of inputs and outputs:

            pred_logits = outputs["prediction_scores"]
            labels = inputs["labels"]
            # do evaluation on pred_logits/labels pair
            ...

        Parameters:
            - inputs (dict) – the inputs used to call the model.
            - outputs (dict) – the dict returned by model(**inputs).
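As an illustration of what the score measures, here is a minimal, hypothetical sketch of modified unigram precision, the basic building block of BLEU. This is not BLEUEvaluator's actual implementation, which would also involve higher-order n-grams and a brevity penalty:

    from collections import Counter

    def unigram_precision(candidate, reference):
        # Hypothetical sketch: count candidate tokens that also occur in the
        # reference, clipping each token's count by its reference count.
        cand_counts = Counter(candidate)
        ref_counts = Counter(reference)
        overlap = sum(min(count, ref_counts[token]) for token, count in cand_counts.items())
        return overlap / max(len(candidate), 1)

    print(unigram_precision("the cat sat on the mat".split(),
                            "there is a cat on the mat".split()))  # 4/6 ≈ 0.667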
class libai.evaluation.ClsEvaluator(topk=(1, 5))
    Evaluate accuracy for classification. The metrics range from 0 to 100 (instead of 0 to 1). We support evaluating top-k accuracy for different values of k; you can set cfg.train.topk=(1, 5, N) according to your needs.
    evaluate()
        Evaluate/summarize the performance after processing all input/output pairs.

        Returns:
            A new evaluator class can return a dict of arbitrary format as long as the user can process the results. In our train_net.py, we expect the following format:

            - key: the name of the task (e.g., Classification)
            - value: a dict of {metric name: score}, e.g. {"Acc@1": 75.0}

        Return type:
            dict
    process(inputs, outputs)
        Process the pair of inputs and outputs:

            pred_logits = outputs["prediction_scores"]
            labels = inputs["labels"]
            # do evaluation on pred_logits/labels pair
            ...

        Parameters:
            - inputs (dict) – the inputs used to call the model.
            - outputs (dict) – the dict returned by model(**inputs).
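For reference, here is a minimal, hypothetical sketch of how top-k accuracy can be computed from logits and labels, scaled to the 0-100 range this evaluator uses. It is not ClsEvaluator's actual implementation:

    import numpy as np

    def topk_accuracy(pred_logits, labels, topk=(1, 5)):
        # pred_logits: (N, num_classes) array; labels: (N,) array of class ids.
        max_k = max(topk)
        # Indices of the top max_k classes per sample, highest score first.
        top_preds = np.argsort(-pred_logits, axis=1)[:, :max_k]
        correct = top_preds == labels[:, None]
        # A sample counts for Acc@k if its true label is among the top k.
        return {f"Acc@{k}": 100.0 * correct[:, :k].any(axis=1).mean() for k in topk}

    logits = np.array([[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]])
    labels = np.array([1, 2])
    print(topk_accuracy(logits, labels, topk=(1, 2)))  # {'Acc@1': 50.0, 'Acc@2': 50.0}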
class libai.evaluation.DatasetEvaluator
    Base class for a dataset evaluator.

    The function inference_on_dataset() runs the model over all samples in the dataset and uses a DatasetEvaluator to process the inputs/outputs. This class accumulates information about the inputs/outputs (via process()) and produces evaluation results at the end (via evaluate()).
    evaluate()
        Evaluate/summarize the performance after processing all input/output pairs.

        Returns:
            A new evaluator class can return a dict of arbitrary format as long as the user can process the results. In our train_net.py, we expect the following format:

            - key: the name of the task (e.g., Classification)
            - value: a dict of {metric name: score}, e.g. {"Acc@1": 75.0}

        Return type:
            dict
    process(inputs, outputs)
        Process the pair of inputs and outputs:

            pred_logits = outputs["prediction_scores"]
            labels = inputs["labels"]
            # do evaluation on pred_logits/labels pair
            ...

        Parameters:
            - inputs (dict) – the inputs used to call the model.
            - outputs (dict) – the dict returned by model(**inputs).
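To add a new metric, subclass DatasetEvaluator and implement process() and evaluate(). Below is a minimal sketch that follows the documented contract; the field names "prediction_scores" and "labels" come from the process() docstring above, while everything else (the class name, the plain-Python accumulation) is a hypothetical illustration:

    import numpy as np

    from libai.evaluation import DatasetEvaluator

    class ExactMatchEvaluator(DatasetEvaluator):
        # Hypothetical evaluator: accumulate argmax matches in process()
        # and summarize them in evaluate(), returning the dict format
        # that train_net.py expects: {task name: {metric name: score}}.
        def __init__(self):
            self._correct = 0
            self._total = 0

        def process(self, inputs, outputs):
            pred_logits = np.asarray(outputs["prediction_scores"])
            labels = np.asarray(inputs["labels"])
            preds = pred_logits.argmax(axis=-1)
            self._correct += int((preds == labels).sum())
            self._total += int(labels.size)

        def evaluate(self):
            acc = 100.0 * self._correct / max(self._total, 1)
            return {"Classification": {"Acc@1": acc}}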
class libai.evaluation.PPLEvaluator
    Evaluate perplexity for language models.

    Perplexity is a measurement of how well a probability distribution or probability model predicts a sample.
    evaluate()
        Evaluate/summarize the performance after processing all input/output pairs.

        Returns:
            A new evaluator class can return a dict of arbitrary format as long as the user can process the results. In our train_net.py, we expect the following format:

            - key: the name of the task (e.g., Classification)
            - value: a dict of {metric name: score}, e.g. {"Acc@1": 75.0}

        Return type:
            dict
    process(inputs, outputs)
        Process the pair of inputs and outputs:

            pred_logits = outputs["prediction_scores"]
            labels = inputs["labels"]
            # do evaluation on pred_logits/labels pair
            ...

        Parameters:
            - inputs (dict) – the inputs used to call the model.
            - outputs (dict) – the dict returned by model(**inputs).
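Perplexity is commonly computed as the exponential of the average per-token negative log-likelihood. The following is a minimal, hypothetical sketch of that formula, not necessarily how PPLEvaluator computes it internally:

    import numpy as np

    def perplexity(pred_logits, labels):
        # pred_logits: (N, vocab_size) array; labels: (N,) array of token ids.
        logits = np.asarray(pred_logits, dtype=np.float64)
        # Log-softmax, shifted by the row max for numerical stability.
        logits -= logits.max(axis=-1, keepdims=True)
        log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
        # Negative log-likelihood of the true token at each position.
        nll = -log_probs[np.arange(len(labels)), labels]
        return float(np.exp(nll.mean()))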
class libai.evaluation.RegEvaluator
    Evaluate metrics for regression tasks.
    evaluate()
        Evaluate/summarize the performance after processing all input/output pairs.

        Returns:
            A new evaluator class can return a dict of arbitrary format as long as the user can process the results. In our train_net.py, we expect the following format:

            - key: the name of the task (e.g., Classification)
            - value: a dict of {metric name: score}, e.g. {"Acc@1": 75.0}

        Return type:
            dict
    process(inputs, outputs)
        Process the pair of inputs and outputs:

            pred_logits = outputs["prediction_scores"]
            labels = inputs["labels"]
            # do evaluation on pred_logits/labels pair
            ...

        Parameters:
            - inputs (dict) – the inputs used to call the model.
            - outputs (dict) – the dict returned by model(**inputs).
libai.evaluation.flatten_results_dict(results)
    Expand a hierarchical dict of scalars into a flat dict of scalars. If results[k1][k2][k3] = v, the returned dict will have the entry {"k1/k2/k3": v}.

    Parameters:
        - results (dict) – a hierarchical dict of scalars.
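For example, flattening the dict format produced by the evaluators above (the values here are illustrative):

    from libai.evaluation import flatten_results_dict

    results = {"Classification": {"Acc@1": 75.0, "Acc@5": 92.0}}
    print(flatten_results_dict(results))
    # {'Classification/Acc@1': 75.0, 'Classification/Acc@5': 92.0}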
libai.evaluation.inference_on_dataset(model, data_loader, batch_size, eval_iter, get_batch: Callable, input_placement_device: str, evaluator: Optional[Union[libai.evaluation.evaluator.DatasetEvaluator, List[libai.evaluation.evaluator.DatasetEvaluator]]])
    Run the model on the data_loader and evaluate the metrics with the evaluator. Also benchmarks the inference speed of model.__call__ accurately. The model will be used in eval mode.

    Parameters:
        - model (callable) – a callable which takes an object from data_loader and returns some outputs. If it is an nn.Module, it will be temporarily set to eval mode. If you wish to evaluate a model in training mode instead, you can wrap the given model and override its behavior of .eval() and .train().
        - data_loader – an iterable object with a length. The elements it generates will be the inputs to the model.
        - batch_size – batch size for inference.
        - eval_iter – number of running steps for evaluation.
        - get_batch – a callable for getting data from the dataloader.
        - input_placement_device – used in get_batch; set it to "cuda" or "cpu". See input_placement_device in libai/configs/common/train.py for more details.
        - evaluator – the evaluator(s) to run. Use None if you only want to benchmark, but don't want to do any evaluation.

    Returns:
        The return value of evaluator.evaluate().
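A usage sketch, assuming a model, a test dataloader, and a get_batch function already exist (these names are placeholders, not part of the API):

    from libai.evaluation import ClsEvaluator, flatten_results_dict, inference_on_dataset

    # model, test_loader, and get_batch are assumed to be defined elsewhere;
    # see libai/configs/common/train.py for input_placement_device options.
    results = inference_on_dataset(
        model,
        test_loader,
        batch_size=64,
        eval_iter=100,
        get_batch=get_batch,
        input_placement_device="cuda",
        evaluator=ClsEvaluator(topk=(1, 5)),
    )
    print(flatten_results_dict(results))  # e.g. {'Classification/Acc@1': 75.0, ...}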