ποΈ EleutherAI Harness Evaluation
EleutherAI Harness is a powerful evaluation framework that lets you measure how well a model performs across a range of standardized benchmarks. Follow the steps below for a guided walkthrough of the evaluation process.
ποΈ LLM-as-Judge Evaluations
Transformer Lab provides an easier way to intergrate the LLM-as-judge suite of metrics by DeepEval to evaluate model outputs across multiple dimensions. Here's an overview of the available metrics:
ποΈ Evaluating with Objective Metrics
Transformer Lab provides a suite of industry-standard objective metrics supported by DeepEval to evaluate model outputs. Here's an overview of the available metrics:
ποΈ Basic Evaluation Metrics
This plugin allows you to perform common evaluation checks, such as string matching, string containment, or even custom checks using regular expressions on any output. Itβs designed to help you quickly validate and analyze outputs through predefined as well as custom evaluation metrics.
ποΈ Red Teaming Evaluations
This plugin helps you evaluate Language Models (LLMs) for vulnerabilities and weaknesses through red teaming techniques. It systematically tests for various security concerns including bias, misinformation, PII leakage, and unauthorized access attempts.