EleutherAI Harness Evaluation

EleutherAI Harness is a powerful evaluation framework that lets you measure how well a model performs across a range of standardized benchmarks. Follow the steps below for a guided walkthrough of the evaluation process.

1. Selecting a Model from the Foundation Tab

Start by navigating to the Foundation tab in Transformer Lab. Choose the model you want to evaluate from the list provided.

2. Downloading the Appropriate Plugin

To use the evaluation features, download the appropriate plugin:

  • For Mac Systems: Download the Eleuther AI LM Evaluation Harness MLX plugin. This version runs on Apple's MLX framework and is optimized for Apple Silicon Macs.
  • For Other Systems: Download the Eleuther AI LM Evaluation Harness plugin.

3. Configuring the Evaluation Task

Configure your evaluation task by following these steps:

  • Name Your Evaluation Task: Enter a descriptive name for easy identification.
  • Select Evaluation Tasks: Choose the Harness tasks or benchmark suites you want the model evaluated on.
  • Define the Evaluation Scope: Select the fraction of samples per task to evaluate. The recommended fraction is 1 (the full benchmark) for a thorough assessment; for testing or debugging, choose a smaller fraction. These settings broadly correspond to the harness's own task and sample-limit options, as sketched below.
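
The Transformer Lab plugins are built on the EleutherAI lm-evaluation-harness library, so the task list and sample fraction you configure here roughly map to the harness's tasks and limit parameters. The sketch below shows an approximately equivalent standalone call; the model identifier and task names are placeholders, not values the plugin passes verbatim:

```python
# Rough sketch of the underlying harness call; not the plugin's exact code.
# The model identifier and task names below are placeholders.
from lm_eval.evaluator import simple_evaluate

results = simple_evaluate(
    model="hf",                                   # standard Hugging Face backend
    model_args="pretrained=your-org/your-model",  # placeholder model id
    tasks=["hellaswag", "arc_easy"],              # tasks selected in step 3
    limit=0.1,                                    # fraction of samples per task; None runs the full benchmark
)
```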

4. Running the Evaluation

Once you have set up the task, click the Queue button to start the evaluation.

5. Viewing the Results

After the evaluation completes, you can review the results:

  • Job Output: Check the job output for immediate results and the execution logs.
  • Detailed Report: Access the detailed report generated by the Harness for an in-depth analysis of the evaluation outcomes. To work with the report outside Transformer Lab, see the sketch below.
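
If you want to post-process the detailed report programmatically, the harness saves its results as JSON with a top-level results section keyed by task name. The following is a minimal sketch; the file path is hypothetical, and the exact metric key names (for example acc versus acc,none) differ between harness versions:

```python
# Minimal sketch for inspecting a harness results file.
# The file path is hypothetical; metric key names vary by harness version.
import json

with open("path/to/harness_results.json") as f:
    report = json.load(f)

# "results" maps each task name to its metrics (accuracy, stderr, etc.).
for task, metrics in report.get("results", {}).items():
    print(f"{task}:")
    for metric, value in metrics.items():
        print(f"  {metric} = {value}")
```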