Skip to main content

Red Teaming Evaluations

This plugin helps you evaluate Language Models (LLMs) for vulnerabilities and weaknesses through red teaming techniques. It systematically tests for various security concerns including bias, misinformation, PII leakage, and unauthorized access attempts.

Steps to Use the Plugin

1. Download the Plugin

  • Step: Go to the Plugins tab.
  • Instruction: Download the Red Teaming Evaluations Plugin.
GIF Animation

2. Create a New Evaluation Task

  • Step: Go to the Evaluate tab.
  • Instruction: Click on the Add Task button.
  • A pop-up will appear where you configure your evaluation task.
  • Tabs:
    • Introduction Tab: Provides details about the plugin.
    • Name Tab: Set the name for your task.

3. Configure Plugin Settings

  • Tab: Plugin Config
  • Vulnerability Testing Areas:
    • Bias Detection: Select bias categories to test:
      • Gender
      • Race
      • Religion
      • Political bias
    • Misinformation: Test for:
      • Factual errors
      • Unsupported claims
      • Expertise misrepresentation
    • PII Leakage: Evaluate protection against:
      • Database access
      • Direct leakage
      • Session leaks
      • Social manipulation
    • Personal Safety: Check handling of:
      • Self-harm content
      • Bullying
      • Unsafe practices
      • Dangerous challenges
    • Toxicity: Test responses to:
      • Profanity
      • Insults
      • Threats
      • Mockery
    • Robustness: Assess vulnerability to:
      • Prompt hijacking
      • Input overreliance
    • Unauthorized Access: Test security against:
      • SQL Injection
      • Shell Injection
      • Debug Access
      • SSRF (Server-Side Request Forgery)
      • RBAC bypasses
      • BOLA (Broken Object Level Authorization)
      • BFLA (Broken Function Level Authorization)
    • Illegal Activity: Detection of content related to:
      • Weapons
      • Drugs
      • Cybercrime
    • Graphic Content: Test handling of sensitive material
    • Intellectual Property: Check for protection of:
      • Copyright
      • Trademark
      • Patent information
  • Attack Enhancement Methods:
    • Select multiple enhancement methods from a single dropdown:
      • Encoding Techniques (BASE64, ROT13, LEETSPEAK)
      • Jailbreak Patterns (Crescendo, Linear, and Tree approaches)
      • Advanced Methods (Gray box attacks, Prompt injection, Multilingual attacks)
      • Specialized Probing (Math problems, Prompt probing)
GIF Animation
  • Test Parameters:
    • Judge Model: Select which LLM will evaluate the results
    • Number of Attacks: Define how many attacks per vulnerability to test
    • Target Details: Specify the purpose and system prompt of your target model
GIF Animation

5. Run and View the Evaluation Task

  • Step: Click on the Queue button.
  • Instruction: The evaluation task will run on your dataset.
  • Outcome: You can view a detailed report showing vulnerabilities detected, attack success rates, and recommended mitigations.
GIF Animation