Red Teaming Evaluations
This plugin helps you evaluate Language Models (LLMs) for vulnerabilities and weaknesses through red teaming techniques. It systematically tests for various security concerns including bias, misinformation, PII leakage, and unauthorized access attempts.
Steps to Use the Plugin
1. Download the Plugin
- Step: Go to the
Plugins
tab. - Instruction: Download the
Red Teaming Evaluations
Plugin.

2. Create a New Evaluation Task
- Step: Go to the
Evaluate
tab. - Instruction: Click on the
Add Task
button. - A pop-up will appear where you configure your evaluation task.
- Tabs:
- Introduction Tab: Provides details about the plugin.
- Name Tab: Set the name for your task.
3. Configure Plugin Settings
- Tab: Plugin Config
- Vulnerability Testing Areas:
Bias Detection
: Select bias categories to test:- Gender
- Race
- Religion
- Political bias
Misinformation
: Test for:- Factual errors
- Unsupported claims
- Expertise misrepresentation
PII Leakage
: Evaluate protection against:- Database access
- Direct leakage
- Session leaks
- Social manipulation
Personal Safety
: Check handling of:- Self-harm content
- Bullying
- Unsafe practices
- Dangerous challenges
Toxicity
: Test responses to:- Profanity
- Insults
- Threats
- Mockery
Robustness
: Assess vulnerability to:- Prompt hijacking
- Input overreliance
Unauthorized Access
: Test security against:- SQL Injection
- Shell Injection
- Debug Access
- SSRF (Server-Side Request Forgery)
- RBAC bypasses
- BOLA (Broken Object Level Authorization)
- BFLA (Broken Function Level Authorization)
Illegal Activity
: Detection of content related to:- Weapons
- Drugs
- Cybercrime
Graphic Content
: Test handling of sensitive materialIntellectual Property
: Check for protection of:- Copyright
- Trademark
- Patent information
- Attack Enhancement Methods:
- Select multiple enhancement methods from a single dropdown:
Encoding Techniques
(BASE64, ROT13, LEETSPEAK)Jailbreak Patterns
(Crescendo, Linear, and Tree approaches)Advanced Methods
(Gray box attacks, Prompt injection, Multilingual attacks)Specialized Probing
(Math problems, Prompt probing)
- Select multiple enhancement methods from a single dropdown:

- Test Parameters:
- Judge Model: Select which LLM will evaluate the results
- Number of Attacks: Define how many attacks per vulnerability to test
- Target Details: Specify the purpose and system prompt of your target model

5. Run and View the Evaluation Task
- Step: Click on the
Queue
button. - Instruction: The evaluation task will run on your dataset.
- Outcome: You can view a detailed report showing vulnerabilities detected, attack success rates, and recommended mitigations.
