
Diffusion Trainer

The Diffusion Trainer allows you to create and manage LoRA training jobs for diffusion models using Transformer Lab. This plugin enables training custom adaptors that can be used with Text-to-Image, Image-to-Image, and Inpainting workflows. The trainer supports CUDA environments and offers flexibility in setting up custom diffusion training tasks.

Step 1: Setup

  1. Open the Plugins tab.
  2. Filter by trainer plugins.
  3. Install the Diffusion Trainer plugin.
Note:

This plugin only works with NVIDIA and AMD GPUs. It requires a CUDA environment to run.

  4. Now download a diffusion model in the Model Zoo. We recommend stabilityai/stable-diffusion-xl-base-1.0 as a good starting point.

Step 2: Create an Image Dataset

Transformer Lab works with Hugging Face Datasets. You can use any image dataset from the Hugging Face Hub, for example datasets-examples/doc-image-6.
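
If you want to peek at a Hub dataset outside the app, here is a minimal sketch using the datasets library (assuming the dataset exposes a train split with image and text columns, as the defaults in Step 3 expect):

    from datasets import load_dataset

    # Load the example text-image dataset from the Hugging Face Hub
    ds = load_dataset("datasets-examples/doc-image-6", split="train")

    print(ds)             # column names, e.g. "image" and "text"
    print(ds[0]["text"])  # caption paired with the first image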

Most likely, though, you'll want to train your diffusion model on your own images. To create a new dataset:

  1. Go to the Datasets tab.

  2. Click New +.

  3. In the pop-up, select Image as the dataset type, enter a name, and click Next.

  4. Upload your folder of images. The folder should follow the Hugging Face Image Datasets format:

    folder_uploaded/
    ├── image1.jpg
    ├── image2.jpg
    ├── image3.jpg
    ├── ...
    └── metadata.jsonl
    • You do not need to include a metadata.jsonl file when uploading images.
    • After the images are uploaded, you can add captions manually in the app by editing the dataset.
    • Alternatively, you may upload a metadata.jsonl file alongside the images, which is helpful if you already have captions or tags prepared.
    • If you decide to include metadata.jsonl, it must have a column named file_name that matches each image file. Other columns (e.g., captions, tags) are allowed and can be named as you like (see the sample metadata.jsonl after this list).
  5. You can also organize your dataset into subfolders for splits or labels, for example:

    folder_uploaded/
    ├── train/
    │   ├── image1.jpg
    │   ├── image2.jpg
    │   └── metadata.jsonl
    └── valid/
        ├── image3.jpg
        ├── image4.jpg
        └── metadata.jsonl

This structure ensures your dataset is compatible and supports advanced features like captions and data splits.
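
For reference, a metadata.jsonl for the flat layout in step 4 might look like the following. The file_name column is required by the Hugging Face image folder format; the text column name and the captions themselves are illustrative:

    {"file_name": "image1.jpg", "text": "a photo of a red vintage car parked on a city street"}
    {"file_name": "image2.jpg", "text": "a photo of a golden retriever running on a beach"}
    {"file_name": "image3.jpg", "text": "a photo of a mountain lake at sunrise"}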

Step 3: Set Up a Training Task

  1. Navigate to the Train tab.

  2. Click on the New button.

  3. In the pop-up, you can leave most settings at their defaults. See the Advanced Plugin Configuration section below for detailed descriptions of all the advanced settings.

    • Template/Task Name:
      Set a unique name for your training template/task.

    • Dataset Tab:
      Select a Text-Image dataset to use for training. Text-Image datasets contain paired image and caption data required for diffusion model training. Datasets are loaded from the Datasets tab in Transformer Lab.

      Example Dataset: For testing and learning, you can use the datasets-examples/doc-image-6 dataset, which contains four high-quality images with detailed captions, making it perfect for experimenting with diffusion training.

Advanced Plugin Configuration

Core Training Parameters

  • Adaptor Name: Name for the LoRA adaptor that will be created and saved (required)
  • Trigger Word: Optional trigger word to prepend to all captions during training (e.g., 'sks person' or 'ohwx style')
  • Number of Training Epochs: Number of training epochs (default: 100)
  • Train Batch Size: Number of images per batch (default: 1)
  • Gradient Accumulation Steps: Steps to accumulate gradients before updating weights (default: 1)
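
These three parameters together determine how often weights are updated: the effective batch size is the train batch size times the gradient accumulation steps. A back-of-the-envelope sketch, assuming a hypothetical 20-image dataset:

    # Hypothetical numbers for illustration; only the defaults above are real.
    images = 20
    train_batch_size = 1
    gradient_accumulation_steps = 1
    num_train_epochs = 100

    effective_batch_size = train_batch_size * gradient_accumulation_steps  # 1
    steps_per_epoch = images // effective_batch_size                       # 20
    total_optimizer_steps = steps_per_epoch * num_train_epochs             # 2000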

Dataset Configuration

  • Caption Column: Name of the column containing image captions (default: "text")
  • Image Column: Name of the column containing images (default: "image")
  • Caption Dropout Rate: Probability of dropping captions during training (default: 0.0)
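
Caption dropout trains the model on an empty caption some fraction of the time, which can improve how it responds to classifier-free guidance at inference. A minimal sketch of the idea (the function name is illustrative, not the plugin's API):

    import random

    def maybe_drop_caption(caption: str, dropout_rate: float = 0.0) -> str:
        # With probability dropout_rate, replace the caption with an
        # empty string so the model also sees unconditional examples.
        if random.random() < dropout_rate:
            return ""
        return caption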

Image Processing

  • Image Resolution: Image resolution for training (default: 512)
  • Center Crop: Use center crop instead of random crop (default: false)
  • Image Interpolation Mode: Interpolation method for resizing (default: "lanczos")
  • Random Horizontal Flip: Apply random horizontal flip (default: false)

Data Augmentation

  • Enable Color Jitter: Enable color jitter augmentation (default: false)
  • Color Jitter Brightness: Brightness variation amount (default: 0.1)
  • Color Jitter Contrast: Contrast variation amount (default: 0.1)
  • Color Jitter Saturation: Saturation variation amount (default: 0.1)
  • Color Jitter Hue: Hue variation amount (default: 0.05)
  • Enable Random Rotation: Enable random rotation (default: false)
  • Random Rotation Degrees: Maximum rotation degrees (default: 5)
  • Random Rotation Probability: Probability of applying rotation (default: 0.3)
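
Together, the image processing and data augmentation options map naturally onto a torchvision preprocessing pipeline. A sketch with the defaults above (center crop, flip, color jitter, and rotation all enabled for illustration; the plugin's actual pipeline may differ):

    from torchvision import transforms

    preprocess = transforms.Compose([
        transforms.Resize(512, interpolation=transforms.InterpolationMode.LANCZOS),
        transforms.CenterCrop(512),         # random crop when Center Crop is off
        transforms.RandomHorizontalFlip(),  # only if Random Horizontal Flip is on
        transforms.ColorJitter(brightness=0.1, contrast=0.1,
                               saturation=0.1, hue=0.05),
        transforms.RandomApply([transforms.RandomRotation(degrees=5)], p=0.3),
        transforms.ToTensor(),
        transforms.Normalize([0.5], [0.5]),  # scale pixels to [-1, 1] for the VAE
    ])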

LoRA Configuration

  • LoRA Rank (r): LoRA rank; higher values mean more trainable parameters and potentially better quality (default: 8)
  • LoRA Alpha: LoRA scaling factor (default: 16)
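
A LoRA configuration like this is typically expressed with the peft library, where the scale applied to the learned weights is lora_alpha / r (here 16 / 8 = 2.0). A sketch with the defaults above (the target modules listed are the usual attention projections in a diffusers UNet, not necessarily the plugin's exact choice):

    from peft import LoraConfig

    lora_config = LoraConfig(
        r=8,            # LoRA Rank
        lora_alpha=16,  # LoRA Alpha; effective scale is alpha / r = 2.0
        init_lora_weights="gaussian",
        target_modules=["to_k", "to_q", "to_v", "to_out.0"],
    )
    # unet.add_adapter(lora_config)  # attach the adaptor before training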

Optimizer Settings

  • Learning Rate: Learning rate for optimizer (default: 1e-4)
  • LR Scheduler: Learning rate schedule type (default: "constant")
  • LR Warmup Steps: Steps to gradually increase learning rate (default: 50)
  • Adam Beta 1: Adam optimizer beta1 parameter (default: 0.9)
  • Adam Beta 2: Adam optimizer beta2 parameter (default: 0.999)
  • Adam Weight Decay: Weight decay for regularization (default: 0.01)
  • Adam Epsilon: Adam epsilon for numerical stability (default: 1e-8)
  • Max Grad Norm: Maximum gradient norm for clipping (default: 1.0)
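
These settings correspond to a standard AdamW setup with gradient clipping. A sketch using torch and diffusers' scheduler helper (unet and total_optimizer_steps are placeholders carried over from earlier sketches):

    import torch
    from diffusers.optimization import get_scheduler

    # Only the LoRA weights require gradients after attaching the adaptor
    lora_parameters = [p for p in unet.parameters() if p.requires_grad]

    optimizer = torch.optim.AdamW(
        lora_parameters,
        lr=1e-4,                # Learning Rate
        betas=(0.9, 0.999),     # Adam Beta 1 / Beta 2
        weight_decay=0.01,      # Adam Weight Decay
        eps=1e-8,               # Adam Epsilon
    )
    lr_scheduler = get_scheduler(
        "constant_with_warmup",  # "constant" plus LR Warmup Steps
        optimizer=optimizer,
        num_warmup_steps=50,
        num_training_steps=total_optimizer_steps,
    )
    # After each backward pass, clip gradients before stepping:
    # torch.nn.utils.clip_grad_norm_(lora_parameters, max_norm=1.0)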

Advanced Training Options

  • Loss Type: Loss function type - "l2" or "huber" (default: "l2")
  • Huber Loss Beta: Beta parameter for Huber loss (default: 0.1)
  • Prediction Type: Prediction type - "epsilon" or "v_prediction" (default: "epsilon")
  • SNR Gamma: Signal-to-noise ratio gamma for loss weighting (default: 0)
  • Min-SNR Gamma: Minimum SNR gamma value (default: 0)
  • Noise Offset: Offset added to noise for training (default: 0)
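
To make these options concrete, here is roughly how one training step computes the loss, in the style of the diffusers LoRA training scripts (a sketch only; variable names such as latents, timesteps, and encoder_hidden_states are illustrative, and the plugin's code may differ):

    import torch
    import torch.nn.functional as F

    noise = torch.randn_like(latents)
    if noise_offset > 0:
        # Offset noise: a per-sample constant added across each channel,
        # which helps the model learn overall brightness shifts.
        noise += noise_offset * torch.randn(
            (latents.shape[0], latents.shape[1], 1, 1), device=latents.device)

    noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)
    model_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample

    if prediction_type == "epsilon":
        target = noise                         # predict the added noise
    else:                                      # "v_prediction"
        target = noise_scheduler.get_velocity(latents, noise, timesteps)

    if loss_type == "l2":
        loss = F.mse_loss(model_pred, target)
    else:                                      # "huber" (smooth-L1 form)
        loss = F.smooth_l1_loss(model_pred, target, beta=0.1)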

Performance Optimization

  • Mixed Precision: Enable mixed precision training - "no", "fp16", or "bf16" (default: "no")
  • Enable xFormers Memory Efficient Attention: Use xFormers for memory efficiency (default: false)
  • Enable Gradient Checkpointing: Trade compute for memory (default: false)
  • Use EMA (Exponential Moving Average): Use Exponential Moving Average of weights (default: false)
  • EMA Decay Rate: EMA decay rate (default: 0.9999)
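
EMA keeps a slowly-moving shadow copy of the weights, which is often smoother than the raw trained weights. A minimal sketch of the update applied after each optimizer step:

    import torch

    @torch.no_grad()
    def ema_update(ema_params, params, decay=0.9999):
        # shadow = decay * shadow + (1 - decay) * current
        for ema_p, p in zip(ema_params, params):
            ema_p.mul_(decay).add_(p, alpha=1.0 - decay)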

Evaluation

  • Evaluation Prompt: Text prompt for generating evaluation images (default: "")
  • Evaluation Steps: Generate evaluation images every N epochs (default: 1)
  • Evaluation Inference Steps: Denoising steps for evaluation images (default: 50)
  • Evaluation Guidance Scale: Guidance scale for evaluation generation (default: 7.5)
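
An evaluation render amounts to a single text-to-image call with these settings. A sketch of the equivalent diffusers call, assuming the SDXL base model recommended in Step 1 (the prompt is illustrative):

    import torch
    from diffusers import AutoPipelineForText2Image

    pipe = AutoPipelineForText2Image.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
    ).to("cuda")

    image = pipe(
        "a photo of sks person hiking",  # Evaluation Prompt
        num_inference_steps=50,          # Evaluation Inference Steps
        guidance_scale=7.5,              # Evaluation Guidance Scale
    ).images[0]
    image.save("eval_image.png")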

Logging

  • Log to Weights and Biases: Log training metrics to Weights & Biases (default: true)

  4. Save the training template by clicking on Save Training Template.

Step 4: Queueing the Training Job

After saving the training template, click on Queue to start the training job.

While the training is running, you can view the output logs, monitor progress, and even view the evaluation images. The trainer will:

  • Process your Text-Image dataset
  • Train the LoRA adaptor on the selected diffusion model
  • Generate evaluation images if configured
  • Save the trained adaptor for use in diffusion workflows

Step 5: Using the Trained Adaptor

Once the training is finished, the LoRA adaptor becomes available for use with diffusion models. You can:

  1. Load the adaptor in the Foundation tab under your diffusion model
  2. Use in Text-to-Image generation by selecting the adaptor in the Diffusion tab
  3. Apply to Image-to-Image workflows for style transfer using your custom adaptor
  4. Utilize in Inpainting tasks to maintain consistent style across modified regions

The trained adaptor will enhance the base diffusion model with your custom style or subject learned from the training dataset.
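
Outside Transformer Lab, the exported adaptor can also be loaded into a diffusers pipeline directly. A sketch assuming the SDXL base model; the adaptor path and prompt are illustrative:

    import torch
    from diffusers import AutoPipelineForText2Image

    pipe = AutoPipelineForText2Image.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
    ).to("cuda")

    # Point this at your exported LoRA adaptor
    pipe.load_lora_weights("path/to/my_adaptor")

    # Include your trigger word in the prompt if you trained with one
    image = pipe("sks person riding a bike").images[0]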


Training Tips

Dataset Preparation

  • Image Quality: Use high-quality images with consistent resolution
  • Caption Quality: Write detailed, descriptive captions that accurately describe the images
  • Dataset Size: Start with 10-50 high-quality image-caption pairs for initial experiments
  • Consistency: Maintain consistent style, lighting, or subject matter for better results

Parameter Tuning

  • Start Conservative: Use default parameters for your first training run
  • Batch Size: Increase batch size if you have sufficient GPU memory
  • Learning Rate: Use lower learning rates (e.g., 1e-5) for fine details and higher ones (e.g., 1e-3) for major style changes
  • LoRA Rank: Higher ranks capture more detail but require more memory and training time

Monitoring Training

  • Use Evaluation Prompts: Set meaningful evaluation prompts to visually track training progress
  • Watch Loss Curves: Monitor training loss through Weights and Biases integration
  • Evaluation Images: Review generated evaluation images to assess training quality

This diffusion trainer enables you to create powerful custom adaptors that can transform the behavior of diffusion models for your specific use cases, whether for artistic style transfer, subject-specific generation, or specialized image domains.

Training Effectiveness Comparison

To demonstrate the power of a Simpsons-style LoRA adaptor trained on the Simpsons BLIP Captions dataset, let's compare two scenarios using the prompt "An astronaut floating in space".

Base Model Only

Using the base diffusion model without any adaptor:

Base model result: Standard realistic style

Result: The base diffusion model generates a realistic depiction of an astronaut floating in space without any stylized characteristics.

With Simpsons Adaptor

Using the trained Simpsons-style LoRA adaptor:

Simpsons style: Vibrant animation aesthetic

Result: Applying the Simpsons adaptor transforms the scene into a bright, animated aesthetic reminiscent of The Simpsons while preserving the astronaut theme.

Key Takeaways

  1. Stylistic Transformation: LoRA adaptors can inject distinct artistic styles into diffusion outputs with minimal overhead.
  2. Resource Efficiency: Fine-tuning with LoRA adapters requires far fewer resources than full model training.
  3. Data Quality Matters: High-quality, targeted datasets like the Simpsons captions dataset ensure coherent and consistent style transfer.

When training your own adaptors, remember to:

  • Choose meaningful trigger words that don't conflict with common vocabulary
  • Use the trigger word consistently during training
  • Always include the trigger word when generating images with your trained adaptor
  • Test different trigger word strategies to find what works best for your specific use case

Additional Pokemon-style Example

Pokemon Training Effectiveness Comparison

To demonstrate the power of LoRA adaptors and the importance of trigger words, let's compare three scenarios using the prompt "pokemon-style-123 pikachu attacking" where pokemon-style-123 is the trigger word.

Pokemon Base Model Only

Result: The base model generates a standard interpretation of "pikachu attacking" but lacks the specific Pokemon style characteristics and instead tries to assign a cat-style face to Pikachu.

Pokemon Adaptor Only (No Trigger Word)

Result: The adaptor provides some style influence, but without the trigger word, the learned characteristics are not fully activated.

Pokemon Trigger Word + Adaptor

Result: Combining the trigger word with the adaptor produces full style activation, displaying vivid Pokemon-style visuals.

Pokemon Key Takeaways

  1. Adaptor Impact: Training a LoRA adaptor significantly improves style consistency compared to the base model.
  2. Trigger Word Importance: The trigger word fully unlocks the learned style.
  3. Combined Approach: The best results come from using both the adaptor and the trigger word.