# Auto-Caption Images with WD14 Tagger (`wd14_captioner`)
This plugin uses the WD14 tagger (from `kohya-ss/sd-scripts`) to automatically generate Danbooru-style tags for image datasets. It is ideal for preparing high-quality captions for datasets used to fine-tune Stable Diffusion or similar models.
## Step 1: Prepare Your Image Dataset
Upload a dataset containing image files. The dataset must include an image column (default: `image`); you can configure the name of this column via the **Image Field** parameter. The plugin supports `.jpg`, `.jpeg`, `.png`, and `.webp` formats.
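If you are assembling the dataset yourself, a minimal sketch using the Hugging Face `datasets` library may help (this is an assumption about your tooling, and the folder path is hypothetical):

```python
from datasets import load_dataset

# Build a dataset from a local folder of .jpg/.jpeg/.png/.webp files.
# "./my_images" is a placeholder path; point it at your own folder.
dataset = load_dataset("imagefolder", data_dir="./my_images", split="train")

# The imagefolder loader produces an "image" column, which matches the
# plugin's default Image Field, so no renaming should be needed.
print(dataset.column_names)
```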
## Step 2: Configure Plugin Parameters
Use the parameters panel to control the tag generation behavior:
| Parameter | Description |
|---|---|
| Image Field | Dataset column that contains the image files |
| Tag Confidence Threshold | Minimum confidence score for a tag to be included |
| General Threshold | Optional threshold applied specifically to general (non-character) tags |
| Character Threshold | Optional threshold applied specifically to character tags |
| ONNX Model Variant | Choose between the ConvNeXt and ViT variants of WD14 |
| Batch Size | Number of images to process at once |
| Image Resize | Size (in pixels) to which the shorter side of each image is resized before inference |
| Caption Separator | Character(s) used to join multiple tags |
| Max Dataloader Workers | Maximum number of workers used to load images during tagging |
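For a concrete picture of these settings, here is an illustrative configuration expressed as a Python dict. The key names are hypothetical (they mirror the panel labels above, not necessarily the plugin's internal config keys), and the threshold values are common starting points rather than plugin defaults:

```python
# Hypothetical configuration mirroring the parameters panel above.
wd14_params = {
    "image_field": "image",            # dataset column holding the images
    "tag_confidence_threshold": 0.35,  # minimum score for any tag
    "general_threshold": 0.35,         # optional, general (non-character) tags
    "character_threshold": 0.85,       # optional, character tags
    "onnx_model_variant": "wd-v1-4-convnext-tagger-v2.onnx",
    "batch_size": 8,                   # images processed per inference call
    "image_resize": 448,               # shorter side in pixels before inference
    "caption_separator": ", ",         # joins tags into one caption string
    "max_dataloader_workers": 4,       # parallel image-loading workers
}
```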
## Step 3: Start the Job
Once your dataset is uploaded and the parameters are configured, click the **Queue** button to start captioning. You can monitor job progress in the **Executions** tab.
When executed, the plugin will:
- Load your image dataset
- Run the selected WD14 model on each image
- Generate tags/captions based on your thresholds
- Save the results as a new dataset with two columns: `image` (the original file path) and `caption` (the generated tags)
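If you are curious what the tagging step looks like under the hood, below is a simplified sketch of single-image WD14 inference with `onnxruntime`. It assumes the model file is already cached locally and that `tags` is a list whose order matches the model's outputs (in the kohya-ss release it comes from `selected_tags.csv`); rating tags and the separate general/character thresholds are omitted for brevity:

```python
import numpy as np
import onnxruntime as ort
from PIL import Image

# Load the ONNX model; the filename assumes the ViT variant is already cached.
session = ort.InferenceSession(
    "wd-v1-4-vit-tagger-v2.onnx", providers=["CPUExecutionProvider"]
)
input_name = session.get_inputs()[0].name
size = session.get_inputs()[0].shape[1]  # WD14 models expect square inputs (448)

def tag_image(path, tags, threshold=0.35, separator=", "):
    """Caption one image, keeping every tag whose score clears the threshold."""
    img = Image.open(path).convert("RGB").resize((size, size))
    # WD14 models take raw 0-255 float32 pixels in BGR channel order (NHWC).
    x = np.ascontiguousarray(np.asarray(img, dtype=np.float32)[:, :, ::-1])[None]
    probs = session.run(None, {input_name: x})[0][0]
    kept = [tag for tag, p in zip(tags, probs) if p >= threshold]
    return separator.join(kept)
```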
## Step 4: View the Output
After completion, you can view the new dataset in the **Datasets** tab under **Generated Datasets**. The resulting dataset will contain the original images and a new column with the generated captions.
You can also edit the captions and create a new dataset for downstream tasks like training, search, or labeling.
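As one example of such an edit, the sketch below loads an exported copy of the generated dataset and prepends a trigger word to every caption; the file names are hypothetical and depend on how you export the dataset:

```python
from datasets import load_dataset

# "generated_captions.csv" is a hypothetical export of the image/caption dataset.
ds = load_dataset("csv", data_files="generated_captions.csv", split="train")

# Prepend a trigger word to every caption before using it for training.
ds = ds.map(lambda row: {"caption": "my_style, " + row["caption"]})
ds.to_csv("edited_captions.csv")
```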
## Output Example
| image | caption |
|---|---|
| pokemon_1.png | solo, simple_background, white_background, full_body, black_eyes, pokemon_(creature), no_humans, animal_focus |
| pokemon_2.jpg | solo, smile, open_mouth, simple_background, red_eyes, white_background, standing, full_body, pokemon_(creature), no_humans, fangs, bright_pupils, claws, white_pupils, bulbasaur |
## Model Variants
- `wd-v1-4-convnext-tagger-v2.onnx`: More accurate, but larger
- `wd-v1-4-vit-tagger-v2.onnx`: Lightweight alternative
These models will be automatically downloaded and cached if not already present.
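If you prefer to pre-fetch a model manually, a sketch using `huggingface_hub` might look like the following; it assumes the variants come from SmilingWolf's Hugging Face repositories, which is where the kohya-ss tagger downloads them:

```python
from huggingface_hub import hf_hub_download

# hf_hub_download caches files locally and returns the cached path on
# subsequent calls, so repeat runs skip the download.
model_path = hf_hub_download(
    repo_id="SmilingWolf/wd-v1-4-convnext-tagger-v2",  # assumed source repo
    filename="model.onnx",  # the repo stores the ONNX weights under this name
)
tags_path = hf_hub_download(
    repo_id="SmilingWolf/wd-v1-4-convnext-tagger-v2",
    filename="selected_tags.csv",  # tag list aligned with the model outputs
)
print(model_path, tags_path)
```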
