Auto-Caption Images with WD14 Tagger (wd14_captioner)

This plugin uses the WD14 tagger (from the kohya-ss/sd-scripts repository) to automatically generate Danbooru-style tags for image datasets. It is ideal for preparing high-quality captions for datasets used in fine-tuning Stable Diffusion or similar models.

Step 1: Prepare Your Image Dataset

Upload a dataset containing image files. The dataset must include an image column (default: "image"). You can configure the name of this column via the Image Field parameter.

The model supports .jpg, .jpeg, .png, and .webp formats.
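If your dataset lives in a local folder, it can be worth checking for unsupported files before uploading. The helper below is an illustrative sketch (the function name and folder layout are assumptions, not part of the plugin):

```python
from pathlib import Path

# Extensions the WD14 captioner accepts (per the list above)
SUPPORTED_EXTS = {".jpg", ".jpeg", ".png", ".webp"}

def find_unsupported(dataset_dir):
    """Return files in dataset_dir whose extension is not supported."""
    return [
        p for p in Path(dataset_dir).iterdir()
        if p.is_file() and p.suffix.lower() not in SUPPORTED_EXTS
    ]
```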

Step 2: Configure Plugin Parameters

Use the parameters panel to control the tag generation behavior:

| Parameter | Description |
| --- | --- |
| Image Field | Dataset column that contains the image files |
| Tag Confidence Threshold | Minimum confidence score for a tag to be included |
| General Threshold | Optional threshold specifically for general (non-character) tags |
| Character Threshold | Optional threshold specifically for character tags |
| ONNX Model Variant | Choose between the ConvNeXt and ViT variants of WD14 |
| Batch Size | Number of images to process at once |
| Image Resize | Resize the shorter side of each image before inference |
| Caption Separator | Character(s) used to join multiple tags |
| Max Dataloader Workers | Maximum number of workers used to load images during tagging |
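The thresholding step works like this: every tag the model emits comes with a confidence score, and a tag is kept only if its score clears the threshold for its category (general vs. character); the surviving tags are then joined with the configured separator. A minimal sketch, with illustrative tag names, scores, and default values (the plugin's internals may differ):

```python
def build_caption(general_tags, character_tags,
                  general_threshold=0.35, character_threshold=0.85,
                  separator=", "):
    """Keep tags whose confidence clears the per-category threshold,
    then join the survivors into a single caption string."""
    kept = [t for t, score in general_tags.items() if score >= general_threshold]
    kept += [t for t, score in character_tags.items() if score >= character_threshold]
    return separator.join(kept)
```

Raising a threshold trades recall for precision: fewer, but more reliable, tags per image.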

Step 3: Start the Job

Once your dataset is uploaded and parameters are configured, click the Queue button to start captioning. You can monitor job progress in the Executions tab.

When executed, the plugin will:

  • Load your image dataset
  • Run the selected WD14 model on each image
  • Generate tags/captions based on your thresholds
  • Save the results as a new dataset with two columns:
    • image (original file path)
    • caption (generated tags)
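The steps above can be sketched as a loop over the image column that writes the new two-column dataset. Here `run_tagger` is a placeholder standing in for the actual WD14 inference, and CSV is used only as an illustrative output format:

```python
import csv

def caption_dataset(image_paths, run_tagger, out_csv):
    """Run a tagger over each image and save (image, caption) rows."""
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["image", "caption"])
        for path in image_paths:
            writer.writerow([path, run_tagger(path)])
```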

Step 4: View the Output

After completion, you can view the new dataset inside the Datasets tab under Generated Datasets. The resulting dataset will contain the original images and a new column with generated captions.

You can also edit the captions and create a new dataset for downstream tasks like training, search, or labeling.
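Common edits before reusing the captions for training include replacing underscores with spaces and prepending a trigger word. The snippet below is one illustrative way to apply such edits in bulk; the helper name and defaults are assumptions:

```python
def edit_caption(caption, trigger_word=None, strip_underscores=True):
    """Apply common caption edits before downstream training."""
    if strip_underscores:
        caption = caption.replace("_", " ")
    if trigger_word:
        caption = f"{trigger_word}, {caption}"
    return caption
```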

Output Example

| image | caption |
| --- | --- |
| pokemon_1.png | solo, simple_background, white_background, full_body, black_eyes, pokemon_(creature), no_humans, animal_focus |
| pokemon_2.jpg | solo, smile, open_mouth, simple_background, red_eyes, white_background, standing, full_body, pokemon_(creature), no_humans, fangs, bright_pupils, claws, white_pupils, bulbasaur |

Model Variants

  • wd-v1-4-convnext-tagger-v2.onnx: More accurate, but larger
  • wd-v1-4-vit-tagger-v2.onnx: Lightweight alternative

These models will be automatically downloaded and cached if not already present.
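The download-and-cache behavior can be sketched as: look for the ONNX file in a local cache directory, and fetch it only on a miss. The cache layout and `download_fn` hook below are illustrative, not the plugin's actual implementation:

```python
from pathlib import Path

def get_model(variant, cache_dir, download_fn):
    """Return the cached ONNX model path, downloading it on first use.

    download_fn(variant, dest_path) is a stand-in for the real fetch.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    dest = cache_dir / variant
    if not dest.exists():
        download_fn(variant, dest)  # cache miss: fetch once
    return dest
```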

WD14 Captioner Plugin in Action