📄️ Generate Data from Documents
This page explains how to generate data from reference documents using Transformer Lab.
📄️ Generate Fact-based QnA Dataset From Documents
This page explains how to generate a fact-based question and answer dataset from documents using Transformer Lab.
📄️ Huggingface (YourBench) Dataset Generation
This page explains how to generate data from reference documents using Transformer Lab and the YourBench framework from 🤗 Hugging Face.
📄️ Generate Data from Raw Text
This page explains how to generate data from raw text using Transformer Lab.
📄️ Generate Data from Scratch
This page explains how to generate a dataset using Transformer Lab, starting from nothing more than the concepts the dataset should cover.
📄️ Generate Batched RAG Outputs from Datasets
This page explains how to generate batched RAG (Retrieval-Augmented Generation) outputs from datasets using Transformer Lab.
📄️ Generate QA, CoT, or Summary Dataset from Documents (synthetic-dataset-kit)
The synthetic-dataset-kit plugin creates synthetic datasets from your uploaded documents using local language models. It supports three generation modes: QA (question answering), CoT (chain of thought), and Summary, so you can create a wide range of fine-tuning datasets.
📄️ Generate Image Dataset from Prompts (dataset_imagegen)
This plugin generates an image dataset using a local text-to-image diffusion model such as Stable Diffusion. It takes prompts from a user-provided dataset and outputs the generated images along with their associated metadata.
📄️ Auto-Caption Images with WD14 Tagger (wd14_captioner)
This plugin uses the WD14 tagger (from kohya-ss/sd-scripts) to automatically generate Danbooru-style tags for image datasets. It is well suited to preparing high-quality captions for datasets used to fine-tune Stable Diffusion or similar models.