Transformer Lab is a Company
Transformer Lab is now an official company!
Timed with Mozilla's Demo Day today, we are excited to announce that Transformer Lab is officially launching as a company.
Let's add an LLM to our application!
But what is the best first LLM project to try adding to your production application?
When teams brainstorm LLM opportunities for their products, there is a tendency to focus on overly complex ideas -- but these ideas often come with high risk.
Autocomplete is a great first feature to add to an existing application: it is well defined, relatively simple to implement, and it is a task LLMs are particularly well suited for.
In this post we will set up a system to add autocomplete to an existing product, and then we will suggest future opportunities to improve it.
Let's get started by setting up our computer so we can play with different models and techniques.
I will start by downloading Transformer Lab on a Linux box with an NVIDIA GPU, but Transformer Lab also works on macOS and Windows. Once you see a screen that looks like the following, you are ready to get started!
Now that we have a running LLM workspace, create a new Experiment in Transformer Lab and let's begin.
The first question we want to answer is whether we should use an instruction-tuned or a non-instruction-tuned model for our autocomplete project.
There are lots of resources to learn about the difference between instruction-tuned and non-instruction-tuned models. For our purposes the main difference is that instruction-tuned models are trained to work in a chat format.
Usually, when using instruction-tuned models, you should send your queries to the model in a chat template. Transformer Lab will automatically format queries in a chat format if you use the chat section of the Interact tab.
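As a rough sketch of what "formatting a query in a chat template" means under the hood, here is how the Hugging Face transformers library applies a model's chat template (the model id is illustrative, and Llama weights are gated on the Hub):

```python
# Minimal sketch: wrap a query in the chat template an instruction-tuned
# model was trained on, using transformers' apply_chat_template.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

messages = [
    {"role": "system", "content": "You are an AI that autocompletes sentences."},
    {"role": "user", "content": "Really enjoyed lunch today. Let's do it again soon."},
]

# The template adds the special tokens and role markers the model expects.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```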
Let's try a few experiments with autocompleting text to see which works better. In these examples we will use an unquantized version of Llama 3.1 8B, which is the smallest Llama 3.1 model. We will send the same queries to the instruction-tuned and non-instruction-tuned versions of the 8B parameter model to get a qualitative sense of which performs best.
There are different kinds of autocomplete; for our demo we want one that takes the first few words of a sentence and recommends the rest, keeping responses short.
We'll use three simple examples in our dataset. Normally for an experiment like this you'd want a much larger potential set of test cases.
Our test dataset is:
Test 1
Context: Really enjoyed lunch today. Let's do it again soon.
Human: Me
Test 2
Context: Are you available to speak on the phone tomorrow
Human: No I am not, but I will be free the day
Test 3
Human: It's sunny today in Toronto,
Let's try Test 1 on Llama 3.1 Non-Instruction Tuned, using the completion endpoint:
That worked sort of well, but why is it saying "Me 2" instead of "Me too"? Let's try it again:
Hmm, it is still saying "Me 2" instead of "Me too", but this time it generates far too much text. Language models have a special "stop token" that tells them when to stop generating, but we can see here that this model doesn't know that we want a short answer.
One brute force thing we can do is to limit the maximum length we allow for a response. We only want a few words so we can set this to something like 10-15 tokens. Note that tokens do not map 1:1 with words.
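As a minimal sketch of what that looks like in code, assuming an OpenAI-compatible completion endpoint is available locally (the base URL, port, and model name below are placeholders, not Transformer Lab specifics):

```python
# Cap the response length with max_tokens; values here are illustrative.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

completion = client.completions.create(
    model="Llama-3.1-8B",
    prompt="Really enjoyed lunch today. Let's do it again soon.\nMe",
    max_tokens=12,   # roughly 10-15 tokens; tokens do not map 1:1 to words
    temperature=0.7,
)
print(completion.choices[0].text)
```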
This seems better but it is just clipping the response -- it's not ensuring that the response is limited in a meaningful way, such as at the end of a sentence.
One technique we can use to steer our responses is few-shot prompting. To do this, we prepend a few examples of the type of responses we want from the model, in the hopes that the previous examples steer the final answer.
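As a sketch of what that prepended prompt could look like (the example pairs here are made up purely for illustration):

```python
# A few-shot completion prompt: invented example pairs come first, then the
# real query, so the model imitates the short sentence-finishing style.
few_shot_prompt = """\
User A: Thanks for sending the report over.
User B: No problem, happy to help.

User A: Can we move our call to Friday?
User B: Sure, Friday works well for me.

User A: Really enjoyed lunch today. Let's do it again soon.
User B: Me"""
```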
Few-shot prompting seems to have helped keep the model's responses short and to the point. But the model still has the problem that it keeps rambling after the initial sentence is complete. One could imagine using this solution as is, but removing anything after a period or comma post-generation.
Rather than try to steer the model's response using patterns, let's give direct instructions to the model before it answers. For this, we will switch to Llama 3.1 8B Instruction Tuned.
Let's use the "System Message" in the Chat tab to tell the bot what we want:
Let's try this prompt:
You are an AI that autocompletes sentences.
The user will provide the beginning of a message.
Respond to any query with 5 to 10 words that
could follow the starting message from the user.
Sometimes we get a reasonable response like:
But even in this example, the model is suggesting a meeting this weekend -- that's too specific a response. Let's try adding more detail to the system message. Let's also ask for the response in a format we can consume by API:
You are an AI that autocompletes sentences in an email. You are completing a message from User B who is responding to User A.
Your response should be in JSON as an array of three separate 5 word strings that are potential completions.
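On the application side, a minimal sketch of consuming that JSON (assuming the model actually returns a valid array, which is not guaranteed) might look like this:

```python
import json

# raw stands in for the text the model returned; it should be a JSON array of
# strings, but models sometimes add extra prose, so parse defensively.
raw = '["too! Same time next week?", "too, it was great.", "too, thanks again."]'
try:
    suggestions = json.loads(raw)
except json.JSONDecodeError:
    suggestions = []   # fall back to showing no suggestions rather than crashing
print(suggestions)
```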
This works well but we can also try sending the data to the LLM as a completion (not chat formatted) using this pattern:
System Message:
You are an AI that autocompletes sentences in an email. You are completing a message from User B who is responding to User A.
Your response should be in JSON as an array of three separate maximum 5 word strings that are potential completions.
User B: Really enjoyed lunch today. Let's do it again soon.
User A: Me
Response:
With this we seem to get reasonable responses, but they are not always great.
The following prompt, which uses a back-and-forth message format, is one of the best performing templates in my tests:
You are an AI that autocompletes sentences in an email.
You are completing a message from User B who is responding to User A.
Some rules to follow:
1. Do not mention events or information that is not provided by
the context
2. Respond with a JSON array of 3 possible responses to the
query, each containing a string parameter called "text" that
is about 5 words long
### Human: Context information is below.
---------------------
User A: Are you available for a meeting tomorrow at noon?
---------------------
Given the context information and not prior knowledge,
answer the query.
Query: What are possible next words for the sentence
from User B that begins with:
Thank you! Yes
Answer:
### Assistant:
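If we want to reuse this template programmatically, one way (this parameterization is just an illustration, not something Transformer Lab provides) is to fill in the context and the partial reply for each request:

```python
# Sketch: parameterize the back-and-forth template so the context and the
# user's partial reply can be filled in per autocomplete request.
TEMPLATE = """You are an AI that autocompletes sentences in an email.
You are completing a message from User B who is responding to User A.
Some rules to follow:
1. Do not mention events or information that is not provided by the context
2. Respond with a JSON array of 3 possible responses to the query, each \
containing a string parameter called "text" that is about 5 words long
### Human: Context information is below.
---------------------
User A: {context}
---------------------
Given the context information and not prior knowledge, answer the query.
Query: What are possible next words for the sentence from User B that begins with:
{partial_reply}
Answer:
### Assistant:"""

prompt = TEMPLATE.format(
    context="Are you available for a meeting tomorrow at noon?",
    partial_reply="Thank you! Yes",
)
```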
One trick we could use to improve quality is to have the LLM review its own responses, revising and editing its previous work.
In the following prompt template, we simply ask the model to rank its own responses, and we may decide to only show responses rated 7/10 or greater:
System Message:
Your job is to evaluate the quality of a response for an autocomplete bot. You are provided a conversation between User B and User A, as well as a suggested completion to User A's message. Go through each suggested response (which is in JSON format as an Array) and give a new JSON object with a ranking (from 1-10) for each response.
An example JSON response object would look like:
[
  {
    "response": "thank you very much.",
    "ranking": "10"
  },
  ...
]
Context:
User B: Are you available to speak on the phone tomorrow
User A: No I am not, but I will be free the day
Original Response:
[
"after tomorrow's meeting",
"on the following Monday",
"next Wednesday morning definitely"
]
Response:
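On the application side, a sketch of filtering on those rankings (the scores below are invented for illustration) could then keep only suggestions rated 7/10 or higher:

```python
import json

# ranked stands in for the JSON the ranking prompt returns; keep only the
# suggestions the model scored 7/10 or higher.
ranked = json.loads("""[
  {"response": "after tomorrow's meeting", "ranking": "8"},
  {"response": "on the following Monday", "ranking": "9"},
  {"response": "next Wednesday morning definitely", "ranking": "4"}
]""")
good = [item["response"] for item in ranked if int(item["ranking"]) >= 7]
print(good)
```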
This article was written to capture the initial journey a developer may go through to build a first app prototype using LLMs. Many parts of that journey can feel unusual to a new LLM application developer.
At the outset, the biggest challenge is consistently generating high-quality, properly formatted responses that can be used by your downstream application. No single technique solves this issue completely, but in this example, a combination of good prompting, some model self-reflection, and potentially some fine-tuning/preference training should get us very far.
Experimentation is required. We built Transformer Lab to make that experimentation easy: you can access hundreds of popular models and structure queries to them without having to know all the internals.
Have you implemented autocomplete in your application using an LLM? What other techniques did you use? Please share your learnings on our Discord.
One of our primary goals is to make Transformer Lab the most powerful tool available when working with local LLMs. Almost all of the most powerful libraries and tools out there are written in Python. So, we need a way to package Python, PyTorch, and associated tools as part of our application. All of this has to work reliably across multiple operating systems and hardware architectures.
This ended up being much harder than we could ever have imagined...