📄️ Inference Engines
You can select different inference engines depending on your available hardware and the selected model's architecture.
📄️ Ollama Server Plugin
The Ollama Server plugin is our recommended plugin for running GGUF models across all platforms.
📄️ Chat and Completions
After running a model in the Foundation tab, you can interact with it using the Interact tab, which offers both Chat and Completions interfaces.
📄️ RAG (Retrieval-Augmented Generation)
RAG enhances large language models by retrieving relevant information from your documents before generating responses. This allows the model to access external knowledge not included in its training data, providing more accurate and context-aware answers.
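The retrieve-then-generate flow described above can be sketched in a few lines. This is a toy illustration, not the app's actual pipeline: it uses bag-of-words vectors and cosine similarity in place of a real embedding model, and the function names (`embed`, `retrieve`, `build_prompt`) are hypothetical.

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy embedding: bag-of-words counts. A real pipeline would use a
    # neural embedding model, but the retrieval logic is the same shape.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, documents, top_k=2):
    # Rank documents by similarity to the query and keep the best matches.
    q = embed(query)
    return sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)[:top_k]

def build_prompt(query, documents):
    # The retrieved passages are prepended to the question as context,
    # so the model can ground its answer in them.
    context = "\n".join(retrieve(query, documents))
    return f"Use the context below to answer.\n\nContext:\n{context}\n\nQuestion: {query}"

docs = [
    "The warranty covers manufacturing defects for two years.",
    "Shipping within the EU takes three to five business days.",
    "Returns are accepted within 30 days of delivery.",
]
prompt = build_prompt("How long is the warranty?", docs)
```

The augmented `prompt` is what gets sent to the model; the key design point is that retrieval happens before generation, so the answer can draw on documents the model never saw during training.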
📄️ Batched Query
The Batched Query interface allows you to send multiple requests to the model in one go. You can define a batch of chats (multi-turn conversations) or a batch of completion texts.
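As a rough illustration of what batching buys you, the sketch below fans a list of prompts out concurrently and collects the results in input order. The `generate` function is a stand-in (it just echoes); in practice it would be a call to the running model.

```python
from concurrent.futures import ThreadPoolExecutor

def generate(prompt):
    # Placeholder for a real model call (e.g. an HTTP request to a local
    # inference server); it echoes so the example is self-contained.
    return f"Answer to: {prompt}"

def run_batch(prompts, max_workers=4):
    # pool.map preserves input order, so results line up with prompts.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(generate, prompts))

results = run_batch(["What is RAG?", "Define tokenization."])
```

Order preservation matters here: when you submit a batch of chats or completion texts, you need each response matched back to the request that produced it.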
📄️ Embeddings and Tokenize
This page introduces two features: Embeddings, which convert text into numeric vectors suitable for similarity search, and Tokenize, which shows how the model splits text into tokens.
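To make the two concepts concrete, here is a deliberately simplified sketch. The word-level tokenizer and histogram embedding below are toys (real engines use subword tokenizers such as BPE and learned embedding vectors), but they show the essential idea: text becomes integer token ids, and token ids become a fixed-size numeric vector.

```python
def tokenize(text, vocab):
    # Toy word-level tokenizer: each distinct word gets an integer id.
    # Real tokenizers use subword schemes (e.g. BPE), so counts will differ.
    return [vocab.setdefault(word, len(vocab)) for word in text.lower().split()]

def embed(token_ids, dim=8):
    # Toy embedding: a fixed-size histogram over token ids, just to show
    # that an embedding is a numeric vector derived from the tokens.
    vec = [0.0] * dim
    for tid in token_ids:
        vec[tid % dim] += 1.0
    return vec

vocab = {}
ids = tokenize("the cat sat on the mat", vocab)  # repeated words share an id
vector = embed(ids)                              # always length `dim`
```

Note that "the" appears twice and maps to the same id both times, and the embedding has the same length regardless of input size, which is what makes vectors comparable across texts.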
📄️ Visualize Logprobs
The Visualize Logprobs interface is an experimental feature, currently available only with the MLX inference engine. It provides a visual representation of the log probabilities (logprobs) associated with each token in the generated completion.
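The core idea behind such a visualization can be shown in a few lines: each logprob is the natural log of the token's probability, so exponentiating recovers a value in (0, 1] that can be bucketed (or colored) by confidence. The token/logprob pairs below are made up for illustration, and the thresholds are arbitrary.

```python
import math

def confidence(logprob):
    # exp(logprob) recovers the token's probability from its log probability.
    p = math.exp(logprob)
    if p > 0.9:
        return "high"
    if p > 0.5:
        return "medium"
    return "low"

# Hypothetical (token, logprob) pairs, as an engine might report them.
completion = [("The", -0.05), ("capital", -0.4), ("is", -0.02), ("Narnia", -3.2)]
annotated = [(token, confidence(lp)) for token, lp in completion]
```

A low-probability token like the last one stands out immediately, which is exactly the kind of signal a logprob visualization surfaces.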
📄️ Tool Calling
The Tool Calling interface is a work-in-progress feature designed to integrate formal function calls into your interactions. Currently, it supports basic functions like add, subtract, multiply, divide, and get_weather.
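A common pattern behind tool calling is a registry-plus-dispatcher: the model emits a structured call (typically JSON naming a function and its arguments), and the host looks the function up and executes it. The sketch below is a generic illustration of that pattern, not this app's implementation; the arithmetic tool names mirror those listed above, and the JSON shape is an assumption.

```python
import json

# Registry of callable tools, keyed by the name the model will emit.
TOOLS = {
    "add": lambda a, b: a + b,
    "subtract": lambda a, b: a - b,
    "multiply": lambda a, b: a * b,
    "divide": lambda a, b: a / b,
}

def dispatch(tool_call_json):
    # Parse a model-emitted call such as
    # {"name": "add", "arguments": {"a": 2, "b": 3}} and execute it.
    call = json.loads(tool_call_json)
    fn = TOOLS[call["name"]]
    return fn(**call["arguments"])

result = dispatch('{"name": "add", "arguments": {"a": 2, "b": 3}}')
```

The result would then be fed back into the conversation so the model can incorporate it into its reply.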