📄️ Inference Engines
You can select different inference engines depending on your available hardware and the selected model's architecture.
📄️ Ollama Server Plugin
The Ollama Server plugin is our recommended plugin for running GGUF models across all platforms.
📄️ Chat and Completions
After running a model in the Foundation tab, you can interact with it using the Interact tab, which offers both Chat and Completions interfaces.
📄️ RAG (Retrieval-Augmented Generation)
RAG enhances large language models by retrieving relevant information from your documents before generating responses. This allows the model to access external knowledge not included in its training data, providing more accurate and context-aware answers.
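The retrieve-then-generate flow described above can be sketched in a few lines. This is a toy illustration, not the app's actual pipeline: it uses bag-of-words vectors and cosine similarity in place of a real embedding model, and the function names (`embed`, `retrieve`, `build_prompt`) are hypothetical.

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy embedding: bag-of-words counts. A real pipeline would use a
    # neural embedding model, but the retrieval logic is the same shape.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, documents, top_k=2):
    # Rank documents by similarity to the query and keep the best matches.
    q = embed(query)
    return sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)[:top_k]

def build_prompt(query, documents):
    # The retrieved passages are prepended to the question as context,
    # so the model can ground its answer in them.
    context = "\n".join(retrieve(query, documents))
    return f"Use the context below to answer.\n\nContext:\n{context}\n\nQuestion: {query}"

docs = [
    "The warranty covers manufacturing defects for two years.",
    "Shipping within the EU takes three to five business days.",
    "Returns are accepted within 30 days of delivery.",
]
prompt = build_prompt("How long is the warranty?", docs)
```

The augmented `prompt` is what gets sent to the model; the key design point is that retrieval happens before generation, so the answer can draw on documents the model never saw during training.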
📄️ Batched Query
The Batched Query interface allows you to send multiple requests to the model in one go. You can define a batch of chats (multi-turn conversations) or a batch of completion texts.
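As a rough illustration of what batching buys you, the sketch below fans a list of prompts out concurrently and collects the results in input order. The `generate` function is a stand-in (it just echoes); in practice it would be a call to the running model.

```python
from concurrent.futures import ThreadPoolExecutor

def generate(prompt):
    # Placeholder for a real model call (e.g. an HTTP request to a local
    # inference server); it echoes so the example is self-contained.
    return f"Answer to: {prompt}"

def run_batch(prompts, max_workers=4):
    # pool.map preserves input order, so results line up with prompts.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(generate, prompts))

results = run_batch(["What is RAG?", "Define tokenization."])
```

Order preservation matters here: when you submit a batch of chats or completion texts, you need each response matched back to the request that produced it.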
📄️ Embeddings and Tokenize
This page introduces two features: Embeddings, which convert text into numeric vectors suitable for similarity search, and Tokenize, which shows how the model splits text into tokens.
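To make the two concepts concrete, here is a deliberately simplified sketch. The word-level tokenizer and histogram embedding below are toys (real engines use subword tokenizers such as BPE and learned embedding vectors), but they show the essential idea: text becomes integer token ids, and token ids become a fixed-size numeric vector.

```python
def tokenize(text, vocab):
    # Toy word-level tokenizer: each distinct word gets an integer id.
    # Real tokenizers use subword schemes (e.g. BPE), so counts will differ.
    return [vocab.setdefault(word, len(vocab)) for word in text.lower().split()]

def embed(token_ids, dim=8):
    # Toy embedding: a fixed-size histogram over token ids, just to show
    # that an embedding is a numeric vector derived from the tokens.
    vec = [0.0] * dim
    for tid in token_ids:
        vec[tid % dim] += 1.0
    return vec

vocab = {}
ids = tokenize("the cat sat on the mat", vocab)  # repeated words share an id
vector = embed(ids)                              # always length `dim`
```

Note that "the" appears twice and maps to the same id both times, and the embedding has the same length regardless of input size, which is what makes vectors comparable across texts.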
📄️ Visualize Logprobs
The Visualize Logprobs interface is an experimental feature, currently available only with the MLX inference engine. It provides a visual representation of the log probabilities (logprobs) associated with each token in the generated completion.
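The core idea behind such a visualization can be shown in a few lines: each logprob is the natural log of the token's probability, so exponentiating recovers a value in (0, 1] that can be bucketed (or colored) by confidence. The token/logprob pairs below are made up for illustration, and the thresholds are arbitrary.

```python
import math

def confidence(logprob):
    # exp(logprob) recovers the token's probability from its log probability.
    p = math.exp(logprob)
    if p > 0.9:
        return "high"
    if p > 0.5:
        return "medium"
    return "low"

# Hypothetical (token, logprob) pairs, as an engine might report them.
completion = [("The", -0.05), ("capital", -0.4), ("is", -0.02), ("Narnia", -3.2)]
annotated = [(token, confidence(lp)) for token, lp in completion]
```

A low-probability token like the last one stands out immediately, which is exactly the kind of signal a logprob visualization surfaces.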
📄️ Tool Calling
The Tool Calling interface is a work-in-progress feature designed to integrate formal function calls into your interactions. Currently, it supports basic functions like add, subtract, multiply, divide, and get_weather.
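A common pattern behind tool calling is a registry-plus-dispatcher: the model emits a structured call (typically JSON naming a function and its arguments), and the host looks the function up and executes it. The sketch below is a generic illustration of that pattern, not this app's implementation; the arithmetic tool names mirror those listed above, and the JSON shape is an assumption.

```python
import json

# Registry of callable tools, keyed by the name the model will emit.
TOOLS = {
    "add": lambda a, b: a + b,
    "subtract": lambda a, b: a - b,
    "multiply": lambda a, b: a * b,
    "divide": lambda a, b: a / b,
}

def dispatch(tool_call_json):
    # Parse a model-emitted call such as
    # {"name": "add", "arguments": {"a": 2, "b": 3}} and execute it.
    call = json.loads(tool_call_json)
    fn = TOOLS[call["name"]]
    return fn(**call["arguments"])

result = dispatch('{"name": "add", "arguments": {"a": 2, "b": 3}}')
```

The result would then be fed back into the conversation so the model can incorporate it into its reply.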