Transformer Lab

NEW Easily run a single training job across n nodes

Getting multiple GPUs to coordinate gradient sharing and parameter synchronization across nodes is a hassle. Automatically handle all the Kubernetes networking, container orchestration, inter-node communication, checkpointing, and failover.

NEW Easily run a single training job across n nodes​

NEW Easily run a single training job across n nodes