This post details our journey to fine-tune smolLM 135M, a compact language model, for Python code completion.
We chose smolLM 135M for its small size, which allows for rapid iteration. Instead of full fine-tuning, we employed LoRA (Low-Rank Adaptation), a technique that freezes the base model and injects small trainable low-rank "adapter" matrices into the transformer layers. This gives a good balance between parameter efficiency and solid results on the downstream task (code completion).
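Transformer Lab configures all of this through its UI, but for readers who want to see what a LoRA setup looks like in code, here is a minimal sketch using Hugging Face's peft library. The rank, alpha, dropout, and target modules shown are illustrative placeholders rather than the values from our runs, and the model ID is our assumption of the Hub checkpoint.

```python
# Minimal LoRA setup sketch with Hugging Face peft -- illustrative only;
# Transformer Lab handles the equivalent configuration for you.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_model_id = "HuggingFaceTB/SmolLM-135M"  # assumed Hub ID for smolLM 135M
model = AutoModelForCausalLM.from_pretrained(base_model_id)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)

# LoRA freezes the base weights and adds small trainable low-rank matrices
# alongside the attention projections; r and lora_alpha below are placeholders.
lora_config = LoraConfig(
    r=16,                                  # adapter rank (placeholder)
    lora_alpha=32,                         # scaling factor (placeholder)
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # typical attention projections
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # prints how few parameters are trainable
```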
Transformer Lab handled the training, evaluation, and inference, abstracting away much of the underlying complexity. We used the flytech/python-codes-25k dataset, a collection of 25,000 Python code snippets, without any specific pre-processing. Our training setup used a constant learning rate, a batch size of 4, and an NVIDIA RTX 4060 GPU.
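For orientation, an equivalent setup outside Transformer Lab might look roughly like the sketch below, using the datasets and transformers libraries. Only the constant schedule and batch size of 4 come from our setup; the learning rate value, epoch count, and output directory are placeholders.

```python
# Rough sketch of an equivalent setup outside Transformer Lab -- not our
# exact configuration; several values below are placeholders.
from datasets import load_dataset
from transformers import TrainingArguments

# ~25,000 Python code snippets, used without any special pre-processing
dataset = load_dataset("flytech/python-codes-25k", split="train")

training_args = TrainingArguments(
    output_dir="smollm-135m-python-lora",  # placeholder path
    per_device_train_batch_size=4,         # batch size of 4, as in our runs
    learning_rate=2e-4,                    # placeholder value
    lr_scheduler_type="constant",          # constant learning rate schedule
    num_train_epochs=1,                    # placeholder; duration varied per run
    logging_steps=50,
    fp16=True,                             # fits comfortably on an RTX 4060
)
```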
The Iterative Fine-tuning Process: Nine Runs to Success
The core of this project was an iterative refinement of LoRA hyperparameters and training duration. We tracked the training loss quantitatively and ran qualitative assessments of the generated code (our "vibe check") to judge its syntactic correctness and logical coherence. This combination of quantitative and qualitative feedback proved crucial in guiding our parameter adjustments.
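Our "vibe check" was largely manual, but one part of it, verifying that a completion at least parses as valid Python, is easy to automate. The snippet below is an illustrative sketch of that idea using Python's ast module; it is not the exact harness we used, and judging logical coherence still required a human read-through.

```python
# Illustrative sketch of an automated "syntax sanity" pass over generated
# completions -- not our exact evaluation harness.
import ast

def is_valid_python(code: str) -> bool:
    """Return True if the generated snippet parses as Python source."""
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False

# Example completions: the first parses cleanly, the second is missing a colon.
completions = [
    "def add(a, b):\n    return a + b\n",
    "def add(a, b)\n    return a + b\n",
]

for snippet in completions:
    status = "OK" if is_valid_python(snippet) else "SYNTAX ERROR"
    print(f"{status}: {snippet.splitlines()[0]}")
```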