Fine-Tuning Code Completion Models: The Easy Way

March 27, 2024
by Oleg Klimov


How to Start Fine-tune on Server

  1. Follow the Self-hosted or Enterprise guide to start a server. To run the container, you need one of the following: a machine with an NVIDIA GPU and an NVIDIA-enabled Docker setup; a Runpod cloud GPU; or an AWS account.
  2. Open the server UI in a browser. In the “Projects” tab, create a project and add your source code files. You can use a link to a git repo (including a private repo) or upload a .zip file.
  3. Go to the “Finetune” tab and hit “Launch”. A finetune should be ready in about 5 hours; that estimate is for a single RTX 3090 and a codebase of 1000 files.

That’s all you really need to know. The rest of this document describes how to make it even better.

What Source Files To Choose?

Train on files that you think are good - this way the model will learn to emulate good style and practices. You can also see this as knowledge transfer, from the expert engineers in your company to everyone else: the model will write code the way your internal experts would.

What is LoRA?

It’s not necessary to finetune all the weights of a model. There are Parameter-Efficient Fine-Tuning (PEFT) methods, and the one we use all the time is called LoRA (Low-Rank Adaptation). It has several advantages: it trains faster, it needs much less memory to train, it retains the speed of the original model, and it’s possible to switch LoRAs very quickly during inference.
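To make the idea concrete, here is a minimal sketch of a LoRA-style linear layer in NumPy. The sizes and the alpha scaling are illustrative assumptions, not the exact configuration Refact uses; the point is that only the two small factors A and B are trained, while the big base matrix W stays frozen.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r = 1024, 1024, 8   # illustrative sizes; r is the LoRA rank

W = rng.standard_normal((d_in, d_out)) * 0.02   # frozen base weight
A = rng.standard_normal((d_in, r)) * 0.02       # trainable low-rank factor
B = np.zeros((r, d_out))                        # zero-initialized, so training starts at the base model

def forward(x, alpha=16):
    # base projection plus the low-rank update, scaled by alpha / r
    return x @ W + (x @ A @ B) * (alpha / r)

full = W.size             # parameters if we unfroze the whole matrix
lora = A.size + B.size    # parameters LoRA actually trains
print(full, lora)         # 1048576 vs 16384 -- a 64x reduction at these sizes
```

The memory savings during training come from the optimizer only needing gradients and momentum for A and B, not for W.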

Compared to unfreezing all the weights of a model, it is also less prone to “catastrophic forgetting” - the phenomenon of forgetting previously learned skills when learning something new. It’s easy to see why from the math of how LoRA works: the finetune is a small additive update to the base weights, and by scaling that update down it’s always possible to get back to the original weights and skills of the base model. So nothing is ever lost during LoRA training: the process only needs to balance new information (larger LoRA weights) against retaining the base model’s capabilities (smaller LoRA weights).
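This “nothing is ever lost” property can be sketched in a few lines. Assuming the merged-weight view of LoRA (effective weight = W plus a scaled low-rank delta), shrinking the scale factor interpolates back to the base model exactly:

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((64, 64))        # frozen base weights
A = rng.standard_normal((64, 4)) * 0.1   # low-rank factors "learned" during finetuning
B = rng.standard_normal((4, 64)) * 0.1
x = rng.standard_normal((1, 64))

base = x @ W
for scale in (1.0, 0.5, 0.0):
    # scaling the additive delta trades new behavior against base behavior;
    # at scale 0 the output is exactly the base model again
    y = x @ (W + scale * (A @ B))
    print(scale, np.abs(y - base).max())
```

Full fine-tuning has no such knob: once all weights have moved, the original model is gone unless you kept a copy.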

How Much Information is Possible to Inject?

For a 3B model, the largest LoRA that is still possible to run with good hardware acceleration has about 150M parameters. That’s a lot - 150M was a popular size for entire models just a few years ago! But in practice, including feedback from our clients, we still see limits on how much information a LoRA can retain.
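A back-of-the-envelope calculation shows how a LoRA reaches that size. The architecture numbers below are hypothetical (they are not the exact specs of any particular 3B model), but the formula is general: each adapted matrix contributes two low-rank factors.

```python
# Illustrative, assumed architecture numbers for a ~3B model:
d_model   = 2560   # hidden size
n_layers  = 32
n_targets = 4      # e.g. the q, k, v and o projections in each block
rank      = 256    # LoRA rank

# each adapted square (d x d) matrix gets two factors: (d x r) and (r x d)
params_per_matrix = 2 * rank * d_model
total = n_layers * n_targets * params_per_matrix
print(f"{total / 1e6:.0f}M LoRA parameters")   # ~168M at these settings
```

So a LoRA in the 150M range already implies a fairly large rank; the capacity is real, but finite.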

From that experience, it appears that a LoRA likes to concentrate on a single language or topic - a useful line between a fine-tuning that will likely work and one that will not. Feeding it several dissimilar software projects written in different languages is probably a bad idea.

But fear not, in Refact you can train several finetunes and assign them to different teams!

Changing Hyperparameters and Comparing Runs

If you want to play around with learning rates, LoRA sizes, training steps, or weight decay, it’s convenient to do that within the Refact UI.

The first thing you need is a test set: just 1 to 3 source files that you think are representative of your codebase. They will be automatically excluded from the train set. By the way, you can upload individual files for this - it doesn’t have to be a .zip archive.

Calculated on the same files, the test loss is comparable between runs and models. Strictly speaking, for test losses to be comparable you also need the same tokenizer, and that differs between models - but in practice tokenizers behave similarly enough on English text.

Look for the lowest test loss among all the runs you try. The test loss is a good measure of how little the model is surprised by what it sees in your test files.
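“Surprise” here has a precise meaning: the test loss is the average negative log-likelihood per token, and its exponential is the perplexity. A small sketch with hypothetical per-token probabilities:

```python
import math

# Hypothetical probabilities the model assigned to the actual next tokens
# of a test file (higher = less surprised).
token_probs = [0.9, 0.6, 0.95, 0.3, 0.8]

# test loss = mean negative log-likelihood per token
loss = -sum(math.log(p) for p in token_probs) / len(token_probs)
perplexity = math.exp(loss)   # the "effective branching factor" the model sees
print(round(loss, 3), round(perplexity, 3))
```

A lower test loss after fine-tuning means the model assigns higher probability to your real code, which is exactly what better completions look like.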

Make sure training goes through several epochs before the lowest test loss is reached.
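In other words, pick the checkpoint at the minimum of the test-loss curve, not the last one. A sketch with a made-up loss curve:

```python
# Hypothetical test loss measured at the end of each epoch.
losses = [1.40, 1.10, 0.95, 0.92, 0.97, 1.05]

# the best checkpoint is the one with the lowest test loss;
# rising loss after it suggests the model has started to overfit
best_epoch = min(range(len(losses)), key=losses.__getitem__)
print(best_epoch + 1, losses[best_epoch])   # epoch 4 is the sweet spot here
```

If the minimum lands on the very first epoch, the run probably needs a lower learning rate or more data; if it lands on the very last, train longer.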

Try Fine-Tuning in Enterprise:
3 months free! Refact is an AI coding assistant that boosts developers’ productivity by 80%. It features context-aware AI completions, in-IDE chat, and in-line code commands for faster, high-quality code delivery. As a secure alternative to Copilot, Refact can be deployed on-premises, ensuring 100% safety of your data. Maximize your software development efficiency with our AI solution for companies!