Introducing Refact Code LLM: 1.6B State-of-the-Art LLM for Code that Reaches 32% HumanEval

September 4, 2023

by Sergey Vakhreev, Dimitry Ageev, Oleg Klimov, Kirill Starkov

2 min read

Today we’re introducing Refact LLM: 1.6B code model with infill real-time code completion (including fill-in-the-middle(FIM) capability) and chat. Refact LLM achieves the state-of-the-art performance among the code LLMs, coming closer to HumanEval as Starcoder, being 10x smaller in size, and it beats other code models such as StableCode, CodeGen and ReplitCode on HumanEval metric.

Summary:

1.6b parameters
20 programming languages
4096 tokens context
code completion and chat capabilities
SoTA on HumanEval benchmark among similar code models
pre-trained on permissive licensed code and available for commercial use

Model	Model Size	HumanEval pass@1
DeciCoder-1b	1b	19.1%
Refact-1.6-fim	1.6b	32.0%
StableCode	3b	20.2%
ReplitCode v1	3b	21.9%
CodeGen2.5-multi	7b	28.4%
CodeLlama	7b	33.5%
StarCoder	15b	33.6%

The base model was trained on our own set of code with permissive licenses only and open text datasets (the text to code ratio was 50:50). In total, we trained our base model on 1.2T tokens of code on our cluster.

The model was then fine-tuned with open code instruction-following datasets filtered for quality and a synthetic dataset based on The Stack dedup v1.1 to improve FIM and boosting the base model performance.

You can read more about the architecture decisions that we made in the blog post.

We aim for the model to be accessible to everyone, we’re releasing the model for commercial use under BigScience OpenRAIL-M license and making the weight available on HuggingFace.

While the trend recently was for the model sizes to get bigger, we wanted to lower barriers to entry and make it a versatile tool for developers with varying hardware setups. With the smaller size, running the model is much faster and affordable than ever: the model can be served on most of all modern GPUs requiring just 3Gb RAM and works great for real-time code completion tasks.

Refact LLM can be easily integrated into existing developers workflows with an open-source docker container and VS Code and JetBrains plugins. With Refact’s intuitive user interface, developers can utilize the model easily for a variety of coding tasks. Finetune is available in the self-hosting (docker) and Enterprise versions, making suggestions more relevant for your private codebase.

Refact 1.6B LLM is the third model in the family of our code models, with CodeContrast 3b and CodeContrast 0.3b released previously. We aim to continue with our research and future updates to improve the LLM’s performance and capabilities. We would love to get community contributions and feedback to enhance the model further. For any questions and ideas, please visit our Discord.