Refact.ai Agent + Claude 3.7 Sonnet tops Aider's polyglot benchmark with a 76.4% score [Updated: Now 92.9%]

March 17, 2025
by Katia Bystrakova
4 min read

UPD: Refact.ai Agent, powered by Claude 3.7 Sonnet, has now achieved top performance on the Aider Polyglot Benchmark:


  • 93.3% with Thinking
  • 92.9% without Thinking

Read more about the new score and Refact.ai Agent’s approach to autonomous programming in this blog post. The text below covers our original score, our view of benchmarks, and the improvements that led to these results.

Refact.ai Agent, powered by Claude 3.7 Sonnet, has achieved an impressive 76.4% score on Aider’s polyglot benchmark — without thinking capabilities enabled.

This puts Refact.ai Agent at the top of the LLM leaderboard, ahead of Aider’s own score of 60.4% with the same model, as well as the leaderboard results for DeepSeek Chat V3, GPT-4.5 Preview, ChatGPT-4o, and others.

This high score was made possible by our iterative approach to solving programming tasks. In your IDE, Refact.ai doesn’t just generate code: it checks that the code works, iterating until it reaches a successful result. This produces highly accurate, production-ready outcomes with minimal human intervention.

Refact.ai Agent + Claude 3.7 Sonnet achieved a 76.4% score on Aider’s polyglot benchmark [UPDATED: Now scoring 92.9%].

About Aider’s polyglot benchmark

Aider’s polyglot benchmark evaluates how well AI can handle 225 of the hardest coding exercises from Exercism across C++, Go, Java, JavaScript, Python, and Rust.

It focuses exclusively on the most challenging problems and measures:

This makes Polyglot one of the most representative benchmarks for testing autonomous AI in programming — not just for raw code generation but also for reasoning, precision, and execution.

Explore the full test set in the Aider polyglot benchmark repo on GitHub.
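
To put the percentages in this post in concrete terms, here is a small sketch that converts a score into an approximate number of solved exercises. It assumes the score is simply the share of the 225 exercises solved; the function names are ours, and the counts it returns are inferred from the published percentages, not reported figures.

```python
TOTAL_EXERCISES = 225  # benchmark size stated above


def pass_rate(solved: int, total: int = TOTAL_EXERCISES) -> float:
    """Share of exercises solved, as a percentage rounded to one decimal."""
    return round(100 * solved / total, 1)


def implied_solved(rate_percent: float, total: int = TOTAL_EXERCISES) -> int:
    """Nearest whole number of exercises consistent with a published score."""
    return round(rate_percent / 100 * total)


if __name__ == "__main__":
    # Scores quoted in this post; the exercise counts are inferred, not reported.
    for score in (60.4, 76.4, 92.9, 93.3):
        solved = implied_solved(score)
        print(f"{score}% is roughly {solved} of {TOTAL_EXERCISES} exercises")
```

Under that assumption, the move from 76.4% to 92.9% corresponds to roughly 37 more exercises solved out of 225.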

Our approach: How Refact.ai achieved the highest scores in the polyglot leaderboard

Refact.ai Agent takes a fully autonomous, iterative approach. It plans, executes, tests, self-corrects, and repeats steps as needed to fully complete tasks with high accuracy — without human input.

Other approaches may follow a more manual workflow, relying on pre-defined scripts and requiring ongoing user intervention to provide context, run tests, and guide the AI. The model itself doesn’t form strategies, search files, or decide when to test.

Refact.ai has a different, autonomy-first AI workflow:

So, Refact.ai interacts with the development environment, verifies its own work, and optimizes resources to solve the task end-to-end, delivering an efficient, practical programming flow with full autonomy. This enables true vibe coding: developers can delegate entire tasks while focusing on other work, then simply receive the final result.
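
The generate, test, and self-correct cycle described above can be illustrated with a minimal sketch. This is not Refact.ai's implementation: `solve_iteratively`, `toy_generate`, and `toy_run_tests` are hypothetical names, and the "model" here is a stub that returns a buggy answer first and a fixed one once it receives test feedback.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Attempt:
    code: str
    passed: bool
    feedback: str


def solve_iteratively(
    generate: Callable[[str, Optional[str]], str],
    run_tests: Callable[[str], Tuple[bool, str]],
    task: str,
    max_steps: int = 5,
) -> List[Attempt]:
    """Generate a candidate, test it, and retry with feedback until tests pass
    or the step budget runs out."""
    attempts: List[Attempt] = []
    feedback: Optional[str] = None
    for _ in range(max_steps):
        code = generate(task, feedback)      # write (or rewrite) a candidate
        passed, feedback = run_tests(code)   # verify it against the tests
        attempts.append(Attempt(code, passed, feedback))
        if passed:
            break                            # stop as soon as the work is verified
    return attempts


def toy_generate(task: str, feedback: Optional[str]) -> str:
    # Stand-in for a model call: the first try is wrong, the retry is correct.
    if feedback is None:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


def toy_run_tests(code: str) -> Tuple[bool, str]:
    # Stand-in for a test runner: load the candidate and check one case.
    namespace: dict = {}
    exec(code, namespace)
    result = namespace["add"](2, 3)
    if result == 5:
        return True, "ok"
    return False, f"add(2, 3) returned {result}, expected 5"


history = solve_iteratively(toy_generate, toy_run_tests, "implement add")
```

In this toy run the first attempt fails, the failure message is passed back to the generator, and the second attempt passes, so the loop stops after two steps. A real agent would replace the stubs with model calls and the project's own test command.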

Our approach may slightly differ from what Aider used for solving this benchmark, as our AI agent strategy and vision focus on:

____

Update (April 1, 2025): Building on the approach mentioned above, we implemented several improvements to Refact.ai Agent that also increased its score on Aider’s Polyglot Benchmark:

These enhancements made Refact.ai Agent more reliable for all users — solving tasks more effectively while maintaining optimized token usage. As a result, the benchmark score with Claude 3.7 Sonnet (No-Thinking) increased from 76.4% to 92.9%.

Read the detailed breakdown of our approach in this blog post: Refact.ai’s AI Agent achieves the highest score on Aider’s polyglot benchmark: 93.3% with Thinking, 92.9% without

Why we chose Polyglot over SWE Bench

TL;DR: We at Refact.ai see Polyglot as a far better measure of AI agents’ problem-solving abilities and their usefulness across a diverse pool of tasks than SWE Bench.

SWE Bench is popular and often seen as a key benchmark for AI coding agents. However, it has significant limitations:

In contrast, Polyglot is far more representative and realistic—it measures how well AI can autonomously interact with diverse, multi-language projects, making it much closer to the environments developers work in every day.

So, we’d like to thank Aider for introducing this comprehensive benchmark! It provides great insights into AI coding tools and helps drive better solutions.

Key features of Refact.ai’s autonomous AI Agent

Refact.ai’s advanced AI Agent thinks and acts like a developer, handling software engineering tasks end-to-end.

Why this matters for developers

This isn’t just about ranking highly on a benchmark — it’s about real-world coding impact. Refact.ai’s AI agent helps developers and software companies:


Get Refact.ai for your IDE

Vibe coding is the future of software development: get 10x productivity with Refact.ai Agent by your side in your IDE. Work smarter, not harder, whether you’re debugging, testing, or deploying.

Refact.ai autonomously handles your programming tasks end-to-end like a senior dev: it makes multi-file edits, writes accurate code that matches your workflow, and integrates seamlessly with your tools.

Our autonomous AI Agent is available to everyone — get started with Refact.ai today: download it for VS Code or JetBrains.

Want to get it for your team? Please fill out the form to book a demo call.