Tutorials

Devstral: An Open-Source Agentic LLM for Software Engineering

Well, it looks like 2025 may really be the year of the agents. While there has been plenty of speculation about agent development, and definitions and implementations vary considerably, we're seeing meaningful progress in how these agents are materializing – particularly software engineering agents. In this article, we'll take a look at Devstral, an open-source agentic LLM developed through a collaboration between Mistral AI and All Hands AI.

Devstral is designed for agentic coding – that is, for solving multi-step tasks within large codebases. It’s capable of being run on a single Nvidia RTX 4090 GPU due to its lightweight design, with just 24 billion parameters, facilitating local deployment, on-device use, and privacy-sensitive applications. As an open-source model released under the Apache 2.0 license, Devstral is freely available for commercial use, modification, and integration into proprietary products. Furthermore, the model has a 128k context window, enabling it to process substantial amounts of code and instructions at a time, which is particularly beneficial for large codebases and complex problems. Finally, Devstral utilizes an advanced Tekken tokenizer with a 131k vocabulary size, enhancing its precision and efficiency in handling code and text inputs for accurate, context-aware responses tailored to software engineering.

Before we describe Devstral’s performance, let’s lay the groundwork by discussing SWE-Bench, the current standard for evaluating LLMs on practical coding challenges.

SWE-Bench is an evaluation framework designed to assess the ability of LLMs to perform software engineering tasks. The benchmark consists of 2,294 problems drawn from real GitHub issues and their corresponding pull requests across 12 popular Python repositories.
To enhance the reliability of evaluations, OpenAI introduced SWE-Bench Verified, a curated subset of 500 tasks from the original benchmark. These tasks were reviewed by professional software developers and further categorized by difficulty, with 196 tasks deemed “easy” (requiring less than 15 minutes to fix) and 45 labeled “hard” (taking over an hour). A task is considered successfully resolved when the model’s code modifications pass the associated unit tests and performance is quantified by the percentage of tasks a model successfully resolves. For example, Claude 3.5 Sonnet achieved a 49% success rate on SWE-Bench Verified.
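If you'd like to inspect the benchmark yourself, the Verified split is published as a Hugging Face dataset. The snippet below is a quick inspection sketch – it assumes Python 3 and pip are available, and that the dataset id is the commonly used princeton-nlp/SWE-bench_Verified upload:

# install the Hugging Face datasets library
pip3 install datasets

# load the 500 Verified tasks and print the start of the first problem statement
python3 -c "from datasets import load_dataset; ds = load_dataset('princeton-nlp/SWE-bench_Verified', split='test'); print(len(ds)); print(ds[0]['problem_statement'][:300])"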

While there's been some chatter questioning SWE-Bench's efficacy for evaluating LLMs, it is currently the standard for measuring their software engineering capabilities.

Below we can see that Devstral (at the time of writing) is the top-performing open-source model on SWE-Bench Verified.

The diagram from the release post indicates that Devstral achieves better SWE-Bench Verified performance with a lower parameter count than other open-source models such as Gemma 3 27B, Qwen3 235B-A22B, DeepSeek-V3-0324, DeepSeek-R1, and DeepSeek-V3. Devstral's small size makes it favorable for inference-intensive agentic use cases.

While there are a multitude of ways to run Devstral (Hugging Face, Ollama, Kaggle, Unsloth, LM Studio), we will be running it with the OpenHands scaffold, All Hands AI's open-source agent framework that gives the model a sandboxed development environment to act in. The model itself will be served with vLLM.
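As an aside, if you just want to chat with the model locally rather than set up the full agent stack, the Ollama route is typically a single command – assuming Ollama is installed and that the model is available in its library under the devstral tag:

# pull Devstral and start an interactive session (model tag assumed from the Ollama library)
ollama run devstral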

Step 1: Set up a GPU Droplet

Begin by setting up a DigitalOcean GPU Droplet. Select AI/ML and choose the NVIDIA H100 option.
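If you prefer the command line to the control panel, the same Droplet can be created with doctl. The region, size, and image slugs below are assumptions for illustration – check doctl compute size list and the GPU Droplet documentation for the current values:

# create an H100 GPU Droplet (slugs are assumed; verify them before running)
doctl compute droplet create devstral-gpu \
    --region nyc2 \
    --size gpu-h100x1-80gb \
    --image gpu-h100x1-base \
    --ssh-keys <your_ssh_key_id>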

Step 2: Web Console

Once your GPU Droplet finishes loading, you'll be able to open up the Web Console.

Step 3: Install Dependencies

In the web console, copy and paste the following commands to install Python, pip, and vLLM:

apt install python3-pip python3.10

pip3 install vllm --upgrade
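To confirm the installation before moving on, you can print the installed vLLM version – a quick sanity check and nothing more:

# verify that vLLM imports cleanly and show its version
python3 -c "import vllm; print(vllm.__version__)"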

Step 4: Spin up a vLLM Server

vllm serve mistralai/Devstral-Small-2505 --tokenizer_mode mistral --config_format mistral --load_format mistral --tool-call-parser mistral --enable-auto-tool-choice --tensor-parallel-size 2

Step 5: Launch OpenHands

Pull the OpenHands runtime image and start the OpenHands app container:

docker pull docker.all-hands.dev/all-hands-ai/runtime:0.38-nikolaik

docker run -it --rm --pull=always \
    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.38-nikolaik \
    -e LOG_ALL_EVENTS=true \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v ~/.openhands-state:/.openhands-state \
    -p 3000:3000 \
    --add-host host.docker.internal:host-gateway \
    --name openhands-app \
    docker.all-hands.dev/all-hands-ai/openhands:0.38
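Once the vLLM server reports that it is ready, you can optionally sanity-check it from the Droplet. vLLM exposes an OpenAI-compatible API on port 8000 by default, so a quick curl should list the model and return a short completion (the prompt below is only an illustration):

# list the models the server is hosting
curl http://localhost:8000/v1/models

# request a short completion from Devstral
curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "mistralai/Devstral-Small-2505", "messages": [{"role": "user", "content": "Write a Python function that reverses a string."}]}'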

You will see a link in the web console; copy it for use in a later step.

Step 6: Open VS Code

In VS Code, click on “Connect to…” in the Start menu.

Choose “Connect to Host…”.

Step 7: Connect to your GPU Droplet

Click “Add New SSH Host…” and enter the SSH command to connect to your Droplet. This command is usually in the format ssh root@[your_droplet_ip_address]. Press Enter to confirm, and a new VS Code window will open, connected to your Droplet. You can find your Droplet's IP address on the GPU Droplet page.
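Behind the scenes, VS Code saves this host as an entry in your ~/.ssh/config file; a minimal entry looks something like the following (the host alias is arbitrary and only a placeholder):

# ~/.ssh/config entry for the Droplet (alias is illustrative)
Host devstral-droplet
    HostName [your_droplet_ip_address]
    User root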

Step 8: Access OpenHands

In the new VS Code window connected to your Droplet, open the Command Palette (Ctrl/Cmd+Shift+P), type sim, and select “Simple Browser: Show”.

Paste in the OpenHands URL from the Web Console.

Once OpenHands is launched, there will be a multitude of models to select from; we chose devstral-small-2505. Note that you'll need an API key.

From here, you’ll be able to connect to a repository or launch from scratch.

Conclusion

While LLMs excel at code completion and writing isolated functions, real-world software engineering demands an understanding of code within broader systems, the ability to discern relationships between components, and the precision to identify subtle errors within intricate functions – capabilities that Devstral is designed to address. We hope you get the chance to try Devstral for yourself.

How does Devstral perform for your software engineering needs? Comment below!
