Your choice of agentic IDE or tool matters

As of mid-2025, I mostly trust any coding model and ask it instead of doing web searches for simple programming-related questions. I also like to switch between IDEs/tools, to get a feel for how they all evolve.

Surprisingly, depending on the tool you choose, the same simple question can yield completely different outcomes. These differences are not minor, and are likely related to how each tool shapes prompts, manages legal risk, and interacts with the underlying model.

The problem

I recently wanted to make a tiny change in one of my personal projects, but my Django knowledge is rusty after years of not working with the framework. I decided to do it with GitHub Copilot, so I opened VS Code and the project folder, and asked it, in Ask mode, something along the lines of: "I have a Django 2.x project, and I want to add a link in the footer of the base template if and only if the user is authenticated and an administrator. How can I do that check?"

To my surprise, after beginning to reply, it cut off the response with the dreaded message "Sorry, the response matched public code so it was blocked. Please rephrase your prompt". While I do have the "Suggestions matching public code" setting set to Blocked:

  1. I asked a question about my code, so I'd expect the suggestion to show how to do it within my existing logic
  2. the suggestion would be a one-liner, with such simple and generic code that it feels absurd to even try matching it against anything (see the sketch below). Of course there will be public code with .is_authenticated and .is_staff, I'm using a FOSS framework 🤔
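
For reference, the whole change boils down to something like this minimal sketch of a Django template condition (the base.html file name, the admin URL and the link text are placeholders, not necessarily what my project uses):

```django
{# base.html footer: show the link only to authenticated administrators #}
{% if user.is_authenticated and user.is_staff %}
  <a href="{% url 'admin:index' %}">Admin</a>
{% endif %}
```

In Django 2.x, user.is_authenticated is a property and user.is_staff marks users allowed into the admin site, so the whole "check" fits in a single template tag.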

This behaviour feels not only restrictive to the point of being almost an error, but also new. I'm a long-time user of GitHub Copilot, and while I've seen that message before, it happened very rarely, and always when I asked it to provide relatively big chunks of code; after all, most if not all of my personal projects include some external dependency, package and/or framework.

The only other explanation I can come up with is that Copilot matched the suggestion with my own public repository 😅. The documentation mentions a surrounding block of about 150 characters (source), but I cannot debug the match in VS Code (as explained here); the logs only show a generic message, 0 returned. finish reason: [content_filter]. And as I cannot disable the public code block, it'll remain a mystery for the time being.

The alternative

Out of curiosity, I decided to test the same scenario with Cursor. It instantly came back with a response, mostly the same as the final commit and based on my code, and asked me whether I wanted to apply the change to my code.

How differently can two tools behave when doing the same thing!

Playing the devil's advocate, one could argue that Microsoft is a big corporation that wants to avoid any potential issue, while Cursor is a startup at the crest of the AI hype wave, meaning they could be willing to take more risks if that makes their product look more effective. Still, I think that false positives can be as bad as having no safeguards at all.

Another alternative

Let's add a third participant to the experiment, Gemini CLI. Google is also a big corporation, and the Gemini models have a reputation for being more cautious than OpenAI's or Anthropic's.

Same example: boot up the CLI tool from inside the repository's folder and ask it to perform the change, providing the location of the file.

The result: 40 seconds later, I got exactly the same suggestion, a 100% match with my final commit.

Other criteria

I do not wish to make an exhaustive comparison of tools, nor a list of criteria to evaluate them. "Experience is the best teacher", they say, plus I'm not an expert in the subject. My intention is only to show, with a simple example, that the same LLM can behave very differently depending on the tool it runs on. Go try multiple tools yourself, see how well or badly each performs on similar tasks, and draw your own conclusions.

A tiny speed comparison

As Gemini took a significant amount of time, I decided to check the logs, re-run the experiment where needed, and naively benchmark the three tools.

  • GitHub Copilot looked very fast: before masking the response, it took around 5 seconds to reply with part of the solution, so we can guess it would have finished at or below the 10-second mark. No custom instructions.
  • Cursor is also fast: it performed two local searches to understand my code structure, and then produced the changeset and "results summary" in ~12 seconds. I have some custom user rules telling it to plan and think before executing, so I expect it to be significantly faster on a default setup.
  • Gemini felt slower: it is a single data point, but it still took more than three times as long to perform such a simple change. No custom instructions/context.

Summary

The tool you pick to work with matters significantly. These tools are not mere prompt wrappers with a few tool-calling capabilities; they actively tweak your input, gather related information, and guide the underlying model (or models) via custom prompts and instructions.

Tags: Automation Development ML & AI Productivity Tools
