Cursor IDE: Part of the hype is real

I've been interested in LLMs since OpenAI's GPT 3 came out. Regarding IDEs, I've mostly focused on VS Code, both using GitHub Copilot and the Continue extension for other models (you can read about it in detail here).

A few weeks ago, everything changed once I decided to go "all-in" into using Cursor. While Microsoft very recently added agentic capabilities to the VS Code + GH Copilot combo, I've yet to try it, being honest because the more I use and learn Cursor, the more I'm amazed at the possibilities already at our grasp (to mere mortals, not only to big techs).

There is hype (a lot), and there are smoke and mirrors, mistakes, hallucinations, losses of context, and a general feeling of everything being a big "beta version". But there is also magic... Things that two years ago were almost science fiction, or required very curated and guided steps, are already working, and they can only improve. This imperfect experience feels like such an explosion of previously complex, boring or really difficult to automate tasks, that I feel it's already worth dealing with the imperfections.

While it is not the right moment yet to showcase something more specific, I can mention that the three pillars that have been hugely impactful for me to get better results with the IDE and the agentic behaviour have been:

  • @Docs: You can teach Cursor new languages, APIs, frameworks, libraries. You only need an unauthenticated-accessible URL.
  • Project and user rules: Rules for me are the IDE's superpower. Kind of mini-prompts to zero-shot "teach", add guidelines, correct mistakes... As a simple example, if you make a rule about writing tests before adding or modifying any source code, the agent will begin doing TDD (Test-Driven Development)!
  • MCPs (Model Context Protocols): "APIs for LLMs". Configure an endpoint and maybe some authentication, and suddenly, your IDE can interact by itself with many external or local services 🤯

I don't doubt that GitHub Copilot will catch up, and other coding assistants will also improve and implement these patterns, so in a way, most or all of what I am learning will be transferrable in some form.

Now, a small word of warning. We must not forget that we are not (yet?) dealing with an intelligent being. I see current LLMs mostly as incredibly complex autocomplete systems, but coding assistants introduce a significant subtlety: As they don't really know what is right or wrong, and tests passing is probably a strong signal when they were trained, they can (and will) cheat their way out, by taking "ugly" shortcuts in order for tests to pass.

Multiple times, when stuck trying to make tests pass, I've seen agents cheating by doing things like:

  • change the behaviour of the code only in tests
  • hardcode stuff and/or modify logic to pass tests (even if it breaks design guidelines!)
  • switch to another similar library
  • forget to run all tests, running instead only a subset of them, but report "The code is now ready for use and all test cases pass with the new xxxx logic."

And worst of all, it will tell you "All is fine, tests are passing and I don't see anything wrong with the code."

This is why we need not only good monitoring, grounding, evaluation and numerous guardrails, but in general to push hard for good alignment and observability (the former still on track, the later still an unmet goal).

Tags: Development ML & AI Productivity Resources Tools

Cursor IDE: Part of the hype is real article, written by Kartones. Published on