Cursor vs Windsurf vs Claude Code: Agentic IDEs Compared (2026)
If you are comparing agentic IDEs in 2026, three tools dominate the shortlist: Cursor, Windsurf, and Claude Code. By mid-2026 the market has settled into a clean split. Cursor is the IDE-first agent: a VS Code fork where the Composer panel plans multi-file changes and Background Agents run sandboxed tasks. Windsurf is the agentic IDE built around Cascade, an orchestrator that turns one prompt into a stream of reviewed edits, with periodic Wave updates pushing new agent capabilities. Claude Code is the terminal-native CLI from Anthropic, designed to be scripted, hooked, and embedded in CI; model selection between Sonnet and Opus, MCP tool servers, sub-agents, and headless mode are first-class. None of them is universally “best”. Cursor wins on editor ergonomics, Windsurf wins on guided multi-file flows, and Claude Code wins on automation and governance. This piece is a grounded engineering comparison: architecture, named features, honest trade-offs, and a decision matrix you can actually defend in a tooling review.
[Figure: Architecture at a glance]

How the three tools think about agentic coding
Agentic coding is the shift from autocomplete to autonomy. Instead of suggesting the next token, the tool plans, calls tools, edits multiple files, runs commands, observes results, and iterates. Each of the three products picks a different centre of gravity for that loop, and the choice ripples through every other decision they make.
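The loop described above (plan, call a tool, observe, iterate) can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not any vendor's implementation: `plan_next_step`, the two tools, and the test name are all hypothetical stand-ins, and a real agent would replace the planner with a model call.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Step:
    tool: str       # name of the tool to call
    argument: str   # single string argument, kept simple for the sketch


@dataclass
class AgentState:
    goal: str
    observations: list[str] = field(default_factory=list)


def plan_next_step(state: AgentState) -> Optional[Step]:
    """Hypothetical planner: a real agent would ask a model here."""
    if not state.observations:
        return Step(tool="run_tests", argument="all")
    last = state.observations[-1]
    if "FAIL" in last:
        return Step(tool="edit_file", argument="fix failing assertion")
    if last.startswith("edited"):
        return Step(tool="run_tests", argument="all")
    return None  # last observation was a passing test run: stop


def make_tools() -> dict[str, Callable[[str], str]]:
    """Toy tools: the tests fail until an edit has been made."""
    fixed = {"done": False}

    def run_tests(_: str) -> str:
        return "PASS" if fixed["done"] else "FAIL: test_total"

    def edit_file(change: str) -> str:
        fixed["done"] = True
        return f"edited: {change}"

    return {"run_tests": run_tests, "edit_file": edit_file}


def run_agent(goal: str, max_steps: int = 10) -> AgentState:
    """Plan, act, observe, repeat until the planner stops or the budget ends."""
    state = AgentState(goal=goal)
    tools = make_tools()
    for _ in range(max_steps):
        step = plan_next_step(state)
        if step is None:
            break
        state.observations.append(tools[step.tool](step.argument))
    return state


final = run_agent("make the test suite pass")
print(final.observations)
```

The `max_steps` budget is the part worth keeping in any real version: it is the simplest guard against an agent that never converges.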
Cursor places the agent inside a familiar editor surface. The mental model is “your IDE, but with a planner sitting next to you.” The Composer pane proposes a diff plan; the Agent mode lets that plan execute across files; Background Agents push longer tasks into a remote sandbox while you keep coding. The optimisation goal is flow inside the editor: the developer is always in the loop, always one keystroke from rejecting a change, and the agent’s authority is bounded by what the editor surfaces. This makes Cursor very approachable for teams migrating from Copilot-style completion to something that can actually refactor a feature.
Windsurf treats the IDE itself as the agent. Cascade — Windsurf’s flagship orchestrator — is not a side panel; it is the primary interaction. You describe an outcome, Cascade plans, Cascade edits, and the editor shows you a Flow timeline of what is happening. The Wave update cadence (Windsurf’s term for periodic capability releases) means Cascade picks up new behaviours frequently. The mental model is closer to “pair programming with an aggressive partner who already has its hands on the keyboard.” Reviewing diffs in the preview pane becomes the developer’s main job. Windsurf optimises for guided multi-file change velocity, especially on greenfield work and feature spikes.
Claude Code abandons the IDE GUI entirely. It is a CLI: claude in your terminal. Context lives in CLAUDE.md, slash commands and hooks live in .claude/, MCP servers expose tools, and sub-agents spawn for scoped work. Headless mode (claude -p) makes it scriptable in CI, pre-commit hooks, and cron. You pick the model — Sonnet for speed and high-frequency tasks, Opus for long-horizon planning. The mental model is “a coding agent you can pipe into anything.” It optimises for automation, governance, and reproducibility — which is why it ends up running large parts of release pipelines, not just authoring code.
The three tools have begun to converge on a shared vocabulary — plans, tool calls, hooks, memory, sub-agents — but their defaults are radically different. Defaults dictate behaviour at scale far more than features do. Picking between them is mostly a question of which defaults match your team.

Architecture comparison
Before diving into each tool individually, it helps to lay them out side-by-side. The table below is the version most engineering managers actually want — short rows, decision-relevant columns, no marketing. Versions are as of mid-2026; treat capability claims as vendor-described unless your team has independently validated them.
| Dimension | Cursor | Windsurf | Claude Code |
|---|---|---|---|
| Agent loop | Composer plan → tool call → editor diff → user review → iterate | Cascade orchestrator → multi-file batch → preview pane → user approval → Wave update | Planner → MCP/tool call → hook checks → optional sub-agent → output |
| Context system | Repo index + open buffers + /rules + Cursor memory | Workspace index + Cascade memory + Flow timeline | CLAUDE.md + project files + slash commands + sub-agent scoping |
| Tool use | Built-in editor tools, shell, web search, custom commands | Cascade-managed tool calls inside Flows | MCP servers (fs, git, http, db, custom) + bash + edit/read |
| Sandbox / exec | Local exec + Background Agents in remote sandbox | Local exec with preview + approval | Local exec under hook policy; CI runs in CI sandbox |
| Pricing posture | Per-seat plus usage on heavier models; Background Agent minutes metered | Per-seat with Cascade credits; Wave tier for power users | Per-token via Anthropic billing; flat-rate Max plans on consumer tier |
| Auth model | Cursor account + optional SSO; team workspaces | Windsurf account + SSO on enterprise tier | Anthropic API key or Claude account; SSO via gateway |
| On-prem / local LLM | Limited; mostly hosted models, some BYO-key support | Hosted; enterprise tier has private deployments | Hosted Anthropic models; can be fronted by a gateway for on-prem audit |
| Editor surface | VS Code fork, full GUI | VS Code-derived IDE, full GUI with Flow timeline | None — pure CLI; pairs with any editor |
| Best for | Individual developers and small teams wanting AI-first editing | Teams doing heavy multi-file feature work | Platform teams, CI/CD, governed environments, polyglot orgs |
Two things stand out. First, the editor surfaces are converging: both Cursor and Windsurf are VS Code lineage, both have agent panes, both have multi-file diff previews. Second, the execution and governance posture diverges sharply. Claude Code’s settings.json + hook system is the closest thing in the market to a configurable policy engine for what an agent is allowed to do. Cursor and Windsurf push more of that policy into the product UI, which is friendlier for individuals but harder to audit for regulated teams.
The other lens that matters is the relationship between the agent and your version control. Cursor’s Background Agents are explicitly designed around PRs — BugBot reviews them, the agent can open them. Windsurf treats the local working tree as the source of truth and expects you to commit. Claude Code does whatever you tell it to via hooks; teams routinely wire it to open PRs through the GitHub CLI without any in-product Git surface.
Cursor (2026): IDE-first agent, Composer, and Background Agents
Cursor entered 2026 with the largest installed base of the three, and it shows in the product. The editor is a VS Code fork, so your extensions mostly work, your settings mostly transfer, and the muscle memory is unchanged. What you get on top is a stack of agentic capabilities layered around the editor.
Composer is the headline feature for multi-file work. Open it with the standard chord, describe the change, and Cursor plans a sequence of edits across the codebase. The output is a diff bundle: file-by-file changes you accept, reject, or refine. Composer is the “first-class multi-file refactor surface” — when you want to rename a concept across thirty files, split a service, or update a schema, this is the right entry point. It indexes the repo on first open and keeps that index warm; on large monorepos the first-index cost is real, but subsequent invocations are fast.
Agent mode is Composer with permission to act. Where Composer proposes a diff, Agent mode can run shell commands, write files, and iterate against test runs in your local sandbox. This is where the agentic loop becomes visible: it edits, it runs your tests, it reads the failure, it edits again. Teams find this is the right mode for “write a feature end-to-end with passing tests” tasks where they trust the bounded local context.
Background Agents push the loop into a remote sandbox. You hand off a task — “investigate why the nightly build flakes” or “draft a migration from REST to gRPC for the orders service” — and the Background Agent runs detached. It cannot interfere with your local editor, it has its own working copy, and when it is done you can review the result, push it to a branch, or discard it. Pricing for Background Agent minutes is metered separately from the editor subscription, which is a meaningful operational detail.
BugBot is the review-side counterpart. When a PR opens, BugBot can run review against the diff and post automated comments — flagging null-deref risks, subtle concurrency bugs, missing test coverage, and style violations. Treat it as a reviewer that never gets tired and never reviews the design — it is excellent at line-level issues and indifferent to architecture. Many teams keep BugBot as a non-blocking advisor rather than a merge gate.
/rules is the durable memory layer. Project-level rules (committed to the repo) and personal rules (local) tell the agent what conventions to follow, which directories to ignore, which patterns to prefer. The community pattern is to ship a /rules set as part of repo bootstrap, alongside a README — this is the closest equivalent to Claude Code’s CLAUDE.md.
Cursor’s weak spots in 2026 are honest to call out. Model choice is broad but quietly steered by the product — heavier prompts often route to premium models with usage cost. Data residency and on-prem options are improving on the enterprise tier but lag Claude Code’s flexibility for air-gapped environments. And because the agent lives so completely inside the editor, headless and CI use cases are awkward — Cursor is not what you embed in a release pipeline.

Windsurf (2026): Cascade, Wave updates, and the IDE-as-agent pattern
Windsurf made the opposite bet from Cursor. Where Cursor wraps an agent around an editor, Windsurf rebuilt the editor around an agent. Cascade — Windsurf’s orchestrator — is not a panel you open, it is the way you use the product.
A typical Cascade session starts with a goal stated in plain English in the Cascade pane. Cascade gathers context from your workspace (open buffers, repo index, recent edits), produces a plan, and starts executing. Each step might be an edit, a shell command, a search, or a question back to you. The Flow timeline shows the sequence as it happens. The preview pane shows pending file changes before they are applied. The result feels less like prompting and more like delegating.
The interaction style has implications. Cascade is at its best when the goal is reasonably contained — implement this feature, refactor this layer, add this integration — and the codebase has enough structure that the orchestrator can navigate it. On unfamiliar large codebases it can spend significant time exploring. The Cascade memory (workspace-scoped notes about the project) helps with this, and seasoned Windsurf users invest early in priming Cascade’s memory with project conventions, architectural notes, and a list of files the agent should treat as canonical.
Wave updates are Windsurf’s named release cadence — periodic drops where new Cascade capabilities ship together. Examples over 2025–2026 included improvements to multi-file planning, longer-context Cascade sessions, better preview pane diff visualisation, and tighter integration with the Flow timeline. Wave gives the product a steady rhythm of capability uplift that users can plan around. The trade-off is that Cascade’s behaviour shifts between Waves, and teams that lock workflows tightly to current Cascade behaviour can find themselves recalibrating after a release.
Flow is the unit of work. A Flow is a Cascade-initiated task with a timeline, a set of file edits, a set of tool calls, and an outcome. Flows are revisitable — you can scroll back through a Flow’s history, see what Cascade did, and learn from it. This is genuinely useful for retrospectives, post-incident reviews of agent behaviour, and onboarding new engineers who want to see “how the team uses the AI.” It is also a meaningful security artefact: when something goes wrong, Flow history is your audit log.
Windsurf in 2026 has matured into a credible enterprise option. SSO, role-based access, and the ability to deploy in customer-controlled environments are now table stakes on the enterprise tier. The pricing is per-seat with Cascade credits — heavier multi-file Flows consume more credits. As with Cursor, the per-seat plus metered usage model is now standard across the category.
The honest trade-offs for Windsurf: because the editor is built around Cascade, opting out is awkward — you cannot really use Windsurf as “just an editor with autocomplete.” Model selection is more constrained than Cursor’s. And the dependence on Cascade memory means that when memory is mis-primed the agent gets confidently wrong in ways that take real effort to debug. None of these are deal-breakers; they are the cost of a more aggressive integration.

Claude Code (2026): terminal-native CLI, hooks, MCP, sub-agents
Claude Code is the outlier in the comparison because it does not ship an editor. It is a CLI tool — claude — that you run in a terminal next to whatever editor you already use. That framing matters: every design choice in Claude Code follows from “this should be scriptable, governable, and embeddable” rather than “this should feel like a great editor.”
The interactive loop is straightforward. You launch claude in a project directory, it reads CLAUDE.md (your project’s living instructions to the agent), and it accepts natural-language requests, slash commands, or tool calls. The agent plans, calls tools, edits files, runs commands, and reports back. Model selection is explicit — flags or settings choose Sonnet (the default, fast and cost-effective) or Opus (deeper, longer-horizon). Teams routinely mix: Opus for the planning step, Sonnet for the execution loop.
Slash commands are reusable workflows. /init bootstraps a project’s CLAUDE.md from the codebase. /review does a structured PR review. A team-defined command such as /run can invoke a saved playbook. Custom slash commands are Markdown prompt files in .claude/commands/, and a command’s prompt can direct the agent to run an existing script, which means any workflow your team already has as a Makefile target or shell script can become a first-class agent capability.
Hooks are the governance layer. Pre-tool and post-tool hooks let you intercept any action the agent wants to take — block dangerous commands, require confirmation for destructive operations, log every tool call to a SIEM, transform inputs and outputs, enforce policy. This is the feature that makes Claude Code viable in regulated environments. Teams have built hook libraries that prevent the agent from touching production/, force every PR to be opened through a specific workflow, and rewrite outbound HTTP calls to go through an audit proxy. The hook system is documented and stable; engineers who care about deterministic agent behaviour spend a lot of time in it, and our deep dive on LLM tool-calling determinism patterns for 2026 goes much further on why this matters at scale.
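To make the "prevent the agent from touching production/" example concrete, here is a minimal sketch of a pre-tool hook written in Python. It assumes the hook receives the proposed tool call as JSON on stdin and that a specific non-zero exit code blocks the call; the field names (`tool_name`, `tool_input`, `file_path`, `command`) and the exit code value are assumptions to verify against the current hooks documentation, and the denied directory list is hypothetical.

```python
import io
import json
from pathlib import PurePosixPath
from typing import TextIO

# Assumption: directory names the agent must never touch.
DENIED_DIRS = ("production",)

BLOCK = 2   # assumed "block this tool call" exit code
ALLOW = 0


def touches_denied_dir(path: str) -> bool:
    """True if any component of the path is a denied directory name."""
    return any(part in DENIED_DIRS for part in PurePosixPath(path).parts)


def evaluate(event: dict) -> tuple[int, str]:
    """Decide whether a proposed tool call may proceed."""
    tool_input = event.get("tool_input", {})
    file_path = tool_input.get("file_path", "")
    command = tool_input.get("command", "")
    if file_path and touches_denied_dir(file_path):
        return BLOCK, f"blocked: {file_path} is under a denied directory"
    # Crude substring check for shell commands that mention a denied directory.
    if any(f"{d}/" in command for d in DENIED_DIRS):
        return BLOCK, "blocked: command references a denied directory"
    return ALLOW, ""


def main(stdin: TextIO) -> int:
    """Read one hook event as JSON and return the exit code to use."""
    code, reason = evaluate(json.load(stdin))
    if reason:
        print(reason)  # a real hook would write this to stderr
    return code


# Demo with an in-memory event instead of real stdin.
demo = io.StringIO(json.dumps(
    {"tool_name": "Edit", "tool_input": {"file_path": "production/db.yaml"}}
))
print(main(demo))
```

A deployed version would end with something like `sys.exit(main(sys.stdin))` and be registered in the project's settings. Keeping the decision in a pure `evaluate` function means the policy itself can be unit-tested like any other code.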
MCP (Model Context Protocol) is how Claude Code exposes tools. An MCP server is a process — local or remote — that advertises tools the agent can call: filesystem, git, HTTP, databases, internal APIs, observability platforms. The MCP ecosystem in 2026 includes servers for most major SaaS products, and writing a custom MCP server is straightforward. The practical consequence is that Claude Code becomes the AI front-end to whatever your stack already exposes. For broader context on how this tool-use pattern is evolving, see our overview of Claude 4.6 agent tool-use patterns for 2026.
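For readers who have not looked at the wire format, the sketch below builds the two messages a client typically sends to an MCP server: one to list tools and one to call a tool. It assumes MCP's JSON-RPC 2.0 framing and the `tools/list` and `tools/call` method names as we understand them from the specification; the tool name `query_orders` and its arguments are hypothetical.

```python
import json
from itertools import count
from typing import Optional

_ids = count(1)  # JSON-RPC request ids only need to be unique per session


def jsonrpc_request(method: str, params: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request envelope."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


def list_tools() -> dict:
    """Ask the server which tools it advertises."""
    return jsonrpc_request("tools/list")


def call_tool(name: str, arguments: dict) -> dict:
    """Invoke one advertised tool with structured arguments."""
    return jsonrpc_request("tools/call", {"name": name, "arguments": arguments})


# "query_orders" and its arguments are hypothetical, for illustration only.
request = call_tool("query_orders", {"status": "failed", "limit": 5})
print(json.dumps(request, indent=2))
```

In practice you would use an MCP SDK rather than hand-building messages; the point here is only that a tool call is a small, inspectable JSON document, which is what makes it easy to log and police.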
Sub-agents are scoped Claude instances spawned by the main agent. They get a narrower context and a defined job — “summarise these five files,” “run the test suite and report failures,” “research how this library handles X.” Sub-agents return their result to the parent and exit, which keeps the main agent’s context window clean. Teams use sub-agents to do expensive context-gathering work without blowing up the parent’s context budget.
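The context-budget argument can be illustrated with a toy model. In this sketch a "context" is just a list of strings measured in characters, and `summarise` is a stand-in for a model-generated summary; none of these names come from Claude Code itself.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    """Toy context window measured in characters rather than tokens."""
    entries: list[str] = field(default_factory=list)

    def add(self, text: str) -> None:
        self.entries.append(text)

    @property
    def size(self) -> int:
        return sum(len(e) for e in self.entries)


def summarise(text: str, limit: int = 80) -> str:
    """Stand-in for a model summary: the first line, truncated."""
    stripped = text.strip()
    first_line = stripped.splitlines()[0] if stripped else ""
    return first_line[:limit]


def run_sub_agent(task: str, files: dict[str, str]) -> str:
    """Scoped worker: reads everything in its own context, returns a digest."""
    scratch = Context()
    scratch.add(task)
    for name, body in files.items():
        scratch.add(f"{name}\n{body}")
    # Only the short digest leaves the sub-agent; scratch is discarded.
    return "; ".join(f"{n}: {summarise(b)}" for n, b in files.items())


files = {
    "billing.py": "Computes invoices\n" + "x = 1\n" * 500,
    "orders.py": "Order state machine\n" + "y = 2\n" * 500,
}
parent = Context()
parent.add("Goal: explain how billing depends on orders")
parent.add(run_sub_agent("summarise these files", files))
print(parent.size, sum(len(b) for b in files.values()))
```

The parent ends up holding a digest that is a small fraction of the raw file contents, which is the whole point of delegating the reading.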
Headless mode (claude -p "prompt") is what makes Claude Code a CI citizen. It runs non-interactively, takes a prompt, executes, and exits — which is exactly what you want in a build step, a pre-commit hook, a Slack bot, or a scheduled cron. Headless mode honours hooks and CLAUDE.md the same as interactive mode, which means you can ship the same policy posture into automated runs.
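As a sketch of what a build step might look like, the wrapper below shells out to the CLI from Python. It uses only the `-p` flag mentioned above; the function names, timeout, and prompt text are our own illustrative choices, and the example only prints the command rather than running it.

```python
import shutil
import subprocess


def build_command(prompt: str, executable: str = "claude") -> list[str]:
    """Argument list for a non-interactive run; -p is the flag named above."""
    return [executable, "-p", prompt]


def run_headless(prompt: str, timeout_s: int = 300) -> str:
    """Run one headless prompt and return stdout, raising on failure."""
    if shutil.which("claude") is None:
        raise FileNotFoundError("claude CLI not found on PATH")
    result = subprocess.run(
        build_command(prompt),
        capture_output=True,   # collect stdout and stderr instead of inheriting
        text=True,             # decode bytes to str
        timeout=timeout_s,     # fail the step rather than hang the pipeline
        check=True,            # non-zero exit raises CalledProcessError
    )
    return result.stdout


print(build_command("Summarise the changes since the last tag as release notes"))
```

Passing the prompt as a list element rather than through a shell string avoids quoting problems when the prompt itself contains quotes or shell metacharacters.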
Trade-offs are equally honest. There is no GUI; if your team’s workflow lives in a graphical IDE and they want inline ghost text, Claude Code is the wrong primary tool (though many teams pair Claude Code in the terminal with Cursor or Windsurf in the editor — this works). Model choice is locked to Anthropic. The CLI surface, while powerful, has a learning curve that is steeper than clicking into a Composer pane.

Decision matrix: when each one wins
The right way to pick an AI coding agent IDE is not “which is best” but “which fits the problem and team in front of you.” The five questions below are the ones we ask in real tool selection conversations.
1. Where does the work happen — editor or terminal?
If your developers live in an IDE and the value proposition is “make my editor more agentic,” Cursor or Windsurf are the candidates. If significant work happens at the terminal, in CI, in headless scripts, or in non-editor contexts (chat, on-call, code review bots), Claude Code becomes very compelling.
2. What is the team’s stance on model vendors?
Cursor is the most agnostic — multiple model providers, broad choice. Windsurf is moderately flexible. Claude Code is Anthropic-only. For teams with a strict “no single-vendor lock-in” rule, Claude Code requires an explicit exception. For teams that have already standardised on Anthropic, Claude Code becomes the natural extension of an existing relationship.
3. How important is multi-file change velocity vs editor ergonomics?
Windsurf optimises hardest for multi-file Cascade-driven work; you describe an outcome and the IDE drives. Cursor balances editor feel with multi-file Composer; you remain more involved. Claude Code does multi-file work too, but the experience is “review a diff in the terminal,” which some engineers love and others find taxing.
4. Do you need sandboxed long-running tasks?
Cursor Background Agents are the most polished version of “kick off a long task and come back later.” Claude Code in headless mode against a CI runner gives you the same capability with more setup. Windsurf’s story here is improving but trails the other two.
5. How strict are your data, audit, and policy requirements?
Claude Code’s hook system + headless mode + CLAUDE.md policy is the most auditable posture of the three. Windsurf on its enterprise tier covers SSO, RBAC, and private deployment. Cursor’s enterprise tier is competitive but its policy surface lives mostly in product UI rather than configurable hooks.
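If you want to turn the five questions into something you can put in a tooling review, a weighted score is one simple option. The sketch below is a generic weighted average; the question keys, the weights, the 1 to 5 ratings, and the anonymous tool names are all placeholders to be replaced with your own bake-off results, not our assessment of any product.

```python
QUESTIONS = ("surface", "model_vendor", "multi_file", "long_tasks", "governance")


def weighted_scores(
    weights: dict[str, float], ratings: dict[str, dict[str, int]]
) -> dict[str, float]:
    """Weighted average per tool; ratings are 1-5 per question."""
    total = sum(weights[q] for q in QUESTIONS)
    return {
        tool: round(sum(weights[q] * r[q] for q in QUESTIONS) / total, 2)
        for tool, r in ratings.items()
    }


# Placeholder weights and ratings: replace with your own bake-off results.
weights = {"surface": 1, "model_vendor": 1, "multi_file": 2,
           "long_tasks": 1, "governance": 3}
ratings = {
    "tool_a": {"surface": 5, "model_vendor": 4, "multi_file": 3,
               "long_tasks": 4, "governance": 2},
    "tool_b": {"surface": 2, "model_vendor": 2, "multi_file": 3,
               "long_tasks": 3, "governance": 5},
}
scores = weighted_scores(weights, ratings)
print(max(scores, key=scores.get), scores)
```

The useful part of the exercise is usually arguing about the weights, since that forces the team to say out loud how much governance matters relative to editor feel.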

A few common patterns we see in practice. Startups and small product teams typically converge on Cursor — fastest editor experience, lowest setup cost, good enough for most tasks. Feature-shop teams doing heavy frontend or product engineering find Windsurf’s Cascade-driven flow accelerates spikes meaningfully. Platform, SRE, security, and ML infrastructure teams lean toward Claude Code because the work is inherently scripted, governed, and automation-heavy. Polyglot organisations end up adopting more than one: Cursor or Windsurf in the editor for individual contributors, Claude Code in CI and in shared automation. This is the honest answer for most large companies.
Trade-offs, gotchas, and security
Picking an autonomous coding assistant in 2026 is also picking a set of risks. The category has matured, but real gotchas remain, and a good selection process surfaces them up-front rather than after the contract is signed.
Data residency and source code egress. Every one of these tools sends source code to a model. The vendor’s data policy, regional hosting, and retention defaults matter. Cursor and Windsurf both offer enterprise tiers that promise no training on customer code and configurable data handling; verify exactly what “configurable” means in writing for your contract. Claude Code through Anthropic’s API supports zero-data-retention configurations and various enterprise routing options; if you front it with a gateway, you can enforce policy at the gateway level. For air-gapped environments, all three are difficult — Claude Code is most amenable because the CLI can be pointed at a private endpoint, but the model itself must come from somewhere.
Model lock-in. Cursor lets you bring keys for multiple providers. Windsurf is more constrained but supports several backends. Claude Code is Anthropic-only by design. Lock-in is not always bad (a single vendor lets you optimise for that vendor’s strengths), but it is a strategic decision, not a tactical one. Make it consciously.
Pricing surprises. Per-seat is predictable. Metered usage on heavier models, Background Agent minutes, Cascade credits, and per-token billing are not. Teams that switch from completion to agentic workflows often see usage spike 5–10x because the agent is doing more work per session. Set budgets, monitor, and revisit pricing after the first month of real use. Pay particular attention to model-routing defaults — heavy prompts that silently escalate to premium models can quintuple cost without warning.
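A quick back-of-the-envelope model shows why the 5–10x range matters for budgeting. The seat count, seat price, metered baseline, and budget below are hypothetical figures chosen for round arithmetic, not vendor pricing.

```python
def monthly_cost(seats: int, seat_price: float, metered_baseline: float,
                 usage_multiplier: float) -> float:
    """Fixed per-seat spend plus metered spend scaled by agentic usage."""
    return seats * seat_price + metered_baseline * usage_multiplier


def breakeven_multiplier(seats: int, seat_price: float, metered_baseline: float,
                         budget: float) -> float:
    """Usage multiplier at which total spend reaches the budget."""
    return (budget - seats * seat_price) / metered_baseline


# Hypothetical figures, not vendor pricing.
SEATS, SEAT_PRICE, BASELINE, BUDGET = 20, 40.0, 500.0, 4000.0

for m in (1, 5, 10):
    cost = monthly_cost(SEATS, SEAT_PRICE, BASELINE, m)
    print(f"{m:>2}x usage -> {cost:,.0f}")
print("budget reached at multiplier",
      breakeven_multiplier(SEATS, SEAT_PRICE, BASELINE, BUDGET))
```

With these numbers the fixed seat cost is 800, so a 5x spike stays under the budget while a 10x spike overshoots it; the metered component, not the seat price, is what decides the outcome.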
Sandbox and command-execution risks. Agents that run shell commands can do real damage. Every tool has guards: Cursor requires approval for risky commands by default; Windsurf surfaces commands in the Flow timeline; Claude Code can be locked down with pre-tool hooks. None of these are perfect. The published failure mode across the category is the agent constructing a destructive command that does not look destructive at a glance — rm invocations with computed paths, git push --force to the wrong remote, database migrations against the wrong environment. Treat sandbox isolation as a primary control, not a nice-to-have. For production-adjacent work, route the agent through a dedicated sandbox or VM, not your laptop. Use hooks (in Claude Code) or workspace policy (in Cursor and Windsurf) to deny direct production access entirely.
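The three failure patterns named above can be screened for mechanically before a command runs. The sketch below is a deliberately simple heuristic of our own, not a feature of any of the three products: it handles a single simple command (no pipelines or `&&` chains), and the "prod" substring check for migrations is a hypothetical naming convention.

```python
import shlex


def risk_flags(command: str) -> list[str]:
    """Return human-readable reasons a shell command deserves a second look."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        return ["unparseable command"]  # e.g. unbalanced quotes
    if not tokens:
        return []

    flags: list[str] = []
    args = tokens[1:]

    # rm whose target is computed at run time (variable or command substitution)
    if tokens[0] == "rm" and any("$" in a or "`" in a for a in args):
        flags.append("rm with a computed path")

    # git push with a force flag
    if tokens[:2] == ["git", "push"] and any(a in ("--force", "-f") for a in args):
        flags.append("force push")

    # migration command that mentions a production-looking target (assumed naming)
    if any("migrate" in t for t in tokens) and any("prod" in t.lower() for t in tokens):
        flags.append("migration against a production-looking target")

    return flags


for cmd in ('rm -rf "$BUILD_DIR/"', "git push --force origin main", "ls -la"):
    print(cmd, "->", risk_flags(cmd))
```

A screen like this belongs in front of a confirmation prompt or a deny rule; it reduces the chance of a quiet mistake but is no substitute for running the agent in an isolated sandbox.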
Prompt injection through code and content. A README, a comment, a string in a file, a returned web page — any text the agent reads can attempt to instruct it. Practical mitigations: never let the agent execute arbitrary commands sourced from untrusted content without confirmation; treat tool outputs as untrusted; constrain MCP servers to read-only or scoped tokens by default; and review hook policy with the same seriousness as you would review IAM. The literature on this is evolving fast and the defaults are not always safe.
Audit trails and incident response. If an agent does something you wish it had not, can you reconstruct what happened? Windsurf’s Flow timeline, Claude Code’s hook logs, and Cursor’s session history each give you something. None of them give you what an enterprise security team really wants by default — centralised, structured, immutable logs of every tool call across every developer. If that matters, you will be building it on top of the product. Claude Code is the easiest to instrument because hooks can ship every call to your SIEM with a few lines of shell.
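If you do end up building the audit layer yourself, one common building block is a hash-chained log, where each record includes the hash of the previous one so that later edits are detectable. The sketch below is a minimal in-memory version under our own assumptions: the record fields are hypothetical, timestamps are omitted to keep the example deterministic, and this gives tamper evidence rather than true immutability (which needs write-once storage on the receiving side).

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def _digest(body: dict) -> str:
    """Stable SHA-256 over a canonical JSON encoding of the record body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append_entry(log: list[dict], user: str, tool: str, tool_input: dict) -> dict:
    """Append one tool-call record, chained to the previous record's hash."""
    body = {
        "seq": len(log),
        "user": user,
        "tool": tool,
        "input": tool_input,
        "prev": log[-1]["hash"] if log else GENESIS,
    }
    entry = {**body, "hash": _digest(body)}
    log.append(entry)
    return entry


def verify(log: list[dict]) -> bool:
    """Recompute every hash and check the chain links in order."""
    prev = GENESIS
    for i, entry in enumerate(log):
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["seq"] != i or entry["prev"] != prev or entry["hash"] != _digest(body):
            return False
        prev = entry["hash"]
    return True


log: list[dict] = []
append_entry(log, "dev-1", "Bash", {"command": "pytest -q"})
append_entry(log, "dev-1", "Edit", {"file_path": "src/billing.py"})
print(verify(log))

log[0]["input"]["command"] = "rm -rf /"   # simulate tampering with history
print(verify(log))
```

In a real deployment each record would be shipped to the SIEM as it is written, so that the copy an attacker can edit locally is not the only copy.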
Behavioural drift between releases. Cascade after a Wave update, Cursor after a model update, Claude Code after a model bump — agent behaviour shifts. Workflows that depended on the previous behaviour break in subtle ways. The fix is the same in all three: pin what you can, write tests against agent behaviour for critical workflows, and budget time after vendor releases to recalibrate. If your team has built golden-path agent workflows you depend on, treat them like any other production code and write integration tests.
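Because agent output is not byte-for-byte reproducible, tests against agent behaviour usually check invariants rather than exact text. The sketch below checks two such invariants on a unified diff produced by an agent run: which files it touched and how many lines it added. The sample diff, the allowed prefixes, and the line limit are all hypothetical; in a real suite the diff would come from running your golden-path workflow after a vendor release.

```python
def changed_files(diff_text: str) -> set[str]:
    """Paths named on the '+++ b/' header lines of a unified diff."""
    prefix = "+++ b/"
    return {
        line[len(prefix):].strip()
        for line in diff_text.splitlines()
        if line.startswith(prefix)
    }


def violations(diff_text: str, allowed_prefixes: tuple[str, ...],
               max_added_lines: int) -> list[str]:
    """List the invariants this diff breaks; empty means acceptable."""
    problems = [
        f"unexpected file: {path}"
        for path in sorted(changed_files(diff_text))
        if not path.startswith(allowed_prefixes)
    ]
    added = sum(
        1 for line in diff_text.splitlines()
        if line.startswith("+") and not line.startswith("+++")
    )
    if added > max_added_lines:
        problems.append(f"too many added lines: {added} > {max_added_lines}")
    return problems


# Hypothetical agent output for a "write the changelog entry" workflow.
SAMPLE_DIFF = """\
--- a/docs/CHANGELOG.md
+++ b/docs/CHANGELOG.md
@@ -1,2 +1,3 @@
 # Changelog
+- Fix rounding in invoice totals
--- a/src/billing.py
+++ b/src/billing.py
@@ -10,1 +10,1 @@
-total = round(x)
+total = round(x, 2)
"""

print(violations(SAMPLE_DIFF, allowed_prefixes=("docs/",), max_added_lines=10))
```

Here the workflow was only supposed to touch documentation, so the edit to `src/billing.py` is reported. A check like this will not catch every regression, but it fails loudly when a release changes how far the agent is willing to roam.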
Skill atrophy and over-trust. Less a security gotcha than a team-health one, but worth naming. Engineers who delegate everything stop noticing when the agent is wrong. The teams that get the most lasting value treat the agent as a first draft, not a finished product, and rotate code review responsibilities so a human eye still understands the codebase end-to-end. This is especially true on architectural decisions, which all three agents handle worse than they handle line-level code.
Practical recommendations
Concrete patterns we recommend based on observed adoptions across the category.
For an individual engineer or small startup: start with Cursor on the standard tier. The editor experience is closest to what you already know, the multi-file Composer is excellent for early-stage codebases where you are constantly refactoring, and the cost is bounded by per-seat plus a small usage envelope. Add Claude Code in headless mode for the one or two CI tasks where it earns its keep (release notes, changelog generation, on-call summaries).
For a 20–100 person engineering team without strict governance: Cursor or Windsurf in the editor, picked by which one gives your developers more lift on actual tickets. Run a two-week bake-off on real work, not demos. Add Claude Code centrally for shared automation: release engineering, scheduled reports, the PR-review bot you keep saying you will build. Pay for the enterprise tier of whichever editor you pick, even if you do not need every feature, because SSO and data controls compound in value once you grow.
For a regulated, governed, or large enterprise: Claude Code is the foundation, with Cursor or Windsurf as opt-in editor surfaces. Build a hook library that encodes your policy — deny production paths, route through your audit proxy, enforce branch protections, log to your SIEM. Ship the hook library and a curated CLAUDE.md template as part of project bootstrap. Standardise on a small set of MCP servers your team has reviewed and approved. Treat the agent as another deploy target — versioned, tested, and rolled back if behaviour drifts.
For platform and infra teams: Claude Code, with extensive custom MCP servers wrapping your internal tooling. The investment in writing MCP servers for your service catalogue, deployment system, and observability stack pays back across every script, runbook, and on-call response the team writes. This is also where the sub-agent pattern earns the most — you can spawn agents that go investigate, summarise, and return findings without polluting the main agent’s context window.
For ML and data teams: a mixed posture. Claude Code is excellent for orchestrating notebooks, generating evaluation harnesses, and running scheduled analysis tasks. Cursor or Windsurf is better for the active editing loop on training code where you want fast multi-file refactors. Many teams adopt both and feel no friction.
A few smaller pieces of advice that apply universally. Invest in CLAUDE.md or /rules or Cascade memory — whichever your tool calls it. Project-level agent context is the single highest-leverage thing you can do to make any of these tools dramatically better. Spend an hour writing it; revisit it monthly. Treat agent output as a draft. Read every diff before you accept it, even when you are tired. Budget for a behavioural-drift review every quarter or after every major release of your chosen tool. And resist the urge to pick one tool for all use cases — the right answer for most organisations of meaningful size is two, possibly three, in different roles.
FAQ
Is Cursor still better than Windsurf in 2026?
Neither is universally better. Cursor leads on broad model choice, editor ergonomics, and Background Agents. Windsurf leads on Cascade-driven multi-file flow and the Wave cadence of capability releases. The honest answer is that they are competitive on most tasks and the right choice depends on whether your developers prefer staying close to inline edits (Cursor) or delegating bigger chunks to the orchestrator (Windsurf). Run a bake-off on real tickets.
Can Claude Code replace my IDE?
No, and it is not trying to. Claude Code is a CLI tool that pairs with whatever editor you already use. Most engineers run it in a terminal pane next to VS Code, Neovim, or even Cursor or Windsurf themselves. Where Claude Code shines is automation, governance, and any context where a CLI is the right primitive (CI, scripts, on-call, bots).
How do hooks in Claude Code differ from rules in Cursor?
Cursor’s /rules are instructions the agent reads at context time — guidance about conventions, preferences, and constraints. Claude Code’s hooks are interception points in the tool-call pipeline — code that runs before and after every tool invocation, with the power to block, modify, or log. Rules influence behaviour; hooks enforce it. Many teams use both styles: rules to shape the plan, hooks to police the execution.
What about Copilot, Cody, and Aider — why are they not in this comparison?
GitHub Copilot is in a different category in 2026 — its agentic features have grown, but it is still primarily a completion-plus-light-agent product and lives inside many editors rather than defining one. Cody (from Sourcegraph) and Aider (open source CLI agent) are credible and worth evaluating, but the three tools in this piece have the largest pull in current procurement conversations, and the architectural patterns generalise.
How do I handle source-code privacy and data residency?
All three vendors offer enterprise tiers with stronger guarantees: no training on customer code, configurable retention, region pinning, and SSO. Read the actual contract; do not rely on marketing pages. For air-gapped environments, all three are constrained because the model itself needs to be served from somewhere. Claude Code via a private Anthropic endpoint, Windsurf on a customer-controlled deployment, and Cursor’s enterprise self-hosted options are the closest each gets, with different trade-offs.
What is the realistic productivity uplift?
We do not publish a number because honest answers vary enormously by codebase, team, language, and the maturity of your context configuration. Vendor-published numbers tend to come from controlled studies on specific tasks; we treat them as illustrative rather than predictive. In practical adoptions, teams report meaningful uplift on multi-file refactors, test generation, boilerplate, and well-scoped feature work, and much smaller (sometimes negative) uplift on architectural design, complex debugging, and unfamiliar large codebases. Measure on your own work.
Can these tools open pull requests on their own?
Cursor Background Agents and BugBot are explicitly PR-aware. Windsurf supports it but the flow is more developer-driven. Claude Code does it via shell tools or a Git MCP server — you wire it however you want. For any unattended PR creation, lock down hook policy (or equivalent product controls), require human review before merge, and never let the agent push to default branches without an explicit, audited approval step.
Should we standardise on one tool company-wide?
Probably not, if you are bigger than a small team. Different roles benefit from different tools — editor-bound contributors from Cursor or Windsurf, platform and automation teams from Claude Code. Standardising on one tool simplifies procurement and training, but you usually pay for that simplicity by under-serving one of the two camps. The middle path is to bless a primary editor agent and a primary automation agent, document when each is appropriate, and let teams use them together.
Further reading
- Anthropic, Claude Code documentation, https://docs.anthropic.com/claude/claude-code — official reference for CLI, hooks, MCP, sub-agents, and headless mode.
- Anthropic, Model Context Protocol specification, https://modelcontextprotocol.io — the open protocol underlying Claude Code’s tool use and increasingly adopted across the agent ecosystem.
- Cursor, Cursor documentation, https://docs.cursor.com — vendor docs for Composer, Agent mode, Background Agents, BugBot, and /rules.
- Windsurf, Windsurf documentation and Wave release notes, https://docs.windsurf.com — vendor reference for Cascade, Flow, and the Wave release cadence.
- GitHub, Copilot agents and review, https://docs.github.com/en/copilot — for comparison against the completion-plus-agent baseline.
- Sourcegraph, Cody enterprise and agentic features, https://sourcegraph.com/docs/cody — alternative enterprise-leaning agent IDE.
- Aider, Aider open-source CLI agent, https://aider.chat — open-source CLI reference point if you want to compare against Claude Code.
- Simon Willison, Notes on AI coding agents and prompt injection, https://simonwillison.net — ongoing practitioner commentary on agent security.
- NIST, AI Risk Management Framework, https://www.nist.gov/itl/ai-risk-management-framework — useful structural reference for building governance around agentic tools.
- OWASP, Top 10 for LLM Applications, https://owasp.org/www-project-top-10-for-large-language-model-applications/ — known-unsafe patterns to design hooks and policies against.
