TL;DR

Google Jules is a coding agent built around queuing instead of live chat. You describe a task, walk away, and a pull request shows up later. The free tier gives 15 tasks per day on Gemini 3 Flash. The $19.99/month Pro tier bumps that to 100 tasks per day on Gemini 3.1 Pro, and proactive features like CI Fixer and Scheduled Tasks make it feel less like a tool and more like a junior developer who never goes offline. But Jules is slow, can’t handle files over ~50K lines, and only connects to GitHub. If you need real-time pair programming or work with GitLab, look elsewhere.

Why I Tried Jules

I’ve been using Claude Code and Codex CLI for months — both are real-time terminal agents where you type a prompt and watch code materialize. They’re good at that. But I kept running into the same friction: I’d queue up three refactoring tasks in my head, then sit there babysitting the agent through each one sequentially. Context switching between “architect mode” and “watch the agent type” mode was costing me actual productive hours.

Jules promised something different. Describe the task, hit submit, go do something else. Come back to a pull request. I signed up for the Pro tier ($19.99/month bundled with Google AI Pro) and spent three weeks throwing real work at it — dependency bumps, test scaffolding, bug fixes across a Flask API and two Go microservices.

The short version: Jules delivered on the async promise. But “async” also means “slow,” and the tradeoffs stack up in ways the marketing doesn’t mention.

How Jules Works

Every task runs in an isolated Google Cloud VM. Jules clones your repo, reads the codebase, builds an execution plan, and shows you that plan before touching any files. You can edit the plan, approve it, or scrap it entirely. Once approved, Jules works through the changes file by file, running any tests it finds at each step. When it’s done, it opens a PR on GitHub.

The whole loop is: submit, approve a plan, wait for the PR notification. No terminal session, no streaming output, no watching characters appear.
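
One way to picture that loop is as a small state machine. This is only a sketch: the state names, event names, and the `advance` and `run` helpers are hypothetical labels I made up for illustration, not Jules's actual internal or API states.

```python
from enum import Enum


class TaskState(Enum):
    SUBMITTED = "submitted"
    PLAN_READY = "plan_ready"
    RUNNING = "running"
    PR_OPENED = "pr_opened"
    CANCELLED = "cancelled"


# Allowed transitions: (current state, event) -> next state.
# Event names are illustrative, not taken from Jules.
TRANSITIONS = {
    (TaskState.SUBMITTED, "plan_built"): TaskState.PLAN_READY,
    (TaskState.PLAN_READY, "edit_plan"): TaskState.PLAN_READY,
    (TaskState.PLAN_READY, "approve"): TaskState.RUNNING,
    (TaskState.PLAN_READY, "scrap"): TaskState.CANCELLED,
    (TaskState.RUNNING, "changes_done"): TaskState.PR_OPENED,
}


def advance(state: TaskState, event: str) -> TaskState:
    """Return the next state, or raise if the event is not valid here."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not allowed in state {state.name}") from None


def run(events) -> TaskState:
    """Replay a sequence of events starting from a freshly submitted task."""
    state = TaskState.SUBMITTED
    for event in events:
        state = advance(state, event)
    return state
```

The point of the table is that the only human step sits between `PLAN_READY` and `RUNNING`. Everything after approval happens without you.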

At a glance: 15–300 tasks/day by tier · 3–60 concurrent tasks · 80.6% SWE-bench (Gemini 3.1 Pro)

The model underneath depends on your tier. Free gets Gemini 3 Flash. Pro and Ultra run Gemini 3.1 Pro, which scores 80.6% on SWE-bench Verified — competitive with Claude Opus 4.6 at 80.8%, though behind Opus 4.7’s 87.6% in agentic scaffolding. (For a full breakdown of how these models compare on coding tasks, see the GPT-5.4 vs Claude Opus 4.7 vs Gemini 3.1 Pro comparison.)

Pricing Breakdown

Jules doesn’t have its own subscription. It bundles into Google’s AI tiers:

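| Feature          | Free           | Pro ($19.99/mo) | Ultra ($99.99/mo)         |
|------------------|----------------|-----------------|---------------------------|
| Daily tasks      | 15             | 100             | 300                       |
| Concurrent tasks | 3              | 15              | 60                        |
| Model            | Gemini 3 Flash | Gemini 3.1 Pro  | Gemini 3.1 Pro (priority) |
| Suggested Tasks  | No             | Yes             | Yes                       |
| Scheduled Tasks  | Yes            | Yes             | Yes                       |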

The free tier is generous enough for evaluation. Fifteen tasks per day covers most solo developers who want to offload grunt work. Pro makes sense once you’re running 10+ tasks daily and want the model upgrade. Ultra is for teams running agent-heavy workflows — 60 concurrent tasks means you can point Jules at an entire sprint backlog and let it churn.
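
If you want to sanity-check which tier your workload needs, the quota numbers from the pricing table are enough for a rough lookup. A minimal sketch, assuming you only care about daily and concurrent limits: the `Tier` class and `cheapest_tier` function are names I invented, and the lookup deliberately ignores the model upgrade, which may be the real reason to pay.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier:
    name: str
    monthly_usd: float
    daily_tasks: int
    concurrent: int


# Quotas copied from the pricing table; ordered cheapest first.
TIERS = [
    Tier("Free", 0.00, 15, 3),
    Tier("Pro", 19.99, 100, 15),
    Tier("Ultra", 99.99, 300, 60),
]


def cheapest_tier(tasks_per_day: int, peak_concurrent: int) -> Optional[Tier]:
    """Return the cheapest tier whose quotas cover the workload, or None."""
    for tier in TIERS:
        if tier.daily_tasks >= tasks_per_day and tier.concurrent >= peak_concurrent:
            return tier
    return None
```

Note that concurrency can push you up a tier before the daily cap does: 12 tasks a day fits the free quota, but running all 12 at once does not.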

One catch: paid plans require an @gmail.com account. Google Workspace users can’t subscribe yet.

What Jules Got Right

Batch Parallelism

The async model isn’t just a UX gimmick. I’d queue five dependency-bump tasks at 9 AM, go write the design doc I’d been avoiding, and come back to five PRs by 10:30. With Claude Code, those same five tasks would take me through lunch because I’d be approving file edits and answering clarification prompts one by one.

On Pro, 15 concurrent slots mean you can throw an entire backlog at Jules without hitting a queue. I ran 12 tasks simultaneously during a sprint cleanup, and all 12 completed within 90 minutes. Doing that sequentially in Claude Code would have taken most of an afternoon.
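
The arithmetic behind that is simple scheduling. Here is a rough sketch that estimates wall-clock time for a batch, assuming each task grabs the first free slot and that you can guess per-task durations up front (you can't, really, so treat the inputs as hypothetical). `wall_clock_minutes` is my own helper, not anything Jules exposes.

```python
import heapq


def wall_clock_minutes(durations, slots):
    """Estimate when the last task finishes if each task takes the earliest free slot."""
    if slots < 1:
        raise ValueError("slots must be at least 1")
    # One entry per slot holding the time at which that slot becomes free.
    free_at = [0.0] * min(slots, max(len(durations), 1))
    heapq.heapify(free_at)
    finish = 0.0
    for minutes in durations:
        start = heapq.heappop(free_at)  # earliest available slot
        end = start + minutes
        heapq.heappush(free_at, end)
        finish = max(finish, end)
    return finish
```

With as many slots as tasks, the estimate collapses to the longest single task. With one slot it is the plain sum, which is the serial case I was stuck in before.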

CI Fixer

This was the feature I didn’t expect to love. When a GitHub Actions workflow fails, Jules automatically analyzes the logs, writes a fix, commits it, and resubmits. It loops until CI passes or gives up after a configurable number of attempts.
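
The control flow is a bounded retry loop. The sketch below shows that shape only; `run_ci` and `propose_fix` are hypothetical callables standing in for "trigger the workflow and collect logs" and "let the agent commit a fix", and the default cap of 3 is my assumption, not a documented Jules setting.

```python
from typing import Callable, Tuple


def fix_until_green(
    run_ci: Callable[[], Tuple[bool, str]],
    propose_fix: Callable[[str], None],
    max_attempts: int = 3,
) -> Tuple[bool, int]:
    """Run CI; on failure hand the logs to a fixer, up to max_attempts fixes.

    Returns (passed, number_of_fixes_applied).
    """
    for fixes_applied in range(max_attempts + 1):
        passed, logs = run_ci()
        if passed:
            return True, fixes_applied
        if fixes_applied == max_attempts:
            break  # out of attempts; give up and leave the build red
        propose_fix(logs)
    return False, max_attempts
```

The cap matters more than it looks. Without it, an agent that keeps "fixing" the wrong thing will burn through your daily task quota on a single red build.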

I had a Flask test suite that broke after a SQLAlchemy upgrade. Three tests failing on a deprecated session API. I pointed Jules at the CI failure. It read the logs, traced the issue to session.close() being called after the session was already garbage-collected, replaced it with a scoped session factory, and pushed a green build. Took about eight minutes. I would have spent 20 debugging that myself because I always forget the scoped session pattern.

Scheduled Tasks

You can set Jules to run recurring jobs: nightly lint passes, weekly dependency audits, monthly dead-code sweeps. This is the part that makes Jules feel like a team member rather than a tool. I set up a weekly pip-audit run on my Flask API — every Monday morning, a PR shows up with any new CVEs patched. Before Jules, I’d check this maybe once a quarter.
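
If you want to see roughly what such a job has to do before it can open a PR, the core step is turning an audit report into a list of packages to bump. A minimal sketch, assuming a report shaped like the output of `pip-audit --format json`, with a top-level `dependencies` list whose entries carry `name`, `version`, and `vulns` (each with `id` and `fix_versions`). Check that shape against your pip-audit version; `patch_plan` is a name I made up, and I have no visibility into how Jules does this internally.

```python
def patch_plan(report: dict) -> dict:
    """Map each vulnerable package to its advisory IDs and candidate fix versions."""
    plan = {}
    for dep in report.get("dependencies", []):
        vulns = dep.get("vulns", [])
        if not vulns:
            continue  # clean dependency, nothing to patch
        fix_versions = []
        for vuln in vulns:
            fix_versions.extend(vuln.get("fix_versions", []))
        plan[dep["name"]] = {
            "current": dep.get("version"),
            "advisories": [vuln["id"] for vuln in vulns],
            # De-duplicate while keeping the order the report listed them in.
            "fix_versions": list(dict.fromkeys(fix_versions)),
        }
    return plan
```

A package with advisories but an empty `fix_versions` list is the case worth flagging for a human: there is nothing to bump to yet.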

Suggested Tasks

On Pro and Ultra, Jules scans up to five repos and proposes improvements. It started with TODO comments — finding forgotten # TODO: handle edge case annotations scattered through my code and opening PRs to actually handle them. Over two weeks, it cleared 14 TODOs I’d written months ago and forgotten about.

The suggestions aren’t always useful. Jules proposed refactoring a perfectly fine utility function into a class hierarchy that added complexity for zero benefit. But the hit rate was around 60-70%, and dismissing bad suggestions takes seconds.
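
The TODO part is easy to approximate yourself if you want to see what Jules is likely to pick up before you enable the feature. A naive sketch, assuming Python-style `#` comments: the regex and the `find_todos` helper are mine, and a plain regex will also match `# TODO` inside string literals, which a real scanner would need to exclude.

```python
import re

# Matches "# TODO: text" and "# TODO text"; \b keeps "TODOS" from matching.
TODO_RE = re.compile(r"#\s*TODO\b:?\s*(.*)")


def find_todos(source: str):
    """Return (line_number, note) pairs for every TODO comment in the source."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = TODO_RE.search(line)
        if match:
            hits.append((lineno, match.group(1).strip()))
    return hits
```

Running it over a file with one standalone TODO on line 2 and one trailing TODO on line 3 returns both, each with its note text, which is about as much context as the annotation itself gives an agent to work from.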

Where Jules Falls Short

Speed

Jules is slow. A task that Claude Code handles in 90 seconds takes Jules 8-15 minutes. Part of this is the VM spin-up, part is the planning phase (Jules builds a detailed plan before writing any code), and part is that Gemini 3.1 Pro generates tokens slower than Claude in agentic loops.

For anything urgent (a production bug, a quick fix before a demo) Jules isn’t the right tool. You’ll be staring at a progress bar while Claude Code would have already pushed the commit.

Large File Blindness

Gemini 3.1 Pro has a 1M-token context window, but Jules appears to impose a tighter limit in practice. Large files are off-limits. I hit this on a legacy Go service with a 12,000-line handlers.go monolith (not proud of that file, but it exists). Jules’s plan referenced functions that didn’t exist in the file — it was working with a truncated view.

Real-time agents handle this differently. Claude Code can stream file reads and focus on specific sections. Jules loads the whole context upfront and chokes on anything too large.
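
Since Jules doesn't warn you, a preflight check before queuing a task is cheap insurance. A minimal sketch: it walks a repo and lists files above a line-count threshold. The threshold is yours to pick (I don't know Jules's real cutoff), and `count_lines` and `oversized_files` are hypothetical helper names.

```python
from pathlib import Path


def count_lines(path: Path) -> int:
    """Count lines by streaming the file, so large files are never fully loaded."""
    with path.open("rb") as fh:
        return sum(1 for _ in fh)


def oversized_files(root, max_lines: int, pattern: str = "*.go"):
    """Return (relative_path, line_count) for files under root exceeding max_lines."""
    root = Path(root)
    hits = []
    for path in sorted(root.rglob(pattern)):
        if not path.is_file():
            continue
        lines = count_lines(path)
        if lines > max_lines:
            hits.append((path.relative_to(root).as_posix(), lines))
    return hits
```

Anything this flags is a candidate for splitting up first, or for handing to a real-time agent instead.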

GitHub Only

No GitLab. No Bitbucket. No self-hosted Git. If your repos aren’t on github.com, Jules can’t touch them. Google Workspace integration is also missing, which means enterprise teams on Google Cloud who use Cloud Source Repositories are locked out too.

Language Coverage

Python and TypeScript/JavaScript are first-class citizens. Jules writes solid code in both, catches edge cases, and uses idiomatic patterns. Go, Java, and C# work but with noticeably lower reliability. My Go microservices got PRs that compiled but missed patterns any Go developer would catch: unchecked errors, bare returns where wrapped errors belong.

Hallucinated Progress

Twice during my testing, Jules claimed a task was complete when it had actually stalled mid-execution. The PR showed up with partial changes: half the files edited, tests not run. There’s no clear indication in the UI when this happens. You find out during code review, which defeats the “queue and forget” promise. If you’re relying on any coding agent for unsupervised work, setting up guardrails before you go hands-off is worth the time.
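
One guardrail that would have caught both of my stalled runs is comparing what the approved plan said it would touch against what the PR actually changed. A rough sketch, assuming you can get both file lists yourself (from the plan text and the PR diff) and know whether a test run was recorded. `review_flags` and its inputs are hypothetical; Jules doesn't hand you this as a feature.

```python
def review_flags(planned_files, changed_files, tests_ran: bool):
    """Return human-readable warnings for a PR that looks incomplete."""
    planned, changed = set(planned_files), set(changed_files)
    flags = []

    untouched = sorted(planned - changed)
    if untouched:
        flags.append("planned but untouched: " + ", ".join(untouched))

    unplanned = sorted(changed - planned)
    if unplanned:
        flags.append("changed but not in plan: " + ", ".join(unplanned))

    if not tests_ran:
        flags.append("no test run recorded")

    return flags
```

An empty list doesn't mean the PR is correct, only that it isn't obviously partial. It moves the discovery from code review to a check you can run before you open the diff.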

Jules vs the Competition

| Feature            | Google Jules            | Claude Code                    | GitHub Copilot Agent    | OpenAI Codex        |
|--------------------|-------------------------|--------------------------------|-------------------------|---------------------|
| Interaction model  | Async (queue + PR)      | Real-time terminal             | Both (IDE + async)      | Async (cloud tasks) |
| Pricing            | $0–99.99/mo             | $20/mo (Pro) or API            | $10–39/mo               | API-based           |
| Model              | Gemini 3.1 Pro          | Claude Opus 4.7                | GPT-5.3-Codex (default) | GPT-5.3-Codex       |
| SWE-bench          | 80.6%                   | 87.6%                          | ~77–80%                 | 85%                 |
| Concurrent tasks   | 3–60                    | 1 (serial)                     | 1–3                     | Varies              |
| Proactive features | CI Fixer, Scheduled     | None                           | Limited                 | None                |
| Git platforms      | GitHub only             | Any                            | GitHub only             | GitHub only         |
| Best for           | Batch work, maintenance | Complex refactors, exploration | GitHub-native workflows | Automated fixes     |

What counts here is workflow fit, not a feature checklist.

Jules owns the batch maintenance lane. Queue 20 dependency bumps and lint fixes, check the PRs over coffee. On Pro with 15 concurrent slots, a full day’s grunt work finishes before lunch. No other agent handles this volume as smoothly.

Claude Code is the better pick for anything that needs back-and-forth. Debugging a race condition, designing an API, exploring unfamiliar code: for all of these you want a real-time thinking partner, and Opus 4.7’s 7-point SWE-bench lead over Gemini 3.1 Pro shows up when the task gets hard. (I reviewed DeepSeek V4 Pro recently, and it’s another strong option at a fraction of Claude’s API cost.)

Copilot Agent fits if you already live in GitHub Issues and Actions. It’s the least friction for teams whose entire workflow is PR-centric.

Where Jules pulls ahead of all three: proactive features. I haven’t found CI auto-fixing or scheduled recurring tasks in any competing agent. That gap alone kept me on the Pro tier.

MCP Server Integration

In February 2026, Jules added Model Context Protocol support with six hand-selected servers: Linear, Stitch, Neon, Tinybird, Context7, and Supabase. Google took a curated approach: every server was audited for data flow and tool permissions before being allowed.

In practice, this means Jules can read your Linear tickets, query your Neon database schema, and check Supabase auth configuration while planning changes. I connected the Neon MCP server and gave Jules a task: “add pagination to the /users endpoint based on the current schema.” It pulled the schema directly from Neon, wrote the SQL migration and the Python endpoint code, and got it right on the first try. Without MCP, I’d have had to paste the schema into the task description.

Six servers is limiting. Claude Code connects to any MCP server you configure. But Google’s curated approach makes sense for an agent that runs in a cloud VM with repo access. A malicious MCP server could exfiltrate code, so restriction buys you something real.

The Jules API

Google also launched a Jules API for programmatic task creation. You can trigger Jules tasks from CI pipelines, chatbots, or custom tooling. The API exposes task creation, status polling, and result retrieval.

The API is still in v1alpha, so field names and auth methods may change. Here’s the general shape of a session-creation call using the current schema:

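import requests

API_KEY = "your-google-api-key"  # placeholder; load from an env var or secret store in real use

# Create a session (one Jules task) against a GitHub repo and branch.
session = requests.post(
    "https://jules.googleapis.com/v1alpha/sessions",
    headers={"X-Goog-Api-Key": API_KEY},
    json={
        "sourceContext": {
            "gitHub": {"repository": "owner/repo", "branch": "main"}
        },
        "title": "Add input validation to /users POST endpoint",
    },
    timeout=30,  # fail fast instead of hanging on a stalled connection
)
session.raise_for_status()  # surface 4xx/5xx errors instead of printing an error body
print(session.json())
# Example response shape:
# {"name": "sessions/abc123", "state": "CREATED", ...}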

The automationMode field controls whether Jules runs without human review of its execution plan. I keep it at the default (manual approval) because I want to see the plan before Jules starts editing files. For trusted, repeatable tasks like dependency bumps, switching to full automation turns Jules into an autonomous pipeline.
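
Status polling is the other half of any integration. Because the alpha schema may shift, the sketch below keeps the HTTP call out of the loop: `fetch_state` is a hypothetical callable you supply that returns the session's current state string, and the terminal state names are my assumption, so check them against the current API reference.

```python
import time

# Assumed terminal state names; verify against the live API before relying on them.
TERMINAL_STATES = {"COMPLETED", "FAILED"}


def wait_for_session(
    fetch_state,
    timeout_s: float = 1800,
    interval_s: float = 30,
    sleep=time.sleep,
    clock=time.monotonic,
) -> str:
    """Poll fetch_state() until it returns a terminal state or the timeout passes."""
    deadline = clock() + timeout_s
    while True:
        state = fetch_state()
        if state in TERMINAL_STATES:
            return state
        if clock() >= deadline:
            raise TimeoutError(f"session still in state {state!r} after {timeout_s}s")
        sleep(interval_s)
```

The `sleep` and `clock` parameters exist so the loop can be exercised without real waiting. Given how slow Jules is, a 30-second interval and a generous timeout are reasonable starting points.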

The obvious next step is connecting Jules to your issue tracker: new bug filed, Jules automatically attempts a fix, PR shows up for review. The Stitch design team at Google reportedly runs “a pod of daily Jules agents” with assigned roles (performance tuning, security patching, accessibility, test coverage), making Jules, according to the team’s blog post, one of the largest contributors to their repository.

Project Jitro: What’s Coming Next

Google previewed Project Jitro at I/O 2026 — the next version of Jules that shifts from task-driven to goal-driven. Instead of “fix this function,” you’d say “get test coverage to 85%” and Jitro figures out which files to change, which tests to write, and how to get the metric where you want it.

The current Jules already hints at this direction. Suggested Tasks, Scheduled Tasks, and the Render integration all share one pattern: Jules initiating action based on codebase state. Jitro takes that to its logical conclusion.

The obvious question is accountability. When an agent autonomously refactors modules to hit a metric, who reviews the architectural decisions it made along the way? Google hasn’t answered that yet. Jitro launched under a waitlist at I/O, so general availability is probably months away.

Who Should Use Jules

Good fit:

  • You maintain multiple repos and spend hours weekly on dependency updates, lint fixes, and test scaffolding
  • You want CI failures fixed automatically without context-switching from whatever you’re building
  • You work in Python or TypeScript and your repos are on GitHub
  • You like reviewing PRs more than supervising an agent in real time

Skip it:

  • You need real-time collaboration — architecture discussions, exploratory coding, debugging complex state
  • Your repos are on GitLab, Bitbucket, or self-hosted Git
  • You work primarily in Go, Java, or C# where Jules’s output needs heavy review anyway
  • You need to work with files over 50K lines

FAQ

Is Google Jules free?

Yes, the free tier gives 15 tasks per day with 3 concurrent slots, running on Gemini 3 Flash. No credit card required. It’s enough to evaluate whether the async model fits your workflow before committing to Pro.

How does Google Jules compare to Claude Code?

They solve different problems. Jules is async — you queue tasks and get PRs back later. Claude Code is real-time — you work together in a terminal session. Jules is better for batch maintenance work across multiple repos. Claude Code is better for complex single-task work where you need back-and-forth. Claude’s underlying model (Opus 4.7, 87.6% SWE-bench) also outperforms Jules’s Gemini 3.1 Pro (80.6%) on coding benchmarks.

What languages does Google Jules support?

Python and TypeScript/JavaScript are best supported. Go, Java, and C# work but produce less reliable output. Expect to catch missed error handling patterns and non-idiomatic code during review.

Can Jules work with private repositories?

Yes. Jules clones repos into isolated Google Cloud VMs. Google states your code isn’t used for model training. The VM is ephemeral — spun up per task and destroyed after.

What is Project Jitro?

Project Jitro is Google’s next-generation coding agent, previewed at I/O 2026. Instead of describing a task (“fix this bug”), you define a goal (“reduce p95 latency by 30ms”) and the agent determines the changes needed. It’s on a waitlist — no general availability date yet.

Bottom Line

Jules is the best coding agent for people who hate babysitting coding agents. The async model, CI Fixer, and Scheduled Tasks create a workflow where maintenance work runs on autopilot. Monday mornings, I’d wake up to 3-4 PRs from overnight pip-audit and lint runs. For $19.99/month, that trade works.

For thinking-partner work (debugging a race condition, designing an API, exploring unfamiliar code) you still need Claude Code or Copilot. Jules takes orders and delivers results, on its own schedule, at its own pace.

If your bottleneck is “too many small tasks, not enough hands,” try the free tier for a week. Queue up your backlog. See what comes back. The 15-task daily limit is enough to know whether this fits your workflow.