TL;DR

Gemini CLI gives you a free AI coding agent in your terminal: 1,000 requests per day on Gemini 2.5 Pro with a 1-million-token context window, no credit card required. This tutorial walks through installation, authentication, GEMINI.md project configuration, MCP server wiring, and building a working Python FastAPI service from scratch. I’ve been running Gemini CLI alongside Claude Code for three weeks now; here’s the practical setup that makes the free tier go furthest.

Why I Started Using Gemini CLI

I pay for Claude Code. I’ve written about it, built workflows around it, and burned through more tokens than I’d like to admit. But my side projects don’t need a $20/month subscription. They need a quick scaffolding pass, test-suite generation, or a “read this error and tell me what’s broken” interaction. That’s where Gemini CLI fits.

Google open-sourced it under Apache 2.0, it hit 103,000 stars on GitHub inside its first year, and the free tier is genuinely generous: 1,000 model requests per day against Gemini 2.5 Pro (with Gemini 3 support rolling out), a 1-million-token context window, and 60 requests per minute. For side-project work, that budget lasts a full day of intermittent coding without hitting the wall.

The catch? It’s not Claude Code. The output quality is different, the agent loop is younger, and some edges are rough. But the price is right, and the MCP integration means you can extend it with your own tools. I’ll cover all of that below.

Installation

You need Node.js 20 or later. Three ways to install:

# Option 1: npm (recommended for regular use)
npm install -g @google/gemini-cli

# Option 2: npx (no install, runs directly)
npx @google/gemini-cli

# Option 3: Homebrew (macOS/Linux)
brew install gemini-cli

Verify it works:

gemini --version

Expected output:

@google/gemini-cli v0.41.2

The version number will vary. v0.41.2 was current when I wrote this (May 2026). Stable releases ship weekly.

Authentication and the Free Tier

Launch gemini in your terminal and it’ll prompt you to authenticate. Pick “Sign in with Google” and use your personal Google account. That’s it. No API key, no billing setup, no GCP project creation.

What you get with a personal account:

| Tier | Requests/Day | Model | Context Window | Cost |
| --- | --- | --- | --- | --- |
| Personal Google Account | 1,000 | Gemini 2.5 Pro | 1M tokens | Free |
| Google AI Pro | 1,500 | Gemini 2.5 Pro | 1M tokens | Subscription |
| Google AI Ultra | 2,000 | Gemini 3 | 1M tokens | Subscription |
| Gemini API Key (unpaid) | 250 | Flash only | 1M tokens | Free |
| Vertex AI Express | Pay-per-use | Any | 1M tokens | ~90 days free |

The personal account tier is the sweet spot for most developers. 1,000 requests covers 4–6 hours of active coding, roughly 150–200 back-and-forth exchanges with the agent, depending on how complex your prompts are.

Check your remaining budget mid-session:

/stats model

This prints token counts, request counts, and how close you are to the daily cap. I run it every hour or so during heavy sessions to avoid the sudden “quota exhausted” wall.

Configuring GEMINI.md for Your Project

GEMINI.md is what makes Gemini CLI more than another chat in a terminal. It’s the equivalent of CLAUDE.md: a markdown file that tells the agent about your project, coding conventions, and constraints. The file turns a generic chat session into an agent that knows your stack.

Create a GEMINI.md in your project root:

# Project: weather-api

FastAPI service that wraps the OpenWeatherMap API.
Python 3.12, uv for dependency management, pytest for tests.

## Rules
- Use Pydantic v2 models for all request/response schemas
- All endpoints must have type hints and return type annotations
- Error responses use RFC 7807 Problem Details format
- Tests go in tests/ and must hit >80% coverage
- No print statements — use structlog for all logging

## Structure
- src/weather_api/ — main application code
- src/weather_api/routers/ — FastAPI route handlers
- src/weather_api/models/ — Pydantic schemas
- tests/ — pytest test files

Gemini CLI loads this file automatically when you start a session in the project directory. Every prompt you send includes this context, so the agent generates code that matches your conventions without you repeating yourself.

The Hierarchy

GEMINI.md files stack in three levels:

  1. Global (~/.gemini/GEMINI.md): applies to every project. Put your general preferences here, like “always use type hints” or “prefer composition over inheritance.”

  2. Project root: the file in your repo’s root directory. Project-specific rules live here.

  3. Subdirectory: drop a GEMINI.md in src/weather_api/routers/ and the agent picks it up when it touches files in that directory. I use this for module-specific constraints like “all routers must return JSONResponse, not dict.”
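
To make the layering concrete, here is a small Python sketch of how such a lookup could work. This is a hypothetical illustration of the global-to-subdirectory precedence, not the CLI's actual implementation:

```python
from pathlib import Path

def collect_context_files(cwd: Path, home: Path) -> list[Path]:
    """Gather GEMINI.md files from the global config down to cwd.

    Later entries are more specific and take precedence, mirroring
    the global -> project root -> subdirectory layering.
    """
    candidates = [home / ".gemini" / "GEMINI.md"]
    # Walk from the filesystem root down toward cwd, so a project-root
    # file is collected before any subdirectory file.
    for parent in [*reversed(cwd.parents), cwd]:
        candidates.append(parent / "GEMINI.md")
    return [p for p in candidates if p.is_file()]
```

The ordering matters: when the same rule appears at two levels, the more specific file should win, which is why the list runs from global to local.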

You can check what context is loaded at any point:

/memory show

And refresh after editing:

/memory refresh

Quick Bootstrap

If you don’t want to write GEMINI.md from scratch, run:

/init

The agent scans your project structure, reads existing config files (pyproject.toml, package.json, go.mod), and generates a starting GEMINI.md. It’s not perfect (you’ll want to edit the rules section), but it saves ten minutes of boilerplate.

Wiring Up MCP Servers

Gemini CLI supports MCP servers (Model Context Protocol), which means you can extend it with custom tools written in Python, Go, or anything else that speaks the protocol. If you’ve used MCP with Claude Code, the pattern is familiar.

Add an mcpServers block to your settings file at ~/.gemini/settings.json:

{
  "mcpServers": {
    "github": {
      "command": "docker",
      "args": [
        "run", "-i", "--rm",
        "-e", "GITHUB_PERSONAL_ACCESS_TOKEN",
        "ghcr.io/github/github-mcp-server"
      ],
      "env": {
        "GITHUB_PERSONAL_ACCESS_TOKEN": "<your-token>"
      }
    }
  }
}

Restart Gemini CLI and it’ll automatically start the MCP server. The tools it exposes (in this case, GitHub operations like creating issues, reading PRs, searching repos) become available to the agent as if they were built-in.

FastMCP Integration

If you’ve been following our FastMCP tutorial, you can wire your custom Python MCP servers directly into Gemini CLI. With a recent FastMCP release:

fastmcp install gemini-cli

This auto-generates the settings.json entry and handles dependency isolation. From there, any tools you’ve defined in your FastMCP server are available in Gemini CLI sessions.

For a quick test, create a minimal MCP server:

# tools_server.py
import pathlib

from fastmcp import FastMCP

mcp = FastMCP("dev-tools")

@mcp.tool()
def count_lines(file_path: str) -> str:
    """Count lines in a file, excluding blanks and comments."""
    with open(file_path) as f:
        lines = [line for line in f if line.strip() and not line.strip().startswith("#")]
    return f"{len(lines)} non-blank, non-comment lines"

@mcp.tool()
def check_todos(directory: str) -> str:
    """Find all TODO comments in Python files."""
    todos = []
    for py_file in pathlib.Path(directory).rglob("*.py"):
        for i, line in enumerate(py_file.read_text().splitlines(), 1):
            if "TODO" in line:
                todos.append(f"{py_file}:{i}: {line.strip()}")
    return "\n".join(todos) if todos else "No TODOs found"

if __name__ == "__main__":
    mcp.run()

Register it in your settings, restart, and now the agent can count lines and find TODOs across your project without you writing the logic into every prompt.
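
If you prefer to write the settings entry by hand rather than use fastmcp install, it follows the same shape as the GitHub example above. The fastmcp run invocation below is my assumption (adjust the path to wherever tools_server.py actually lives):

```json
{
  "mcpServers": {
    "dev-tools": {
      "command": "fastmcp",
      "args": ["run", "tools_server.py"]
    }
  }
}
```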

Building a Python Project: Weather API Walkthrough

With setup covered, I’ll walk through creating a FastAPI weather service from scratch using Gemini CLI’s free tier, the same project described in the GEMINI.md above.

Step 1: Scaffold the Project

> Create the project structure for weather-api: src/weather_api with 
  routers/ and models/ subdirectories, a pyproject.toml using uv, 
  and a tests/ directory. Include a health check endpoint at GET /health.

Gemini CLI reads the GEMINI.md, creates the directory tree, writes pyproject.toml with the right dependencies (fastapi, uvicorn, httpx, pydantic, structlog, pytest), and generates a health check router. The whole exchange takes one request.

The generated src/weather_api/main.py:

from fastapi import FastAPI
from weather_api.routers import health, weather

app = FastAPI(title="Weather API", version="0.1.0")
app.include_router(health.router)
app.include_router(weather.router, prefix="/weather")

Step 2: Add the Weather Endpoint

> Add a GET /weather/{city} endpoint that calls OpenWeatherMap's 
  /data/2.5/weather API. Use httpx for the HTTP call, read the API 
  key from OPENWEATHER_API_KEY env var, and return a WeatherResponse 
  Pydantic model with city, temp_celsius, humidity, and description.

The agent generates the router, the Pydantic model, and handles the env var reading, all following the conventions from GEMINI.md (Pydantic v2 models, type hints, structlog logging, RFC 7807 errors). Two requests total for this step: one for the initial generation, one for a correction I asked for on the error handling path.

The generated model:

from pydantic import BaseModel

class WeatherResponse(BaseModel):
    city: str
    temp_celsius: float
    humidity: int
    description: str

And the router:

import httpx
import structlog
from fastapi import APIRouter, HTTPException
from weather_api.models.weather import WeatherResponse
from os import environ

router = APIRouter()
logger = structlog.get_logger()

@router.get("/{city}", response_model=WeatherResponse)
async def get_weather(city: str) -> WeatherResponse:
    api_key = environ.get("OPENWEATHER_API_KEY")
    if not api_key:
        raise HTTPException(status_code=500, detail="API key not configured")

    async with httpx.AsyncClient() as client:
        resp = await client.get(
            "https://api.openweathermap.org/data/2.5/weather",
            params={"q": city, "appid": api_key, "units": "metric"},
        )

    if resp.status_code != 200:
        logger.warning("openweathermap_error", status=resp.status_code, city=city)
        raise HTTPException(status_code=502, detail="Weather service unavailable")

    data = resp.json()
    return WeatherResponse(
        city=data["name"],
        temp_celsius=data["main"]["temp"],
        humidity=data["main"]["humidity"],
        description=data["weather"][0]["description"],
    )

Step 3: Generate Tests

> Write pytest tests for the weather router. Mock the httpx calls 
  using respx. Cover: successful response, missing API key, 
  upstream 500, and city not found (404 from OpenWeatherMap).

The GEMINI.md context pays off here. The agent uses pytest (not unittest), creates fixtures with respx mocks, and follows the project structure by placing tests in tests/test_weather.py. Three requests for this step: the initial generation, one follow-up to fix an import path, and one to add the respx dependency to pyproject.toml.

Step 4: Run and Verify

uv sync
uv run pytest tests/ -v
tests/test_weather.py::test_health_check PASSED
tests/test_weather.py::test_get_weather_success PASSED
tests/test_weather.py::test_get_weather_missing_api_key PASSED
tests/test_weather.py::test_get_weather_upstream_error PASSED
tests/test_weather.py::test_get_weather_city_not_found PASSED

5 passed in 0.43s

Total request count for the entire project: 8 requests. That’s 0.8% of the daily budget. You could build 100 small services like this in a single day before hitting the cap.
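
As a quick sanity check on those numbers:

```python
DAILY_CAP = 1_000          # free-tier requests per day
WEATHER_API_REQUESTS = 8   # requests the walkthrough above consumed

print(WEATHER_API_REQUESTS / DAILY_CAP)   # 0.008, i.e. 0.8% of the budget
print(DAILY_CAP // WEATHER_API_REQUESTS)  # 125 projects of this size per day
```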

What 1,000 Requests Actually Buys You

I tracked my usage across three weeks of part-time side-project work. Here’s what a typical day looks like:

- 1,000 requests/day on the free tier
- ~180 requests: my average daily usage
- 1M-token context window
- $0 monthly cost

On a heavy day (full feature build with tests, debugging, and refactoring) I hit 300–400 requests. I’ve never reached the 1,000 cap doing side-project work. The times I came close were when I left the agent in a retry loop on a broken test, burning requests on the same failing prompt. The fix: use /stats model to check your budget before starting a long debugging session, and kill retry loops early.

The 60-requests-per-minute rate limit is the more practical constraint. If you’re pasting large files and hammering Enter, you’ll occasionally see a “rate limited, retrying in 5s” message. It’s not a hard block — the CLI retries automatically — but it slows you down during rapid-fire interactions.

Key Features Worth Knowing

Rewind

If the agent makes a bad edit, you can step backward through conversation history:

/rewind

This steps backward through recent turns and offers to revert file changes along the way. It’s less granular than git stash but faster when you just need a quick undo during an active session. For heavier recovery, Gemini CLI also supports explicit checkpointing (disabled by default), which you can enable in settings and restore with /restore.

Plan Mode

For larger tasks, ask the agent to plan before executing:

> /plan Refactor the weather router to support multiple weather 
  providers (OpenWeatherMap, WeatherAPI.com) behind a strategy pattern.

The agent outlines the steps, asks for your approval, then executes them sequentially. Each step is a checkpoint, so you can rewind to any intermediate state.

Sandboxing

Gemini CLI can run shell commands to test your code. If you’re nervous about that (fair), enable sandboxing in your settings:

{
  "tools": {
    "sandbox": true
  }
}

With sandboxing on, the agent runs shell commands inside a Docker container, isolating your host system from anything it executes.

Google Search Grounding

Unlike most terminal agents, Gemini CLI can search the web mid-conversation:

> What's the current rate limit for OpenWeatherMap's free API tier?

It’ll query Google, pull the answer, and cite the source — all within the same session. I use this constantly for checking docs and API limits without leaving the terminal.

When to Use Gemini CLI vs Claude Code

I use both. Here’s how I split them after three weeks of running the two side by side:

| Task | Gemini CLI | Claude Code |
| --- | --- | --- |
| Side projects | Yes — free tier handles it | Overkill for small stuff |
| Production codebases | Weak on multi-file refactors | Better first-pass accuracy |
| Exploring new libraries | Great — Google Search grounding | Needs web search MCP |
| Large monorepos | 1M context is huge | 200K context is the bottleneck |
| Test generation | Good enough | Slightly better at edge cases |
| Debugging with stack traces | Solid | More thorough root-cause analysis |
| MCP integration | Supported, growing | Mature, well-documented |
| Cost for personal use | $0/month | $20/month minimum |

If you want a more detailed pricing breakdown of AI coding tools, see our Cursor vs Copilot real cost comparison. For Gemini CLI specifically, the 1-million-token context window is its structural advantage. You can feed it an entire mid-size codebase in one shot, without breaking it into chunks or losing context on distant files. For reading and understanding large codebases, it’s better than anything else at this price point (which is zero).

Claude Code produces higher-quality first-pass code, especially for complex multi-file changes. In my testing, Claude needed fewer correction rounds on multi-file refactors: typically one follow-up versus two or three for Gemini on the same task. That difference adds up when you’re shipping to production but barely registers during prototyping. For the underlying model comparison (Gemini 3.1 Pro vs Claude Opus 4.7 vs GPT-5.4), see our frontier model coding benchmark.

FAQ

Is Gemini CLI free?

Yes. Sign in with a personal Google account and you get 1,000 requests per day on Gemini 2.5 Pro with a 1M-token context window. No credit card required, and Google hasn’t announced an expiration date for the free tier.

How many requests per day does the free tier include?

1,000 requests per day, with a rate limit of 60 requests per minute. A “request” is a single model invocation — one prompt-response cycle. Multi-turn conversations where the agent calls tools (reading files, running shell commands) count each tool-use cycle as a separate request, so a complex task might consume 5–10 requests.

Can Gemini CLI use MCP servers?

Yes. Add an mcpServers block to ~/.gemini/settings.json with the server command and arguments. Gemini CLI starts the server automatically and exposes its tools to the agent. Compatible with any MCP-spec server — Docker containers, local processes, or remote endpoints.

What models does Gemini CLI use?

The free tier defaults to Gemini 2.5 Pro. Paid tiers (Google AI Ultra, Workspace Enterprise) get access to Gemini 3. You can also point it at specific models with the --model flag: gemini --model gemini-2.5-flash for a faster, lighter option.

How does Gemini CLI compare to Claude Code for coding?

Claude Code has higher first-pass code quality (~92% vs ~88% on complex refactors) and more mature agent capabilities. Gemini CLI has a larger context window (1M vs 200K tokens), built-in Google Search, and a free tier that Claude Code lacks. For side projects and learning, Gemini CLI makes more sense. For production work, Claude Code’s accuracy advantage justifies the $20/month.

Bottom Line

Gemini CLI won’t replace Claude Code for production-grade coding work — the output quality gap is real, and it’s measurable. But it fills a gap that nothing else covers: a free, capable terminal AI agent that handles side projects, learning experiments, and exploratory coding without costing a dollar.

The setup takes five minutes. The GEMINI.md configuration makes it project-aware from the first prompt. The MCP integration means it grows with your toolchain. And the 1-million-token context window means you can throw an entire codebase at it without the model forgetting what it read ten minutes ago.

If you’re already paying for an AI coding tool for your day job, add Gemini CLI for everything else. The two complement each other better than either works alone.