agent.go
Case 01 · Coding assistant
Single-loop · file system + shell

A coding assistant that lives in your terminal.

A CLI that reads your repo, edits files, runs tests, and reports back. The minimum-viable shape of this is roughly 280 lines — and that minimum is a serious engineering tool. Claude Code, Aider, and a half-dozen internal tools at Go shops all share this skeleton.

The shape of the system

  • Single agent loop, max 16 turns. No multi-agent orchestration. The model is smart enough; the tools are right.
  • Four tools: read_file, write_file, bash, run_tests. Adding more tools usually hurts; the model gets confused about which to pick.
  • Sandboxed working directory. The bash tool's CWD is rooted to the project; environment is scrubbed of secrets unrelated to the project.
  • Diff persistence. Every write_file tool call is captured as a unified diff and shown to the user before applying.
  • Bring-your-own-model. The same loop runs against Claude Sonnet 4.6 (default), GPT-5, or a local Qwen2.5-Coder 32B over Ollama.
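
What bring-your-own-model means mechanically: the loop depends only on a small interface, and each provider (Anthropic, OpenAI, an Ollama-served local model) gets a thin adapter. A minimal sketch; the method name and the placeholder types below are assumptions, not the project's actual API.

import (
    "context"
    "encoding/json"
)

// Placeholder types; the real ones would carry provider-specific detail.
type Message struct {
    Role    string // "user", "assistant", "tool"
    Content string
}

type ToolSchema struct {
    Name        string
    Description string
    InputSchema json.RawMessage
}

type ToolCall struct {
    Name string
    Args json.RawMessage
}

type Turn struct {
    Text      string     // assistant prose, if any
    ToolCalls []ToolCall // requested tool invocations, if any
}

// Model is the only thing the agent loop needs from a provider. Claude, GPT,
// and a local Ollama-served model each get an adapter implementing it.
type Model interface {
    Complete(ctx context.Context, msgs []Message, tools []ToolSchema) (Turn, error)
}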

Where Go pulls its weight

  • os/exec + context.WithTimeout give you a sandboxed shell tool with deadline propagation that just works (sketched after this list).
  • go test ./... output parses cleanly because Go's tooling emits structured output (go test -json) — no scraping pytest's ANSI-colored output.
  • Single static binary — drop the agent binary into a Docker container, mount the repo, run.
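
What the first bullet looks like in practice: a minimal sketch of a sandboxed shell helper, written as a plain function rather than the project's tool wrapper. The 30-second deadline and the environment allow-list are illustrative choices, not the project's policy.

import (
    "context"
    "fmt"
    "os"
    "os/exec"
    "time"
)

// runBash executes one shell command with the working directory rooted to the
// project and a scrubbed environment. Cancelling ctx (or hitting the timeout)
// kills the process via exec.CommandContext.
func runBash(ctx context.Context, root, command string) (string, error) {
    tctx, cancel := context.WithTimeout(ctx, 30*time.Second)
    defer cancel()

    cmd := exec.CommandContext(tctx, "bash", "-c", command)
    cmd.Dir = root // CWD rooted to the project

    // Start from an empty environment and pass through only what builds need,
    // so unrelated secrets in the parent process never reach the tool.
    cmd.Env = []string{
        "PATH=" + os.Getenv("PATH"),
        "HOME=" + os.Getenv("HOME"),
        "GOCACHE=" + os.Getenv("GOCACHE"),
    }

    out, err := cmd.CombinedOutput()
    if tctx.Err() == context.DeadlineExceeded {
        return string(out), fmt.Errorf("command timed out after 30s: %w", tctx.Err())
    }
    return string(out), err
}
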
cmd/code/main.go
func main() {
    root, _ := os.Getwd()
    reg   := agent.New()

    agent.Register(reg, "read_file",
        "Read a file from the project.",
        tools.ReadFile(root))

    agent.Register(reg, "write_file",
        "Atomically replace or create a file.",
        tools.WriteFile(root))

    agent.Register(reg, "bash",
        "Run a bash command in the project sandbox.",
        tools.Bash(root))

    agent.Register(reg, "run_tests",
        "Run go test on a package and return the parsed result.",
        tools.RunTests(root))

    a := &agent.Agent{
        Model:        claude("sonnet-4-6"),
        SystemPrompt: codingPrompt,
        Tools:        reg,
        MaxTurns:     16,
    }

    goal := strings.Join(os.Args[1:], " ")
    out, err := a.Run(context.Background(), goal)
    if err != nil { log.Fatal(err) }
    fmt.Println(out)
}
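
The run_tests tool is where go test -json pays off. A minimal sketch of the parsing core as a plain function; the two-minute deadline and the report shape are placeholders, not the project's actual types.

import (
    "bufio"
    "bytes"
    "context"
    "encoding/json"
    "os/exec"
    "time"
)

// testEvent mirrors one line of `go test -json` (test2json) output.
type testEvent struct {
    Action  string  `json:"Action"`
    Test    string  `json:"Test"`
    Elapsed float64 `json:"Elapsed"`
}

type TestReport struct {
    Passed  int      `json:"passed"`
    Skipped int      `json:"skipped"`
    Failed  []string `json:"failed"` // names of failing tests
}

func runTests(ctx context.Context, root, pkg string) (TestReport, error) {
    tctx, cancel := context.WithTimeout(ctx, 2*time.Minute)
    defer cancel()

    cmd := exec.CommandContext(tctx, "go", "test", "-json", pkg)
    cmd.Dir = root
    out, _ := cmd.Output() // a non-zero exit just means failing tests; parse anyway

    var rep TestReport
    sc := bufio.NewScanner(bytes.NewReader(out))
    for sc.Scan() {
        var ev testEvent
        if json.Unmarshal(sc.Bytes(), &ev) != nil || ev.Test == "" {
            continue // skip non-JSON lines and package-level events
        }
        switch ev.Action {
        case "pass":
            rep.Passed++
        case "fail":
            rep.Failed = append(rep.Failed, ev.Test)
        case "skip":
            rep.Skipped++
        }
    }
    return rep, sc.Err()
}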

The signal-to-noise ratio of "tools that change the file system" + "tools that observe the file system" + "tools that run the tests" is unbeatable. Everything else is a worse interface for the model.

Case 02 · Research agent
Multi-agent · orchestrator + workers · ~340 LOC

A research team that collapses an afternoon of browser tabs into a paragraph.

A user asks a question — "what are the top three Go libraries for Postgres connection pooling, and how have they evolved in 2025?" The orchestrator decomposes it into search queries, dispatches three workers in parallel, gathers their findings, and hands them to a writer agent to synthesize.

Workers with narrow tool surfaces

  • researcher: web_search, fetch_url. Reads a topic, returns a summary.
  • code_analyst: github_search, read_repo_file. Reads source code, summarizes architecture.
  • writer: no tools. Pure synthesis. Takes worker outputs, returns a structured report.

Why the orchestrator's context stays small

The orchestrator never sees a worker's intermediate steps — only the worker's final output. A worker might run 8 turns and burn 30k tokens of search results; the orchestrator sees a 200-word summary back. This is what "context isolation" buys you.
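
A minimal sketch of that fan-out, assuming golang.org/x/sync/errgroup and a hypothetical research function that runs one worker's whole loop and returns only its final summary.

import (
    "context"

    "golang.org/x/sync/errgroup"
)

type Finding struct {
    Topic   string
    Summary string // the ~200-word final output; intermediate turns stay inside the worker
}

// runWorkers fans one research worker out per topic. The orchestrator that
// calls this only ever sees the Finding slice, never the workers' tool calls
// or raw search results.
func runWorkers(ctx context.Context, topics []string,
    research func(context.Context, string) (string, error)) ([]Finding, error) {

    g, ctx := errgroup.WithContext(ctx)
    findings := make([]Finding, len(topics))

    for i, topic := range topics {
        i, topic := i, topic // capture loop variables (needed before Go 1.22)
        g.Go(func() error {
            summary, err := research(ctx, topic) // a full bounded agent loop
            if err != nil {
                return err
            }
            findings[i] = Finding{Topic: topic, Summary: summary}
            return nil
        })
    }
    return findings, g.Wait()
}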

Tradeoffs vs. a single-loop research agent
  Dimension        Single-loop                    Orchestrator + workers
  Wall time        Sequential — slowest           Parallel — 3-4× faster on multi-source queries
  Token cost       Cheaper for short tasks        ~1.4× for short tasks; ~3× for long ones (worker + orch overhead)
  Quality          Loses focus past ~10 turns     Each worker is fresh and focused
  Debuggability    One trace to read              One trace per worker — be deliberate about IDs
  Best for         Single-source questions        "Compare X across N sources"
Case 03 · Customer-support resolver
Routing → RAG → action tools

The shape that actually deflects tickets — without making customers angry.

Most "customer support agents" disappoint because they treat every message like the same kind of problem. The shape that works in production is a routed system: classify, retrieve, then either answer informationally or invoke a real action tool (refund, password reset, ticket creation) — with hard pre-checks before any side effects.

The full pipeline, in four steps

  1. Classify intent · Haiku 4.5 — billing | technical | sales | other
  2. Retrieve context · pgvector top-K from KB + last 10 user tickets
  3. Resolve · Sonnet 4.6 with bounded tools — 6-turn cap
  4. Guard side effects · confirm + audit-log every refund or reset
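
A sketch of the whole route in code. Every helper here (classify, retrieve, resolver.Run, confirmWithUser, auditLog, execute) and the Ticket and Reply types are hypothetical stand-ins; the point is the shape: cheap classification first, retrieval outside the loop, a bounded agent only at the resolve step, and a hard human gate before any side effect.

// All helpers and types below are illustrative placeholders, not a real API.
func handleTicket(ctx context.Context, msg Ticket) (Reply, error) {
    // 1. Classify with a cheap model; anything unclassifiable goes to a human.
    intent, err := classify(ctx, msg.Body) // billing | technical | sales | other
    if err != nil || intent == "other" {
        return escalateToHuman(msg), err
    }

    // 2. Retrieve context: top-K KB chunks plus the user's recent tickets.
    kb, err := retrieve(ctx, intent, msg.Body, msg.UserID)
    if err != nil {
        return Reply{}, err
    }

    // 3. Resolve with a bounded agent: narrow tools, 6-turn cap.
    draft, err := resolver.Run(ctx, buildPrompt(intent, kb, msg))
    if err != nil {
        return Reply{}, err
    }

    // 4. Guard side effects: every refund or reset is confirmed and
    //    audit-logged before anything irreversible happens.
    for _, action := range draft.Actions {
        if !confirmWithUser(ctx, msg.UserID, action) {
            continue
        }
        auditLog(ctx, msg.UserID, action)
        if err := execute(ctx, action); err != nil {
            return Reply{}, err
        }
    }
    return draft.Reply, nil
}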

Case 04 · Document extraction
Workflow · structured output + reflection

When the answer is a struct, not a paragraph.

Invoices, contracts, lab reports, KYC documents — any system whose job is "PDF in, typed Go struct out" is a workflow, not an agent. No loop. No tools. Two model calls (extract + validate-and-fix) plus a JSON Schema. The boring shape that ships.

extract/invoice.go
type Invoice struct {
    Number    string          `json:"number" jsonschema:"required"`
    IssuedAt  civil.Date      `json:"issued_at" jsonschema:"required,format=date"`
    DueAt     civil.Date      `json:"due_at,omitempty"`
    Currency  string          `json:"currency" jsonschema:"required,enum=USD|EUR|GBP|CAD"`
    SubTotal  decimal.Decimal `json:"sub_total"`
    Tax       decimal.Decimal `json:"tax"`
    Total     decimal.Decimal `json:"total"`
    LineItems []LineItem      `json:"line_items"`
}

func ExtractInvoice(ctx context.Context, pdfBytes []byte) (Invoice, error) {
    var inv Invoice
    err := claude.StructuredFromPDF(ctx, pdfBytes,
        "Extract this invoice. Use null for missing fields. Don't guess.",
        &inv)
    if err != nil { return inv, err }

    // Cheap validation: arithmetic must hold.
    if !inv.SubTotal.Add(inv.Tax).Equal(inv.Total) {
        // Reflection: ask the model to re-read with the discrepancy highlighted.
        return refine(ctx, pdfBytes, inv,
            fmt.Sprintf("Total %s != SubTotal %s + Tax %s",
                inv.Total, inv.SubTotal, inv.Tax))
    }
    return inv, nil
}

The pattern: extract → validate with code → reflect only if validation fails. This keeps cost predictable in the happy path (one model call) while preserving accuracy on edge cases (model + critic + revise).
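
The refine step can stay just as small. A sketch that reuses the same claude.StructuredFromPDF helper; the only change is that the validation failure goes back into the prompt, and a second failure surfaces as an error instead of looping.

func refine(ctx context.Context, pdfBytes []byte, prev Invoice, problem string) (Invoice, error) {
    var inv Invoice
    prompt := fmt.Sprintf(
        "Extract this invoice. A previous pass failed validation: %s. "+
            "Re-read the document, paying close attention to sub_total, tax, and total. "+
            "Use null for missing fields. Don't guess.", problem)

    if err := claude.StructuredFromPDF(ctx, pdfBytes, prompt, &inv); err != nil {
        return prev, err // keep the first extraction around for debugging
    }
    // One reflection pass only: if the arithmetic still doesn't hold, fail loudly.
    if !inv.SubTotal.Add(inv.Tax).Equal(inv.Total) {
        return inv, fmt.Errorf("invoice still inconsistent after refinement: %s", problem)
    }
    return inv, nil
}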

Case 05 · On-call SRE assistant
Read-only ops tools · audit-logged · Slack-bound

An agent that drafts the incident summary while you're still triaging.

A bot in your incident Slack channel listens for /triage and runs a read-only investigation — pulls recent deploys, recent alerts, recent log volume from impacted services, recent error rate. By the time the human responder is logged into the bastion, the agent has already posted a structured summary into the thread.

Read-only by construction

Every tool the agent has access to is read-only — grafana_query, recent_deploys, jaeger_search, log_query, git_log. The agent cannot roll back, kill pods, or page anyone. Side effects always go through a human.

Reflection on the summary

After the agent collects evidence, a critic pass asks: "Does the summary actually account for the symptom? Are there alternative hypotheses?" That pass catches the most embarrassing failure mode: confidently asserting a wrong root cause.
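
A sketch of that critic pass, assuming a hypothetical llm.Complete helper: the draft summary and the raw evidence go to a second model call whose only job is to poke holes.

const criticPrompt = `You are reviewing an incident summary drafted by another agent.
Check: does the summary actually account for the reported symptom? List any
alternative hypotheses the evidence supports. If the root-cause claim is not
justified by the evidence, say so explicitly.`

// critique is a single model call; llm.Complete is a placeholder for whatever
// completion helper the codebase already has.
func critique(ctx context.Context, symptom, draft, evidence string) (string, error) {
    return llm.Complete(ctx, criticPrompt, fmt.Sprintf(
        "Symptom:\n%s\n\nDraft summary:\n%s\n\nEvidence:\n%s",
        symptom, draft, evidence))
}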

Bounded turn budget

8 turns max. If the agent hasn't formed a hypothesis in 8 turns, the report is "I couldn't isolate this" — that's a more useful output than the wrong answer in 30 turns.

Append-only audit log

Every tool call writes to an audit table with the on-call's user ID, the trigger message, the tool, the args, the result. Post-incident reviews replay the trace.
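
A sketch of the audit write, assuming a pgxpool.Pool (the same driver Case 06 uses) and a hypothetical audit_log schema.

import (
    "context"

    "github.com/jackc/pgx/v5/pgxpool"
)

// logToolCall records one tool invocation before its result is returned to
// the model. The table name and columns are assumptions.
func logToolCall(ctx context.Context, db *pgxpool.Pool,
    userID, trigger, tool string, args, result []byte) error {

    _, err := db.Exec(ctx,
        `INSERT INTO audit_log (user_id, trigger_msg, tool, args, result, at)
         VALUES ($1, $2, $3, $4, $5, now())`,
        userID, trigger, tool, args, result)
    return err
}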

Case 06 · Internal data analyst
SQL agent · safety rails · structured-output result

"Ask the warehouse" — without giving the model your warehouse keys.

A Slack-bound agent that turns natural-language questions ("how many users signed up last week from referral channel X") into safe, parameterized SQL against a read-replica. Three tools, four guardrails, and a result type that always includes the SQL it ran — so analysts can verify the answer.

tools/sql.go
type SQLArgs struct {
    Query string `json:"query" jsonschema:"description=A read-only SELECT query"`
}

type SQLResult struct {
    Query        string   `json:"query"`
    Rows         [][]any  `json:"rows"`
    Columns      []string `json:"columns"`
    RowsAffected int      `json:"rows_affected"`
}

func SQL(db *pgxpool.Pool) agent.Tool[SQLArgs] {
    return agent.Tool[SQLArgs]{
        Name:        "sql",
        Description: "Run a SELECT query. Read-only. Limited to 1000 rows. 5s timeout.",
        Run: func(ctx context.Context, a SQLArgs) (any, error) {
            if err := EnsureSelectOnly(a.Query); err != nil {
                return nil, err // rejected before it ever hits the DB
            }
            q := fmt.Sprintf("SELECT * FROM (%s) _sub LIMIT 1000", a.Query)
            tctx, cancel := context.WithTimeout(ctx, 5*time.Second)
            defer cancel()
            return runQuery(tctx, db, q)
        },
    }
}
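
EnsureSelectOnly is referenced above but not shown. A naive sketch: a production version would run the statement through a real SQL parser, but even string-level checks stop the obvious cases.

import (
    "errors"
    "fmt"
    "strings"
)

// EnsureSelectOnly rejects anything that is not a single SELECT (or
// WITH ... SELECT) statement. Deliberately conservative: substring checks can
// false-positive on odd identifiers, which is the acceptable failure mode here.
func EnsureSelectOnly(query string) error {
    q := strings.ToUpper(strings.TrimSpace(query))
    if strings.Contains(q, ";") {
        return errors.New("multiple statements are not allowed")
    }
    if !strings.HasPrefix(q, "SELECT") && !strings.HasPrefix(q, "WITH") {
        return errors.New("only SELECT queries are allowed")
    }
    for _, kw := range []string{"INSERT ", "UPDATE ", "DELETE ", "DROP ",
        "ALTER ", "TRUNCATE ", "CREATE ", "GRANT "} {
        if strings.Contains(q, kw) {
            return fmt.Errorf("forbidden keyword: %s", strings.TrimSpace(kw))
        }
    }
    return nil
}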

The four guardrails

  • Select-only validation: EnsureSelectOnly rejects anything that isn't a single SELECT before it ever reaches the database.
  • Hard row cap: every query is wrapped in a subselect with LIMIT 1000.
  • Statement timeout: context.WithTimeout cuts every query off at 5 seconds.
  • Read-replica only: the pool points at a read replica, so even a pathological query can't touch production writes.

The output of every /ask in Slack always includes the SQL the agent ran. Analysts can read it, sanity-check it, and copy it into their own dashboards. That transparency is what makes this kind of agent acceptable internally — without it, no data team would let it near the warehouse.

Same patterns. Different surface.

Six cases — one coding agent, one research team, three workflow-shaped systems, one read-only agent. All share the same loop and protocol. Once you've built the first one, the rest are recombinations.
