Pattern 01 · Prompt chaining

When the steps are known up front, just chain them.

The most common production "AI feature" is a fixed sequence of two or three model calls — extract → classify → summarize, or research → outline → write. No loop, no agent. Just a function that calls the model the right number of times.

Use it when

  • You can describe the algorithm in three bullet points before you've seen any input.
  • Each step has a measurable, gated checkpoint (validate the JSON, check the classification confidence).
  • Latency budget is tight — chains are predictable; agents are not.
workflows/chain.go
// summarize → translate → format. Plain Go, no framework.
func SummarizeTranslateFormat(ctx context.Context, doc string) (string, error) {
    summary, err := llm.Once(ctx,
        "Summarize the document below in 80 words.", doc)
    if err != nil { return "", err }

    german, err := llm.Once(ctx,
        "Translate to German, preserving tone.", summary)
    if err != nil { return "", err }

    final, err := llm.Once(ctx,
        "Format as Markdown with headings.", german)
    return final, err
}
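The "gated checkpoint" bullet above deserves a concrete shape. A minimal sketch of a gate between chain steps, assuming the prior step was asked to emit JSON; the helper name is illustrative, not part of any real API:

```go
import (
    "encoding/json"
    "fmt"
)

// gateJSON rejects a step's raw output unless it parses into the expected
// shape, so a malformed intermediate never reaches the next model call.
func gateJSON(raw string, v any) error {
    if err := json.Unmarshal([]byte(raw), v); err != nil {
        return fmt.Errorf("step output is not valid JSON: %w", err)
    }
    return nil
}
```

Call it between `llm.Once` steps; on failure, retry that single step or bail with the error, instead of letting garbage propagate down the chain.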
workflows/route.go
type Intent string

const (
    IntentBilling   Intent = "billing"
    IntentTechnical Intent = "technical"
    IntentSales     Intent = "sales"
)

func Route(ctx context.Context, q string) (string, error) {
    var label struct{ Intent Intent `json:"intent"` }
    err := llm.Structured(ctx,
        "Classify the message into billing|technical|sales.",
        q, &label)
    if err != nil { return "", err }

    switch label.Intent {
    case IntentBilling:   return billingAgent(ctx, q)
    case IntentTechnical: return techAgent(ctx, q)
    case IntentSales:     return salesChain(ctx, q)
    }
    return fallback(ctx, q)
}
Pattern 02 · Routing

A classifier turns one ambiguous request into one of N specialist handlers.

A small, fast model (Haiku, GPT-4o-mini, Gemini Flash) classifies the request, then a Go switch dispatches to a handler optimized for that path. Each handler can be a workflow, a single agent, or another router — composition all the way down.

Why routing pays for itself

  • Smaller / cheaper models on the classifier; reserve the expensive model for the actual work.
  • Each handler can have its own tools, system prompt, and budget — separation of concerns the agent itself doesn't have.
  • Observability becomes drillable: P95 latency by intent, error rate by intent, cost by intent.
Pattern 03 · Parallelization

Two flavors: sectioning (split work) and voting (consensus).

When the work is independent, fan out. Sectioning splits a long input into chunks and processes each in parallel. Voting runs the same prompt N times and combines outputs (majority, ranked, ensembled). Go's errgroup makes both trivial.

When to vote

  • Safety-critical classification (is this content allowed?) — N=3 voters cuts false negatives.
  • Code review with diverse personas — same code, three different reviewer prompts, merge findings.
  • Eval consensus — when you don't have a ground-truth answer, multiple model judges approximate one.
workflows/parallel.go
import "golang.org/x/sync/errgroup"

func SummarizeShards(ctx context.Context, shards []string) ([]string, error) {
    out := make([]string, len(shards))
    g, gctx := errgroup.WithContext(ctx)
    g.SetLimit(8) // concurrency cap; respect provider rate limits.

    for i, s := range shards {
        i, s := i, s // capture loop vars (needed before Go 1.22)
        g.Go(func() error {
            summary, err := llm.Once(gctx,
                "Summarize this in 60 words.", s)
            out[i] = summary
            return err
        })
    }
    return out, g.Wait()
}
agents/react.go
// The canonical agent loop. The model decides; the dispatcher acts;
// the loop repeats until the model says "done".
func ReAct(ctx context.Context, a *Agent, goal string) (string, error) {
    msgs := []Message{
        {Role: "system", Content: a.SystemPrompt},
        {Role: "user",   Content: goal},
    }
    for turn := 0; turn < a.MaxTurns; turn++ {
        resp, err := a.Model.Complete(ctx, msgs, a.Tools)
        if err != nil { return "", err }
        if len(resp.ToolCalls) == 0 {
            return resp.Text, nil
        }
        msgs = append(msgs, resp.AsAssistant())
        for _, c := range resp.ToolCalls {
            res, terr := a.Tools.Dispatch(ctx, c)
            msgs = append(msgs, Message{
                Role: "tool", ToolCallID: c.ID,
                Content: render(res, terr),
            })
        }
    }
    return "", fmt.Errorf("agent stopped: exceeded %d turns", a.MaxTurns)
}
Pattern 04 · ReAct

The pattern every agent collapses into.

ReAct — reason & act — is what you get when you give a model tools and a loop. Modern frontier models do the "reasoning" part implicitly; you don't need to prompt-engineer "Thought:" prefixes anymore. What matters is the loop, the tool registry, and the stop condition.

Make ReAct boring

  • Cap turns at 12, not 50.
  • Cap parallel tool calls per turn at 5.
  • Log every turn with stable IDs — debuggability is non-negotiable.
  • Persist messages to disk if the agent runs >30s; you'll want to replay.
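One way to make those caps non-optional is to bundle them in a config struct that every agent constructor must receive. A sketch; the field names and defaults mirror the bullets above but are illustrative, not a real API:

```go
import "time"

// Limits bundles the operational caps so they can't be forgotten per-agent.
type Limits struct {
    MaxTurns     int           // hard stop for the ReAct loop
    MaxToolCalls int           // parallel tool calls per turn
    PersistAfter time.Duration // snapshot the transcript past this wall time
}

func DefaultLimits() Limits {
    return Limits{MaxTurns: 12, MaxToolCalls: 5, PersistAfter: 30 * time.Second}
}

// capSlice truncates any per-turn slice (tool calls, here) to n entries.
func capSlice[T any](s []T, n int) []T {
    if len(s) > n {
        return s[:n]
    }
    return s
}
```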
Pattern 05 · Reflection

A second pass with a critic prompt is the cheapest quality lift in the catalog.

Generate a draft, run the same model with a critic system prompt, hand the critique back, generate a revision. Two extra round-trips — and the output quality jumps in a way that's frequently visible to humans in blind comparison.

When to use it

  • Long-form generation (reports, articles, code reviews) where one-shot output reads like a first draft.
  • Code generation — a critic pass catches obvious bugs the generator missed.
  • Anything where "did I really cover the brief" is a useful question to ask.

When to skip it

  • Latency-sensitive paths. Draft, critique, and revision are sequential calls, so reflection roughly triples wall time.
  • Trivially checkable outputs (classification, extraction). Validate with code, not a critic.
agents/reflect.go
func Reflect(ctx context.Context, brief string) (string, error) {
    draft, err := llm.System(ctx,
        "You are a senior writer. Produce a thorough first draft.",
        brief)
    if err != nil { return "", err }

    critique, err := llm.System(ctx,
        "You are a ruthless editor. List concrete problems with the draft.",
        "BRIEF:\n"+brief+"\n\nDRAFT:\n"+draft)
    if err != nil { return "", err }

    return llm.System(ctx,
        "You are the writer. Revise to address every critique below.",
        fmt.Sprintf("DRAFT:\n%s\n\nCRITIQUE:\n%s", draft, critique))
}
agents/evaluator.go
type Score struct {
    Pass     bool   `json:"pass"`
    Feedback string `json:"feedback"`
}

func EvalLoop(ctx context.Context, brief string) (string, error) {
    var (
        draft string
        prev  Score
        err   error
    )
    for attempt := 0; attempt < 3; attempt++ {
        draft, err = generator(ctx, brief, prev.Feedback)
        if err != nil { return "", err }

        prev, err = evaluator(ctx, brief, draft)
        if err != nil { return "", err }
        if prev.Pass { return draft, nil }
    }
    return draft, fmt.Errorf("max attempts; last feedback: %s", prev.Feedback)
}
Pattern 06 · Evaluator-optimizer

Reflection's bigger sibling — separate evaluator, hard pass/fail gate.

Reflection asks one model to critique itself. Evaluator-optimizer uses two distinct components — usually a stronger generator and a faster evaluator — and the evaluator returns a structured score. The loop runs until the score passes or the budget is exhausted.

The win versus naive reflection: the evaluator can be a different model (cheaper, fine-tuned), can run pass/fail predicates as code (not just LLM judgment), and produces structured signals you can log and grade evals against.
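Running the cheap predicates as code first means the LLM judge only ever sees drafts that clear the mechanical bar. A sketch with made-up thresholds; adapt the checks to whatever the brief actually demands:

```go
import (
    "fmt"
    "strings"
)

// codeGate runs deterministic checks before any LLM evaluator is invoked.
// A failure short-circuits the loop with feedback the generator can act on.
func codeGate(draft string) (pass bool, feedback string) {
    if n := len(strings.Fields(draft)); n < 200 {
        return false, fmt.Sprintf("too short: %d words, need at least 200", n)
    }
    if !strings.Contains(draft, "#") {
        return false, "missing Markdown headings"
    }
    return true, ""
}
```

In the `EvalLoop` above, call this before `evaluator`; only drafts that pass the gate pay for a model-judged score.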

Pattern 07 · Orchestrator + workers

When you need parallel open-ended work, fan out into worker agents.

An orchestrator decomposes a goal into sub-tasks at runtime — if the decomposition is known at design time, that's plain parallelization. Each sub-task gets its own worker agent with its own context window, its own tools, its own budget. The orchestrator then synthesizes worker outputs into a final answer.

Orchestrator — plans · delegates · synthesizes
    Worker · researcher    tools: web_search, fetch_url
    Worker · code_reader   tools: list_files, read_file, grep
    Worker · summarizer    tools: (none — pure LLM)
Result — one synthesized answer

agents/orchestrator.go
// The orchestrator's only "tool" is spawning a worker.
type DispatchArgs struct {
    Worker string `json:"worker"` // "researcher" | "code_reader" | "summarizer"
    Task   string `json:"task"`   // natural-language sub-goal
}

var DispatchTool = agent.Tool[DispatchArgs]{
    Name:        "dispatch",
    Description: "Spawn a specialist worker agent. Workers run in parallel.",
    Run: func(ctx context.Context, a DispatchArgs) (any, error) {
        w, ok := workers[a.Worker]
        if !ok { return nil, fmt.Errorf("unknown worker %q", a.Worker) }
        // Each worker runs ReAct in its own goroutine, with a fresh context window.
        return w.Run(ctx, a.Task)
    },
}

Modern providers (Anthropic, OpenAI, Gemini) all support parallel tool calls — a single assistant turn can request dispatch(researcher), dispatch(code_reader), and dispatch(summarizer) simultaneously. The Go side fans out to three goroutines, gathers results into the next message turn, and the orchestrator can plan the next round.

Anti-patterns

Patterns that look smart on a whiteboard and lose money in production.

"Self-improving" prompts

An agent that rewrites its own system prompt mid-run. Sounds clever; in practice produces drift, prompt explosion, and impossible-to-debug behavior. If the prompt isn't right, fix it offline — don't let the agent fix it at runtime.

Unbounded recursion

Workers that can spawn workers that can spawn workers. Cap depth at 1 — orchestrator + workers, period. Deeper trees blow up token budgets without a measurable quality gain to show for it.

"Memory" without an eviction policy

A vector store the agent freely writes to and reads from. After a week the relevant items are needles in a haystack of stale context. Always pair writes with TTL or a reranking step that looks at recency.

One mega-tool with a verb arg

A database tool whose first argument is "action": "select"|"insert"|"delete". Models pick the wrong verb regularly. Three small tools with three small schemas reliably outperform one big tool.

Now make it concrete.

Each of these patterns has a runnable Go example in the next chapter — clone, run, read, modify.

Browse the example library →