Every agent in production today is a recombination of seven small patterns. Naming them precisely makes design decisions concrete and turns "we'll need an agent for this" into an hour-long conversation, not a quarter-long project.
Prompt chaining (workflow): step → step → step. The simplest, dumbest, most useful pattern.
Routing (workflow): a classifier picks one of N specialized handlers.
Parallelization (workflow): fan out to N independent calls; aggregate the results.
ReAct (agent): reason → act → observe → repeat. The canonical loop.
Reflection (agent): generate → critique → revise. Cheap quality lift.
Evaluator-optimizer (agent): a separate evaluator scores; the generator iterates.
Orchestrator-workers (multi-agent): a supervisor delegates to specialist sub-agents in parallel.
The most common production "AI feature" is a fixed sequence of two or three model calls — extract → classify → summarize, or research → outline → write. No loop, no agent. Just a function that calls the model the right number of times.
// summarize → translate → format. Plain Go, no framework.
func SummarizeTranslateFormat(ctx context.Context, doc string) (string, error) {
summary, err := llm.Once(ctx,
"Summarize the document below in 80 words.", doc)
if err != nil { return "", err }
german, err := llm.Once(ctx,
"Translate to German, preserving tone.", summary)
if err != nil { return "", err }
final, err := llm.Once(ctx,
"Format as Markdown with headings.", german)
return final, err
}
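A note on the llm package that every snippet here leans on: it isn't a real library, just an assumed thin wrapper over whichever provider SDK you use. The chapter only needs three calls, roughly this shape:
// Assumed surface of the llm helper package used in every snippet here. Bind
// these to your provider SDK (Anthropic, OpenAI, Gemini) at startup; nothing
// in the patterns depends on which one.
package llm

import "context"

var (
	// Once: one instruction, one input, one text reply.
	Once func(ctx context.Context, instruction, input string) (string, error)
	// System: like Once, but the first argument is the system prompt.
	System func(ctx context.Context, system, input string) (string, error)
	// Structured: ask for JSON matching out's schema and unmarshal into it.
	Structured func(ctx context.Context, instruction, input string, out any) error
)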
type Intent string
const (
IntentBilling Intent = "billing"
IntentTechnical Intent = "technical"
IntentSales Intent = "sales"
)
func Route(ctx context.Context, q string) (string, error) {
var label struct{ Intent Intent }
err := llm.Structured(ctx,
"Classify the message into billing|technical|sales.",
q, &label)
if err != nil { return "", err }
switch label.Intent {
case IntentBilling: return billingAgent(ctx, q)
case IntentTechnical: return techAgent(ctx, q)
case IntentSales: return salesChain(ctx, q)
}
return fallback(ctx, q)
}
A small, fast model (Haiku, GPT-4o-mini, Gemini Flash) classifies the request, then a Go switch dispatches to a handler optimized for that path. Each handler can be a workflow, a single agent, or another router — composition all the way down.
When the work is independent, fan out. Sectioning splits a long input into chunks and processes each in parallel. Voting runs the same prompt N times and combines outputs (majority, ranked, ensembled). Go's errgroup makes both trivial.
import "golang.org/x/sync/errgroup"
func SummarizeShards(ctx context.Context, shards []string) ([]string, error) {
out := make([]string, len(shards))
g, gctx := errgroup.WithContext(ctx)
g.SetLimit(8) // concurrency cap; respect provider rate limits.
for i, s := range shards {
i, s := i, s // pin loop variables; unnecessary on Go 1.22+
g.Go(func() error {
summary, err := llm.Once(gctx,
"Summarize this in 60 words.", s)
out[i] = summary
return err
})
}
return out, g.Wait()
}
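Voting is the same fan-out with a different merge step. A minimal sketch, assuming the llm.Once helper above and a plain majority count over trimmed answers; real voting usually normalizes outputs harder than this:
// Voting: run the same prompt n times, keep the most common answer.
// Needs "strings" alongside errgroup.
func Vote(ctx context.Context, prompt, input string, n int) (string, error) {
	if n < 1 {
		n = 3 // arbitrary default for the sketch
	}
	answers := make([]string, n)
	g, gctx := errgroup.WithContext(ctx)
	g.SetLimit(4) // same idea as above: respect provider rate limits
	for i := 0; i < n; i++ {
		i := i
		g.Go(func() error {
			a, err := llm.Once(gctx, prompt, input)
			answers[i] = strings.TrimSpace(a)
			return err
		})
	}
	if err := g.Wait(); err != nil {
		return "", err
	}
	counts := make(map[string]int)
	best := answers[0]
	for _, a := range answers {
		counts[a]++
		if counts[a] > counts[best] {
			best = a
		}
	}
	return best, nil
}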
// The canonical agent loop. The model decides; the dispatcher acts;
// the loop repeats until the model says "done".
func ReAct(ctx context.Context, a *Agent, goal string) (string, error) {
msgs := []Message{
{Role: "system", Content: a.SystemPrompt},
{Role: "user", Content: goal},
}
for turn := 0; turn < a.MaxTurns; turn++ {
resp, err := a.Model.Complete(ctx, msgs, a.Tools)
if err != nil { return "", err }
if len(resp.ToolCalls) == 0 {
return resp.Text, nil
}
msgs = append(msgs, resp.AsAssistant())
for _, c := range resp.ToolCalls {
res, terr := a.Tools.Dispatch(ctx, c)
msgs = append(msgs, Message{
Role: "tool", ToolCallID: c.ID,
Content: render(res, terr),
})
}
}
return "", errors.New("max turns")
}
ReAct — reason & act — is what you get when you give a model tools and a loop. Modern frontier models do the "reasoning" part implicitly; you don't need to prompt-engineer "Thought:" prefixes anymore. What matters is the loop, the tool registry, and the stop condition.
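The loop above leaves the tool registry abstract. One assumed shape that is enough for these patterns, with schema advertisement to the model left out:
// One assumed shape for the registry behind a.Tools. ToolCall mirrors what
// providers return: an ID, a tool name, raw JSON arguments. These types are
// assumptions, not any specific SDK's.
type ToolCall struct {
	ID   string
	Name string
	Args json.RawMessage
}

type Tools map[string]func(ctx context.Context, args json.RawMessage) (any, error)

func (t Tools) Dispatch(ctx context.Context, c ToolCall) (any, error) {
	handler, ok := t[c.Name]
	if !ok {
		return nil, fmt.Errorf("unknown tool %q", c.Name)
	}
	return handler(ctx, c.Args)
}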
Generate a draft, run the same model with a critic system prompt, hand the critique back, generate a revision. Two extra round-trips, and the output quality jumps in a way that's frequently visible to humans in blind comparisons.
func Reflect(ctx context.Context, brief string) (string, error) {
draft, err := llm.System(ctx,
"You are a senior writer. Produce a thorough first draft.",
brief)
if err != nil { return "", err }
critique, err := llm.System(ctx,
"You are a ruthless editor. List concrete problems with the draft.",
"BRIEF:\n"+brief+"\n\nDRAFT:\n"+draft)
if err != nil { return "", err }
return llm.System(ctx,
"You are the writer. Revise to address every critique below.",
fmt.Sprintf("DRAFT:\n%s\n\nCRITIQUE:\n%s", draft, critique))
}
type Score struct {
Pass bool `json:"pass"`
Feedback string `json:"feedback"`
}
func EvalLoop(ctx context.Context, brief string) (string, error) {
var (
	draft string
	err   error
	prev  Score
)
for attempt := 0; attempt < 3; attempt++ {
draft, err = generator(ctx, brief, prev.Feedback)
if err != nil { return "", err }
prev, err = evaluator(ctx, brief, draft)
if err != nil { return "", err }
if prev.Pass { return draft, nil }
}
return draft, errors.New("max attempts; last feedback: "+prev.Feedback)
}
Reflection asks one model to critique itself. Evaluator-optimizer uses two distinct components — usually a stronger generator and a faster evaluator — and the evaluator returns a structured score. The loop runs until the score passes or the budget is exhausted.
The win versus naive reflection: the evaluator can be a different model (cheaper, fine-tuned), can run pass/fail predicates as code (not just LLM judgment), and produces structured signals you can log and grade evals against.
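For concreteness, a sketch of that evaluator for EvalLoop above, under the assumption that "pass" means clearing a hard word-count floor and then a structured LLM judgment:
// evaluator for EvalLoop above: cheap code predicates run first; the LLM
// judge only sees drafts that clear them. The 150-word floor and the judge
// prompt are illustrative assumptions.
func evaluator(ctx context.Context, brief, draft string) (Score, error) {
	if len(strings.Fields(draft)) < 150 {
		return Score{Pass: false, Feedback: "Draft is far too short; expand every section."}, nil
	}
	var s Score
	err := llm.Structured(ctx,
		`You are a strict reviewer. Decide whether the draft fully satisfies the brief. Respond as JSON: {"pass": bool, "feedback": string}.`,
		"BRIEF:\n"+brief+"\n\nDRAFT:\n"+draft, &s)
	return s, err
}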
An orchestrator decomposes a goal into sub-tasks at runtime, not at design time (design-time decomposition is plain parallelization). Each sub-task gets its own worker agent with its own context window, its own tools, and its own budget. The orchestrator then synthesizes worker outputs into a final answer.
// The orchestrator's only "tool" is spawning a worker.
type DispatchArgs struct {
Worker string `json:"worker"` // "researcher" | "code_reader" | "summarizer"
Task string `json:"task"` // natural-language sub-goal
}
var DispatchTool = agent.Tool[DispatchArgs]{
Name: "dispatch",
Description: "Spawn a specialist worker agent. Workers run in parallel.",
Run: func(ctx context.Context, a DispatchArgs) (any, error) {
w, ok := workers[a.Worker]
if !ok { return nil, fmt.Errorf("unknown worker %q", a.Worker) }
// Each worker runs ReAct in its own goroutine, with a fresh context window.
return w.Run(ctx, a.Task)
},
}
Modern providers (Anthropic, OpenAI, Gemini) all support parallel tool calls — a single assistant turn can request dispatch(researcher), dispatch(code_reader), and dispatch(summarizer) simultaneously. The Go side fans out to three goroutines, gathers results into the next message turn, and the orchestrator can plan the next round.
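The gather step is a sketch away, assuming the Message, ToolCall, Tools, and render pieces from the ReAct section; tool errors go back to the model as content rather than up the Go stack:
// One orchestration round: run every dispatch the model requested in
// parallel, then hand the tool messages back for the next turn.
func runToolCalls(ctx context.Context, tools Tools, calls []ToolCall) []Message {
	results := make([]Message, len(calls))
	var wg sync.WaitGroup
	for i, c := range calls {
		i, c := i, c
		wg.Add(1)
		go func() {
			defer wg.Done()
			res, err := tools.Dispatch(ctx, c)
			results[i] = Message{
				Role: "tool", ToolCallID: c.ID,
				Content: render(res, err),
			}
		}()
	}
	wg.Wait()
	return results
}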
Four anti-patterns to avoid. Self-modifying prompts: an agent that rewrites its own system prompt mid-run. Sounds clever; in practice it produces drift, prompt explosion, and impossible-to-debug behavior. If the prompt isn't right, fix it offline; don't let the agent fix it at runtime.
Unbounded recursion: workers that can spawn workers that can spawn workers. Cap the depth at one (orchestrator plus workers, period); deeper trees blow up token budgets without measurable quality gain in any published evaluation.
Unbounded memory: a vector store the agent freely writes to and reads from. After a week the relevant items are needles in a haystack of stale context. Always pair writes with a TTL or a reranking step that looks at recency.
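If the agent does get a writable store, the cheapest guard is an expiry stamped at write time and filtered at read time; a toy sketch of the idea, with a slice standing in for the vector store:
// MemoryItem carries its own expiry; recall drops stale entries before they
// reach the context window. Field names and the TTL policy are illustrative.
type MemoryItem struct {
	Text      string
	ExpiresAt time.Time
}

func remember(store []MemoryItem, text string, ttl time.Duration) []MemoryItem {
	return append(store, MemoryItem{Text: text, ExpiresAt: time.Now().Add(ttl)})
}

func recall(store []MemoryItem) []string {
	now := time.Now()
	var live []string
	for _, m := range store {
		if now.Before(m.ExpiresAt) {
			live = append(live, m.Text)
		}
	}
	return live
}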
The god tool: a database tool whose first argument is "action": "select"|"insert"|"delete". Models pick the wrong verb regularly. Three small tools with three small schemas outperform one big tool every time.
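Concretely, using the same agent.Tool shape as the dispatch example, with illustrative field names and handlers:
// Three narrow tools instead of one "action" switch; each schema carries only
// what its verb needs. runSelect, runInsert, and runDelete are your own handlers.
type SelectArgs struct {
	Query string `json:"query"` // read-only query text or a saved query name
}
type InsertArgs struct {
	Table string         `json:"table"`
	Row   map[string]any `json:"row"`
}
type DeleteArgs struct {
	Table string `json:"table"`
	ID    string `json:"id"`
}

var (
	SelectTool = agent.Tool[SelectArgs]{Name: "db_select", Description: "Read rows.", Run: runSelect}
	InsertTool = agent.Tool[InsertArgs]{Name: "db_insert", Description: "Insert one row.", Run: runInsert}
	DeleteTool = agent.Tool[DeleteArgs]{Name: "db_delete", Description: "Delete one row by id.", Run: runDelete}
)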
Each of these patterns has a runnable Go example in the next chapter — clone, run, read, modify.
Browse the example library →