The Go ecosystem for AI agents is leaner than Python's by design. Most production stacks consist of a single provider SDK, the standard library, one vector-store client, and OpenTelemetry. Here's the short list, with opinions.
Every major provider ships a hand-maintained Go SDK now. Use the official one for the model you're targeting. Resist the temptation to lock in to a multi-provider abstraction on day one — premature portability is the most expensive abstraction in agent code.
The first-party Go SDK for Claude. First-class support for tool use, parallel tool calls, prompt caching, extended thinking, the Files API, and SSE streaming. The most ergonomic SDK in this list for long-running agentic loops.
OpenAI's official Go client — function calling, structured outputs via response_format, embeddings, the Responses API, batch jobs, file uploads. Vendored by Anthropic (as part of their multi-provider eval harness) and OpenAI alike, so it sees real production load.
The unified Gemini client — replaces the older generative-ai-go. Supports function calling, Gemini 2.5 multimodal inputs, long-context (1M+ tokens), grounding with Google Search, and the Live API for bidirectional streaming.
Cohere's official Go SDK — useful for two specific things: Rerank-3 (the best-in-class reranker for RAG pipelines) and embed-v3 (multilingual embeddings). Most agents reach for it as a complement, not a primary model.
If your stack lives in AWS and you need IAM-shaped access controls (rather than provider API keys), the Bedrock Runtime client is the path. Multiplexes Anthropic, Meta, Mistral, Cohere, and Amazon Titan behind a single signed API.
The official Go client for Ollama — talk to llama3, qwen2.5-coder, deepseek-r1, etc., running locally via Ollama's HTTP server. The right pick for air-gapped agents, integration tests, and "doesn't ship the API key" demos.
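Of the SDKs above, the local path is the quickest to show end to end. A minimal sketch of a streaming chat call, assuming the github.com/ollama/ollama/api package and a model already pulled with `ollama pull llama3`:

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/ollama/ollama/api"
)

func main() {
	// Reads OLLAMA_HOST, defaulting to the local server on :11434.
	client, err := api.ClientFromEnvironment()
	if err != nil {
		log.Fatal(err)
	}

	req := &api.ChatRequest{
		Model: "llama3",
		Messages: []api.Message{
			{Role: "user", Content: "Summarize what an agent loop is in one sentence."},
		},
	}

	// The callback fires once per streamed chunk.
	err = client.Chat(context.Background(), req, func(resp api.ChatResponse) error {
		fmt.Print(resp.Message.Content)
		return nil
	})
	if err != nil {
		log.Fatal(err)
	}
}
```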
Go has fewer agent frameworks than Python because Go developers are skeptical of frameworks. The three below are the ones with non-trivial production usage. Read their source — they're all small enough to skim in an afternoon.
The community Go port of LangChain. Provider abstractions, basic chains, tool wrappers, vector store interfaces. Good for prototyping; most teams outgrow it once they need fine control over message shape or streaming.
ByteDance's open-source agent framework. Stronger primitives than langchaingo for agent graphs (DAG composition, parallel branches, structured streaming). Heavier, but more honest about modeling the agent runtime.
Google's Genkit framework, with a Go runtime. Good developer-tooling story (local UI, eval harness, deploy to Cloud Run). Couples tightly to Google's ecosystem; useful if you're already there.
A framework that hides the message protocol from you isn't saving you complexity — it's deferring the moment you have to learn it. Pay the tuition early; your agents will be better.
Don't add a new database for embeddings until you have a benchmarked reason. pgvector is the right default for almost every Go shop; a query sketch follows the table below. Specialized stores earn their keep at scales most agents will never reach.
| Library | Backing store | When to choose it |
|---|---|---|
| github.com/pgvector/pgvector-go | Postgres + pgvector | Default. You already run Postgres. HNSW indexes, JOIN to your relational data, transactional writes. |
| github.com/qdrant/go-client | Qdrant | Hundreds of millions of vectors, hybrid sparse+dense search, payload filtering at scale. |
| github.com/weaviate/weaviate-go-client/v4 | Weaviate | Schema-rich knowledge bases, GraphQL queries, native multimodal vectors. |
| github.com/milvus-io/milvus-sdk-go/v2 | Milvus | Billion-scale, multi-tenant, on-prem GPU ANN. Real big-data territory. |
| github.com/pinecone-io/go-pinecone | Pinecone (managed) | You don't want to operate a database; you have Pinecone budget; you need their hybrid retrieval. |
| github.com/redis/go-redis/v9 + RediSearch | Redis Stack | You already have Redis at the cache layer and want to colocate hot vectors there. |
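To make the default concrete, here is a sketch of a nearest-neighbor query with pgvector-go over database/sql. It assumes a documents(id, chunk, embedding vector(1536)) table, the pgx stdlib driver, and a DATABASE_URL environment variable:

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
	"log"
	"os"

	_ "github.com/jackc/pgx/v5/stdlib" // registers the "pgx" database/sql driver
	pgvector "github.com/pgvector/pgvector-go"
)

func main() {
	db, err := sql.Open("pgx", os.Getenv("DATABASE_URL"))
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// queryEmbedding would come from your embedding model; zeros keep the sketch short.
	queryEmbedding := make([]float32, 1536)

	// <=> is pgvector's cosine-distance operator; an HNSW index on the column keeps it fast.
	rows, err := db.QueryContext(context.Background(),
		`SELECT id, chunk FROM documents ORDER BY embedding <=> $1 LIMIT 5`,
		pgvector.NewVector(queryEmbedding))
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	for rows.Next() {
		var id int64
		var chunk string
		if err := rows.Scan(&id, &chunk); err != nil {
			log.Fatal(err)
		}
		fmt.Println(id, chunk)
	}
}
```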
MCP is the open protocol Anthropic introduced for giving language models structured tool access. Rather than every team writing their own filesystem / GitHub / Postgres wrappers, MCP defines a JSON-RPC contract over stdio or SSE, and a growing list of servers expose tools to any compatible client.
```go
// A minimal MCP server exposing one tool over stdio.
package main

import (
	"context"
	"log"

	"github.com/modelcontextprotocol/go-sdk/mcp"
)

type EchoArgs struct {
	Text string `json:"text"`
}

func main() {
	srv := mcp.NewServer(&mcp.Implementation{
		Name:    "echo-server",
		Version: "0.1.0",
	}, nil)

	// Register one typed tool; the SDK derives the input schema from EchoArgs.
	mcp.AddTool(srv, &mcp.Tool{
		Name:        "echo",
		Description: "Echoes the input back.",
	}, func(ctx context.Context, _ *mcp.CallToolRequest,
		a EchoArgs) (*mcp.CallToolResult, any, error) {
		return &mcp.CallToolResult{
			Content: []mcp.Content{&mcp.TextContent{Text: a.Text}},
		}, nil, nil
	})

	// Serve over stdio so any MCP-compatible host can launch this binary.
	if err := srv.Run(context.Background(), &mcp.StdioTransport{}); err != nil {
		log.Fatal(err)
	}
}
```
Agents are stochastic. The thing you'll want most when one misbehaves at 3am is a complete record of every turn, every tool call, every input and output. Build that on day one.
The structured-logging package in stdlib since Go 1.21. Always use it. Always include a stable run_id attribute on every log line.
OTel SDK + the OTLP gRPC exporter. One span per agent run, child spans per turn, child-of-child spans per tool. The trace becomes your primary debugger.
Counters and histograms for: tokens by model, tool calls by name, errors by class, $ spent. Three dashboards and you'll never wonder why a bill jumped.
```go
// The three instrumentation calls (span, structured log, metric) that turn a
// demo into something you can run on call.
ctx, span := tracer.Start(ctx, "agent.turn", trace.WithAttributes(
	attribute.Int("turn", n),
	attribute.String("run_id", runID),
))
defer span.End()

slog.InfoContext(ctx, "turn", "n", n, "msgs", len(history))
metrics.AgentTurns.WithLabelValues(modelName).Inc()
```
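The metrics.AgentTurns counter in the snippet above is not a library primitive; it is something you define. A minimal sketch, assuming a small internal metrics package built on the Prometheus client library:

```go
// Package metrics holds the agent's Prometheus instruments (a hypothetical
// internal package referenced by the snippet above).
package metrics

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

var (
	// AgentTurns counts completed turns, labeled by model.
	AgentTurns = promauto.NewCounterVec(prometheus.CounterOpts{
		Name: "agent_turns_total",
		Help: "Agent turns completed, by model.",
	}, []string{"model"})

	// TokensUsed tracks token consumption by model and direction (input/output).
	TokensUsed = promauto.NewCounterVec(prometheus.CounterOpts{
		Name: "agent_tokens_total",
		Help: "Tokens consumed, by model and direction.",
	}, []string{"model", "direction"})
)
```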
Reflection-based JSON Schema generation. Turn a Go struct into a tool schema in one call. Indispensable for typed tool registries.
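A sketch of the pattern, assuming the github.com/invopop/jsonschema package (any reflection-based generator works the same way): describe the tool's arguments as a struct, reflect a schema from it, and hand that schema to the provider's tool definition.

```go
package main

import (
	"encoding/json"
	"fmt"

	"github.com/invopop/jsonschema"
)

// SearchArgs is a hypothetical tool-argument struct.
type SearchArgs struct {
	Query string `json:"query" jsonschema:"description=Full-text search query"`
	Limit int    `json:"limit,omitempty" jsonschema:"description=Maximum results to return"`
}

func main() {
	// One call: struct to JSON Schema, ready to embed in a tool definition.
	schema := jsonschema.Reflect(&SearchArgs{})
	out, _ := json.MarshalIndent(schema, "", "  ")
	fmt.Println(string(out))
}
```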
Local token counting for OpenAI-style BPE tokenizers (and a usable estimate for other providers). Essential for cost meters and pre-flight context-window checks.
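A sketch of a pre-flight check, assuming the github.com/pkoukk/tiktoken-go package; the count is exact for OpenAI's cl100k models and an approximation elsewhere:

```go
package main

import (
	"fmt"
	"log"

	tiktoken "github.com/pkoukk/tiktoken-go"
)

func main() {
	// cl100k_base is the BPE used by GPT-4-era models; treat counts for other
	// providers as estimates, good enough for budget checks.
	enc, err := tiktoken.GetEncoding("cl100k_base")
	if err != nil {
		log.Fatal(err)
	}

	prompt := "You are a helpful agent. Summarize the following document..."
	tokens := enc.Encode(prompt, nil, nil)

	const contextWindow = 128_000
	fmt.Printf("prompt tokens: %d (%.1f%% of window)\n",
		len(tokens), 100*float64(len(tokens))/contextWindow)
}
```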
The right way to fan out parallel tool calls or worker agents. Cancellable, error-propagating, with concurrency caps.
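A sketch of the fan-out pattern, assuming golang.org/x/sync/errgroup; the ToolCall, ToolResult, and execTool names are hypothetical stand-ins for your tool layer:

```go
package agent

import (
	"context"

	"golang.org/x/sync/errgroup"
)

// ToolCall, ToolResult, and execTool are hypothetical stand-ins for the
// application's tool layer.
type ToolCall struct{ Name, Args string }
type ToolResult struct{ Output string }

func execTool(ctx context.Context, c ToolCall) (ToolResult, error) {
	return ToolResult{Output: "stub result for " + c.Name}, nil
}

// runToolCalls executes a batch of tool calls in parallel with a concurrency
// cap; the first error cancels the remaining work via ctx.
func runToolCalls(ctx context.Context, calls []ToolCall) ([]ToolResult, error) {
	results := make([]ToolResult, len(calls))

	g, ctx := errgroup.WithContext(ctx)
	g.SetLimit(4) // at most four tools run at once

	for i, call := range calls {
		i, call := i, call // loop-variable capture (needed before Go 1.22)
		g.Go(func() error {
			res, err := execTool(ctx, call)
			if err != nil {
				return err
			}
			results[i] = res
			return nil
		})
	}
	return results, g.Wait()
}
```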
The de-facto Go circuit breaker. 200 lines of code that prevent thundering-herd retries when a provider goes south.
Exponential backoff with jitter. Pair with a circuit breaker; the two together are the standard retry policy for provider HTTP calls.
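A sketch of that policy wrapped around one provider call, assuming github.com/sony/gobreaker and github.com/cenkalti/backoff/v4; callProvider is a hypothetical stand-in for the real HTTP call:

```go
package agent

import (
	"context"

	"github.com/cenkalti/backoff/v4"
	"github.com/sony/gobreaker"
)

// breaker trips after repeated provider failures so retries stop hammering a downed API.
var breaker = gobreaker.NewCircuitBreaker(gobreaker.Settings{Name: "provider"})

// callProvider is a hypothetical stand-in for one HTTP call to the model API.
func callProvider(ctx context.Context) (string, error) { return "", nil }

// callWithRetry retries transient failures with exponential backoff and jitter,
// and stops retrying while the breaker is open.
func callWithRetry(ctx context.Context) (string, error) {
	var out string
	op := func() error {
		res, err := breaker.Execute(func() (interface{}, error) {
			return callProvider(ctx)
		})
		if err == gobreaker.ErrOpenState || err == gobreaker.ErrTooManyRequests {
			return backoff.Permanent(err) // breaker is open: give up now, retry later runs
		}
		if err != nil {
			return err
		}
		out = res.(string)
		return nil
	}
	err := backoff.Retry(op, backoff.WithContext(backoff.NewExponentialBackOff(), ctx))
	return out, err
}
```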
UUIDs for run IDs, turn IDs, tool-call IDs. Don't reach for time-based IDs; you'll regret it the first time two runs collide in your trace store.
Fast JSON path queries when the model emits JSON you need to introspect cheaply. Often replaces a full json.Unmarshal for routing decisions.
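A sketch of a routing check on raw model output, assuming github.com/tidwall/gjson; the JSON shape here is hypothetical:

```go
package main

import (
	"fmt"

	"github.com/tidwall/gjson"
)

func main() {
	// Hypothetical raw model output; in practice this is the untrusted string you got back.
	raw := `{"action": {"tool": "search", "args": {"query": "go agents", "limit": 3}}}`

	// Route on one field without unmarshalling the whole document.
	switch gjson.Get(raw, "action.tool").String() {
	case "search":
		fmt.Println("dispatch to search, limit =", gjson.Get(raw, "action.args.limit").Int())
	case "":
		fmt.Println("no tool requested")
	default:
		fmt.Println("unknown tool")
	}
}
```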
Headless Chrome controller. Pair with an agent that needs real browser tools (computer-use style) — far better than scraping with raw HTTP.
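A sketch of the smallest useful loop (launch headless Chrome, navigate, read something back), assuming github.com/chromedp/chromedp and a local Chrome or Chromium install:

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/chromedp/chromedp"
)

func main() {
	ctx, cancel := chromedp.NewContext(context.Background())
	defer cancel()
	ctx, cancel = context.WithTimeout(ctx, 30*time.Second)
	defer cancel()

	var title string
	// Navigate and read the page title; real browser tools would also click, type, and screenshot.
	err := chromedp.Run(ctx,
		chromedp.Navigate("https://example.com"),
		chromedp.Title(&title),
	)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("page title:", title)
}
```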
The Docker Engine API client — the right way to give an agent a sandboxed shell. Spawn a container per run, mount only the project, kill on exit.
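A sketch of the per-run sandbox with github.com/docker/docker/client; option and field names have shifted slightly across Engine API client versions, so treat this as the shape rather than the letter:

```go
package agent

import (
	"context"

	"github.com/docker/docker/api/types/container"
	"github.com/docker/docker/client"
)

// runSandboxed starts a throwaway container for one agent run: the project is
// mounted read-only, networking is disabled, and the container removes itself on exit.
func runSandboxed(ctx context.Context, runID, projectDir string, cmd []string) error {
	cli, err := client.NewClientWithOpts(client.FromEnv, client.WithAPIVersionNegotiation())
	if err != nil {
		return err
	}
	defer cli.Close()

	created, err := cli.ContainerCreate(ctx,
		&container.Config{
			Image:      "golang:1.23",
			Cmd:        cmd,
			WorkingDir: "/work",
		},
		&container.HostConfig{
			Binds:       []string{projectDir + ":/work:ro"},
			NetworkMode: "none",
			AutoRemove:  true,
		},
		nil, nil, "agent-run-"+runID)
	if err != nil {
		return err
	}

	return cli.ContainerStart(ctx, created.ID, container.StartOptions{})
}
```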
None of these stacks are required — they're starting points. The lesson is the same in each: keep the stack short, prefer official SDKs, and add a layer only when you can articulate why.
- A deterministic chain — extract → classify → format. Latency-sensitive. No agent loop.
- A ReAct agent in real production. Typed tools, OTel, retries, cost cap.
- Orchestrator + specialist workers, each in their own context window.
Read the cases chapter to see how these libraries compose into real, shipping systems.