Building AI features in a Go backend

Adding AI to a backend isn't about a new framework. It's structured prompts, tool use, streaming responses, and failure modes you haven't seen.

Adding AI capabilities to a backend isn't about adopting a new framework or swapping your database. It's about integrating an external model API into your existing service layer — and learning that the failure modes are completely different from anything you've debugged before. Here's what the stack actually looks like in a production Go backend with real AI features.

Quick answer: An AI-powered feature in a Go backend is an HTTP call to a model API (Anthropic, OpenAI, etc.), structured prompts with typed outputs, streaming for responsiveness, and retry logic with exponential backoff. The hard parts are prompt reliability, cost control, and latency — not the code.

It's not a new framework

Most AI integration tutorials present LangChain or similar orchestration frameworks as the starting point. For a Go backend, that's usually wrong. You don't need an orchestration framework for most AI features — you need an HTTP client, a prompt template, and good error handling.

The framework abstractions are useful when you're building pipelines with many steps, retrieval systems, or autonomous agents. For a single AI-powered endpoint — summarise this document, extract these fields, classify this input — raw API calls are simpler, more debuggable, and cheaper.
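To make that concrete, here's roughly what the raw-HTTP approach looks like against Anthropic's Messages API. A minimal sketch, not production code: the model ID is illustrative, and the parsing only handles the happy path.

package ai

import (
  "bytes"
  "context"
  "encoding/json"
  "fmt"
  "net/http"
  "os"
)

type message struct {
  Role    string `json:"role"`
  Content string `json:"content"`
}

// Complete sends a single prompt and returns the first text block.
func Complete(ctx context.Context, prompt string) (string, error) {
  body, err := json.Marshal(map[string]any{
    "model":      "claude-3-5-sonnet-latest", // illustrative; pick a current model ID
    "max_tokens": 1024,
    "messages":   []message{{Role: "user", Content: prompt}},
  })
  if err != nil {
    return "", err
  }

  req, err := http.NewRequestWithContext(ctx, http.MethodPost,
    "https://api.anthropic.com/v1/messages", bytes.NewReader(body))
  if err != nil {
    return "", err
  }
  req.Header.Set("x-api-key", os.Getenv("ANTHROPIC_API_KEY"))
  req.Header.Set("anthropic-version", "2023-06-01")
  req.Header.Set("content-type", "application/json")

  resp, err := http.DefaultClient.Do(req)
  if err != nil {
    return "", err
  }
  defer resp.Body.Close()
  if resp.StatusCode != http.StatusOK {
    return "", fmt.Errorf("model API: %s", resp.Status)
  }

  var out struct {
    Content []struct {
      Text string `json:"text"`
    } `json:"content"`
  }
  if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
    return "", err
  }
  if len(out.Content) == 0 {
    return "", fmt.Errorf("empty model response")
  }
  return out.Content[0].Text, nil
}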

The actual stack

  • Model API. Anthropic's Claude or OpenAI's GPT-4-class models for generation. Both have Go-compatible REST APIs; use the official SDKs or a simple HTTP client.
  • Prompt templates. Stored as typed Go structs and rendered with text/template. Version-controlled alongside the code they belong to. (A sketch follows this list.)
  • Structured output. Use the model's JSON mode or tool use to get machine-readable responses. Never parse free-form text in production.
  • A budget and a cost tracker. AI API calls are billed per token. Instrument every call with input and output token counts. Set hard limits.
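Here's what a prompt template looks like in practice: a typed struct plus text/template. A minimal sketch; the template name, fields, and wording are illustrative.

package prompts

import (
  "strings"
  "text/template"
)

type SummariseParams struct {
  Document string
  MaxWords int
}

// Versioning the template name makes prompt changes visible in diffs and logs.
var summariseTmpl = template.Must(template.New("summarise_v2").Parse(
  `Summarise the following document in at most {{.MaxWords}} words.

<document>
{{.Document}}
</document>`))

func RenderSummarise(p SummariseParams) (string, error) {
  var sb strings.Builder
  if err := summariseTmpl.Execute(&sb, p); err != nil {
    return "", err
  }
  return sb.String(), nil
}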

Structured prompts and tool use

The most important pattern for reliable AI features is structured output. Instead of asking the model to "respond with JSON" and hoping, use tool use (function calling) to define the output schema explicitly.

type InvoiceExtraction struct {
  VendorName  string  `json:"vendor_name"`
  Amount      float64 `json:"amount"`
  Currency    string  `json:"currency"`
  DueDate     string  `json:"due_date"`  // ISO 8601
  Confidence  float64 `json:"confidence"` // 0–1
}

// Pass the schema as a tool definition.
// The model populates it; you unmarshal it.
// No regex, no fragile string parsing.

Always include a confidence or uncertainty field. When the model isn't sure, it should tell you rather than fabricate a value. Add validation after unmarshaling — treat model output like user input.
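That validation pass can be a plain method on the struct. A minimal sketch; the rules are illustrative, so swap in your own domain checks.

import (
  "errors"
  "fmt"
  "time"
)

// Validate applies domain rules after unmarshaling the tool output.
// Schema-valid is not the same as domain-valid.
func (e *InvoiceExtraction) Validate() error {
  if e.VendorName == "" {
    return errors.New("missing vendor_name")
  }
  if e.Amount <= 0 {
    return fmt.Errorf("implausible amount: %v", e.Amount)
  }
  // Accept date-only ISO 8601, matching the struct's comment.
  if _, err := time.Parse("2006-01-02", e.DueDate); err != nil {
    return fmt.Errorf("due_date is not ISO 8601: %w", err)
  }
  if e.Confidence < 0 || e.Confidence > 1 {
    return fmt.Errorf("confidence out of range: %v", e.Confidence)
  }
  return nil
}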

Streaming responses

AI models generate tokens sequentially. For any user-facing feature, stream the response rather than waiting for completion. A 5-second wait with no feedback kills perceived performance. A stream that starts immediately feels fast even if the total latency is the same.

func (h *Handler) Generate(w http.ResponseWriter, r *http.Request) {
  flusher, ok := w.(http.Flusher)
  if !ok {
    http.Error(w, "streaming unsupported", http.StatusInternalServerError)
    return
  }
  w.Header().Set("Content-Type", "text/event-stream")
  w.Header().Set("Cache-Control", "no-cache")

  // prompt is built from the request elsewhere.
  stream, err := h.ai.StreamCompletion(r.Context(), prompt)
  if err != nil {
    http.Error(w, "upstream model error", http.StatusBadGateway)
    return
  }

  for chunk := range stream {
    // Each server-sent event is a "data: <payload>\n\n" frame.
    fmt.Fprintf(w, "data: %s\n\n", chunk.Text)
    flusher.Flush()
  }
}

Failure modes you haven't seen before

  • Rate limits. Model APIs have strict per-minute token limits. Build exponential backoff with jitter into every AI call (a sketch follows this list). Unlike database errors, these are expected under load.
  • Hallucination in structured output. The model will sometimes return a plausible-looking but wrong value. Validate outputs against your domain rules, not just the schema.
  • Prompt injection. If user input flows into your prompt, a malicious user can override your instructions. Sanitise inputs, use system/user role separation, and validate that outputs match your expected structure.
  • Cost spikes. One misconfigured endpoint that calls the API in a loop can generate a large unexpected bill in minutes. Set per-request token limits and daily budget alerts from day one.
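Here's the backoff sketch promised above, using full jitter: each retry sleeps a random duration up to an exponentially growing cap. The withBackoff helper is illustrative; in production you'd also inspect the error and retry only on rate-limit (429) or transient server responses.

import (
  "context"
  "math/rand"
  "time"
)

func withBackoff(ctx context.Context, call func(context.Context) error) error {
  const maxAttempts = 5
  base := 500 * time.Millisecond

  var err error
  for attempt := 0; attempt < maxAttempts; attempt++ {
    if err = call(ctx); err == nil {
      return nil
    }
    // Full jitter: sleep a random duration in [0, base * 2^attempt).
    backoff := time.Duration(rand.Int63n(int64(base << attempt)))
    select {
    case <-time.After(backoff):
    case <-ctx.Done():
      return ctx.Err()
    }
  }
  return err
}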

When it becomes an agent

An agent is a model that can take actions — call tools, read files, query databases — and loop until a task is done. In a Go backend, this means a loop that sends a prompt, checks if the model wants to call a tool, executes the tool, and feeds the result back.

The key design decision: always put a hard limit on the loop. A maximum of 10 iterations, a hard timeout, and a clear human-readable log of every action the agent took. Unbounded agents are expensive to debug and expensive to run.
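A minimal sketch of that loop. The ModelClient, Message, and ToolCall types are hypothetical stand-ins for whatever SDK or HTTP layer you use; runTool is your own dispatcher.

import (
  "context"
  "errors"
  "log"
  "time"
)

// Hypothetical types standing in for your SDK of choice.
type Message struct{ Role, Content string }

type ToolCall struct {
  Name  string
  Input string // JSON arguments from the model
}

type ModelResponse struct {
  Text     string
  ToolCall *ToolCall // nil when the model is done
}

type ModelClient interface {
  Send(ctx context.Context, history []Message) (*ModelResponse, error)
}

func runAgent(ctx context.Context, client ModelClient, task string) (string, error) {
  const maxIterations = 10
  ctx, cancel := context.WithTimeout(ctx, 2*time.Minute) // hard timeout
  defer cancel()

  history := []Message{{Role: "user", Content: task}}
  for i := 0; i < maxIterations; i++ {
    resp, err := client.Send(ctx, history)
    if err != nil {
      return "", err
    }
    if resp.ToolCall == nil {
      return resp.Text, nil // no tool requested: the task is done
    }
    // A human-readable log of every action the agent takes.
    log.Printf("agent iteration %d: tool %s(%s)", i, resp.ToolCall.Name, resp.ToolCall.Input)

    result, err := runTool(ctx, resp.ToolCall)
    if err != nil {
      result = "tool error: " + err.Error() // feed failures back to the model
    }
    history = append(history,
      Message{Role: "assistant", Content: resp.Text},
      Message{Role: "tool", Content: result},
    )
  }
  return "", errors.New("agent exceeded iteration limit")
}

// runTool dispatches to the application's real tool implementations.
func runTool(ctx context.Context, call *ToolCall) (string, error) {
  switch call.Name {
  // case "query_db": ...
  default:
    return "", errors.New("unknown tool: " + call.Name)
  }
}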

If you're adding AI features to a Go backend and want a second opinion, this is exactly the kind of architecture we review. The decisions you make on prompt design and output validation early on are hard to change later.

★ ★ ★
