Please note: even this article was generated with AI.

I built a production-grade SRE platform almost alone. Here is the story of AI agentic development.
A few months ago I started building SRE Visualizer: a full-stack analytics platform for Site Reliability Engineering teams. It connects to Confluence and Jira, parses on-call rota schedules, fetches and categorises support tickets with AI, generates operations reviews, renders interactive dashboards, exports PDF reports, and tracks team performance over time.
It ships with a 475-test suite, a Docker image, multi-process deployment modes, a proper cache layer, a split-mode HTTP proxy, an admin panel, a PDF exporter, and full documentation.
Sounds like a six-month project for a small team, right?
It was built solo, iteratively, in a matter of weeks, using only “free time”: the AI works while I am doing other things, like sitting in a meeting or having lunch. AI agents were my development crew.

The agent crew: who does what

I did not just use an AI chat to write some code. I designed and operated a multi-agent system with specialised roles, each with its own context, rules, and responsibilities:

  • planner — receives a feature request and produces a full action plan before any code is written. It identifies files to change, risks, and steps.
  • code-executer — the workhorse. Implements code changes, runs the test suite, fixes errors, and iterates until every test is green.
  • documenter — after every change, updates architecture docs, changelogs, API docs, and the contributors file atomically with the code.
  • product-owner — challenges scope and business value. Asks the hard question: does this feature actually need to be built right now?
  • architectural-analysis — reviews every change for structural impact. Flags coupling, scalability risks, and design debt before they land.
  • code-analysis — deep-dives into existing code to understand impact before a change is made. Answers: what will break, and what depends on this?
  • testing-analysis — analyses test coverage gaps, identifies missing edge cases, and ensures new features are fully testable before coding starts.

Each agent operates with strict rules: no test can ever call a real external service, every change must be traceable, documentation must be updated atomically with the code. The system enforces quality — it does not just generate text.
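One way to make such boundaries enforceable rather than aspirational is to encode them as data the orchestrator checks before dispatching work. The sketch below is purely illustrative; the names and rule strings are hypothetical, not the project's actual configuration.

```python
# Hypothetical agent role registry; names and rules are illustrative only.
AGENTS = {
    "planner": {
        "writes_code": False,
        "rule": "produce a full action plan before any code is written",
    },
    "code-executer": {
        "writes_code": True,
        "rule": "iterate until every test is green",
    },
    "documenter": {
        "writes_code": False,
        "rule": "update docs atomically with the code",
    },
}


def allowed_to_edit_code(agent: str) -> bool:
    """Orchestrator check: only agents flagged as code writers may touch code."""
    return AGENTS.get(agent, {}).get("writes_code", False)
```

With the boundary expressed as data, a planner that tries to emit a diff is rejected by the runtime instead of relying on the model to respect its prompt.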
The result: I ship a feature, the tests run, the docs update, and the changelog is written — all in one coordinated flow.

AI inside the product itself

The AI agents built the platform, but AI is also deeply integrated into what the platform does. Here are the three most interesting uses:

  • Ticket categorisation: every Jira ticket is sent to a Langchain-powered GPT model that classifies it by type, severity, and whether it required manual intervention. A heuristic fallback kicks in if the AI is unavailable, so the pipeline never stalls.
  • Rota parsing from Confluence: the on-call schedule lives as a human-readable HTML table in Confluence. The app sends that HTML to the AI, which extracts structured rota data (teams, dates, people, roles) that no regex could reliably handle. A deterministic BeautifulSoup pass runs first; AI is the fallback for ambiguous layouts.
  • AI operations review: at the end of each rota period, the system generates a full written review — ticket trends, workload analysis, recommendations — cached per team and period, and surfaced directly in the Summary page.
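The "never stalls" property of the categorisation pipeline comes from always having a deterministic answer to fall back on. Here is a minimal sketch of that shape; the keyword table, category names, and function signatures are assumptions for illustration, not the project's actual classifier.

```python
from dataclasses import dataclass


@dataclass
class TicketCategory:
    type: str
    severity: str
    manual_intervention: bool


# Hypothetical keyword heuristic used when the AI call fails or is unavailable.
_SEVERITY_KEYWORDS = {
    "critical": ["outage", "down", "data loss"],
    "high": ["degraded", "failing", "error spike"],
}


def heuristic_categorise(summary: str) -> TicketCategory:
    """Deterministic fallback: classify by keywords so the pipeline never stalls."""
    text = summary.lower()
    for severity, words in _SEVERITY_KEYWORDS.items():
        if any(word in text for word in words):
            return TicketCategory("incident", severity, True)
    return TicketCategory("request", "low", False)


def categorise(summary: str, ai_call=None) -> TicketCategory:
    """Try the AI classifier first; fall back to the heuristic on any failure."""
    if ai_call is not None:
        try:
            return ai_call(summary)
        except Exception:
            pass  # AI unavailable or errored: fall through to the heuristic
    return heuristic_categorise(summary)
```

The design choice worth noting: the fallback is not a degraded AI, it is a different mechanism entirely, so an OpenAI outage cannot take the ingestion pipeline down with it.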


All AI calls go through a single service layer (app/services/copilot_ai.py) with model fallback (gpt-4.1-mini -> gpt-4.1 -> gpt-4o), retry logic, and in-memory caching keyed by team, period, and content hash.
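The shape of that service layer can be sketched in a few lines. This is a simplified stand-in, not the code from app/services/copilot_ai.py: the function names and the `invoke` callable are hypothetical, but the pattern (walk a model chain cheapest-first, cache by team, period, and content hash) is the one described above.

```python
import hashlib

MODEL_CHAIN = ["gpt-4.1-mini", "gpt-4.1", "gpt-4o"]  # cheapest model first

_cache: dict[str, str] = {}


def _cache_key(team: str, period: str, content: str) -> str:
    """Key answers by team, period, and a hash of the prompt content."""
    digest = hashlib.sha256(content.encode()).hexdigest()[:16]
    return f"{team}:{period}:{digest}"


def ask_with_fallback(team: str, period: str, prompt: str, invoke) -> str:
    """Walk the model chain until one call succeeds; cache the answer."""
    key = _cache_key(team, period, prompt)
    if key in _cache:
        return _cache[key]
    last_err = None
    for model in MODEL_CHAIN:
        try:
            answer = invoke(model, prompt)
            _cache[key] = answer
            return answer
        except Exception as err:
            last_err = err  # this model failed: try the next one in the chain
    raise RuntimeError("all models in the fallback chain failed") from last_err
```

Because the cache key includes a content hash, a changed prompt is a cache miss and a repeated one is free, which is what keeps per-period review generation cheap.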

The numbers

  • v1 to v4.2.1 shipped across dozens of iterations
  • ~10,400 lines of Python application code across backend, services, models, and API controllers
  • ~7,700 lines of test code — 475 automated tests, all mocked, all running in under 10 seconds
  • ~5,200 lines of frontend (vanilla JS, HTML, CSS) — no framework, no build step, full SPA
  • Zero regressions introduced by AI-generated code — because the guardrails were designed by a human who knew what to guard against

Features that would typically require a backend developer, a frontend developer, a QA engineer, and a tech writer — all owned by one person with an agent system.

The technology under the hood

Every technology choice was deliberate. Here is the short version:

  • FastAPI — async Python REST API framework. Handles all backend routes, dependency injection, and multi-mode startup (full / cache-mode / controller-mode).
  • Langchain (ChatOpenAI) — unified interface for all AI calls. Provides prompt chaining, model fallback, and async invocation via ainvoke.
  • BeautifulSoup — deterministic HTML parsing for Confluence rota pages before AI is involved, keeping costs and latency low.
  • ReportLab / fpdf2 — PDF generation for on-call reports. Exports the full dashboard — stats, charts, AI review — into a single file.
  • In-memory CacheProvider — singleton cache with TTL, targeted invalidation, metrics, and a clear-all parameter. Zero external dependencies for caching.
  • httpx — async HTTP client used for the controller-to-cache HTTP proxy layer introduced in v4.2.1, with sync fallback to in-process cache.
  • Vanilla JS SPA — no React, no Vue, no build toolchain. Seventeen focused JS modules, each owning one page. Fast, debuggable, zero dependency drift.
  • Docker + run.sh — containerised deployment with an interactive shell menu to select app mode, with file-watching and auto-restart built in.
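Of these, the in-memory CacheProvider is the easiest to misjudge as trivial. A minimal sketch of the TTL-plus-targeted-invalidation idea looks like this; method names here are assumptions for illustration, and the real provider also tracks metrics and hit rates.

```python
import time


class CacheProvider:
    """Minimal sketch of a TTL cache with targeted and clear-all invalidation."""

    def __init__(self):
        self._store = {}  # key -> (value, expires_at)

    def set(self, key, value, ttl_seconds=300):
        self._store[key] = (value, time.monotonic() + ttl_seconds)

    def get(self, key, default=None):
        entry = self._store.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # lazy eviction: expired entries die on read
            return default
        return value

    def invalidate(self, key=None):
        """Drop one key, or everything when no key is given (clear-all)."""
        if key is None:
            self._store.clear()
        else:
            self._store.pop(key, None)
```

Zero external dependencies is the point: no Redis to deploy, no serialisation layer, and the split-mode HTTP proxy can still front it when processes need to share state.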

What most people miss about AI-assisted development

The hard part is not prompting. It is knowing:

  • What architecture decisions to make before the agents touch the code
  • How to design agent boundaries so they do not create conflicting changes
  • When to override AI output and why — and how to encode that judgment into rules
  • How to build a safety net so fast iteration does not mean fragile code (three-layer network blocking in tests, enforced by the agent runtime itself)
  • How to incrementally evolve a system across four-plus major versions while keeping everything coherent
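The network-blocking guardrail deserves one concrete illustration. This is not the project's actual three-layer implementation, just the innermost idea sketched under assumed names: patch `socket.socket.connect` while tests run, so any test that reaches for a real external service fails loudly instead of silently making the suite flaky.

```python
import socket


class NetworkBlockedError(RuntimeError):
    """Raised when test code attempts a real network connection."""


_real_connect = socket.socket.connect


def block_network():
    """Install the guard: every connect() attempt raises instead of dialling out."""
    def guarded_connect(self, address):
        raise NetworkBlockedError(f"test attempted a network call to {address!r}")
    socket.socket.connect = guarded_connect


def unblock_network():
    """Restore the original connect() once the test session is over."""
    socket.socket.connect = _real_connect
```

Wired into a session-scoped test fixture, this turns "no test can ever call a real external service" from a code-review convention into a hard runtime invariant.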

This is senior engineering expertise applied to AI-native development. Not AI replacing engineers. AI multiplying them.
What this means for your organisation

The competitive advantage is no longer having more developers. It is having engineers who know how to build and operate AI agent systems that multiply output without multiplying headcount.
AI agentic development is not a future trend. The tools exist. The patterns work. The quality is real. But it requires deep expertise to do it right. Someone needs to design the system, set the constraints, own the architecture, and know when the AI is wrong. That person is not a prompt engineer. That person is a senior software engineer who has also learned to think in agents.
If your organisation is not building this capability yet, you are already behind.
