Blog

Notes from production — AI systems, performance, and the infrastructure underneath.

MCP is everywhere now — and so is its oldest constraint. How a transparent caching proxy gets any MCP server past the 25,000-token response limit.

  • #mcp
  • #ai-infrastructure
  • #caching
  • #open-source

How a 24/7 AI agent fleet stays affordable on one subscription: deterministic code handles every tick, and the model only runs on real signals.

  • #ai-agents
  • #automation
  • #llmops
  • #self-hosting

A FastAPI service on a fixed 1 vCPU went from 1.68 to 69.6 RPS by adding async, before any hardware, workers, or DB tuning. A staged k6 study of throughput.

  • #python
  • #fastapi
  • #async
  • #performance
  • #benchmarking

Choosing an LLM by feel ships regressions you can't see. Notes from production on picking models with an eval framework instead: latency, cost, accuracy, fit.

  • #llm-evals
  • #rag
  • #llmops
  • #production-ai

A fleet pattern for 24/7 AI agents: one agent per machine as gatekeeper, a star topology, a chat room as the bus, and a subscription instead of metered keys.

  • #ai-agents
  • #claude-code
  • #fleet
  • #self-hosted
  • #architecture

A Parquet viewer worked in curl and on github.io, then served zero rows on the custom domain. The culprit: the CDN gzipped the file, breaking Range requests.

  • #parquet
  • #http
  • #github-pages
  • #debugging
  • #war-story