Blog

Notes from production — AI systems, performance, and the infrastructure underneath.

Breaking the 25,000-token wall

Jun 10, 2026

MCP is everywhere now — and so is its oldest constraint. How a transparent caching proxy gets any MCP server past the 25,000-token response limit.

#mcp
#ai-infrastructure
#caching
#open-source

Make the model the exception, not the loop

Jun 10, 2026

How a 24/7 AI agent fleet stays affordable on one subscription: deterministic code handles every tick, and the model only runs on real signals.

#ai-agents
#automation
#llmops
#self-hosting

The 160× index: a 4.18-second dashboard and the COUNT(*) that ate it

Jun 10, 2026

My fleet dashboard quietly degraded to 4.18s. The cause: one COUNT(*) full-scanning 258k rows on every load. One index later: ~18ms, flat forever.

#sqlite
#performance
#go
#war-story

41× from one keyword

Jun 10, 2026

A FastAPI service on a fixed 1 vCPU went from 1.68 to 69.6 RPS by adding async, before any hardware, workers, or DB tuning. A staged k6 study of throughput.

#python
#fastapi
#async
#performance
#benchmarking

Evals before vibes

Jun 10, 2026

Choosing an LLM by feel ships regressions you can't see. Notes from production on picking models with an eval framework instead: latency, cost, accuracy, fit.

#llm-evals
#rag
#llmops
#production-ai

Three laptops, one subscription

Jun 10, 2026

A fleet pattern for 24/7 AI agents: one agent per machine as gatekeeper, a star topology, a chat room as the bus, and a subscription instead of metered keys.

#ai-agents
#claude-code
#fleet
#self-hosted
#architecture

gzip ate my byte ranges

Jun 10, 2026

A Parquet viewer worked in curl and on github.io, then served zero rows on the custom domain. The culprit: the CDN gzipped the file, breaking Range requests.

#parquet
#http
#github-pages
#debugging
#war-story