4 Months Down an AI Engineering Hole

I went dark on Jan 13th. Went all in on Openclaw. Got sucked down the AI engineering hole.

Four months in, I'd roughly map it as three phases. First two were a blast. The one I'm in now isn't, and it's the only one that's mattered.

Phase 1: Building like a child

Terminal open, half understanding the commands I was typing, watching things work and getting an absolutely unreasonable hit off it.

I wasn't building from a plan. I didn't have a plan. I didn't know what a plan in this context even looked like. I was designing on the fly, gluing libraries together, copy-pasting from Stack Overflow and ChatGPT and Claude and praying. Building tools that did nonsensical things because the act of building was the point. Stacking circular logic on top of circular logic until I accidentally edited a config file I shouldn't have touched and the whole setup went offline.

Six hours fixing it and I felt like a god when I did.

My version of this was a Sunday night in late January. I had Edwardo, the first agent, running on a Mac Mini in the office over Tailscale, with Telegram as the control interface, and a launch agent I'd been editing live for two weeks. Made one change too many. The daemon stopped firing. Edwardo went silent for about four hours. When the first new message finally came through I punched the air at my desk like I'd just won something. I had no real idea why it had broken and no real idea why my fix had worked. Both still felt like a win.

It's a kind of addiction. I wasn't learning frameworks. I was learning that I could actually do this. The gap between "this is for other people" and "I just shipped a thing" rewires how you think about what you can build on your own. It doesn't unwire.

It also lies to you. Hard. Almost nothing I built in this phase was real. It demoed. It wouldn't have survived a second user, a stale token, or a Tuesday.

Phase 2: Becoming insufferable

By now I'd built something. Maybe four somethings. I had opinions. I'd read a paper. I thought I'd seen the matrix.

So I told everyone. Cornered people at dinner. Explained agentic loops to friends who came round for a drink. Started sentences with "the thing about RAG is" like anyone had asked. Had a TED talk in my head and was delivering it whether the audience consented or not. This phase is genuinely irritating to be around and I'm sorry to everyone who lived through it with me.

It ended the moment I stopped talking and started listening.

I watched videos on harness engineering. Read Latent Space posts. Read Anthropic's piece on building effective agents. Realised the people who actually ship production AI are talking about evals, traces, structured outputs, orchestration patterns, and I didn't really understand any of it at depth. I'd thought I was 70% of the way there. I was about 4%.

The quietest version of it was going back to something I'd built three weeks earlier, reading the code, and not being able to defend a single architectural decision. The whole thing was held together with hope.

The version that actually broke me was the Telegram bot conflict. I had two daemons on the Mac Mini both running getUpdates against the same bot token. I'd installed the second one three weeks earlier, forgotten about it, and they'd been quietly fighting each other for incoming messages ever since. Some commands landed, some didn't, in no pattern I could see. Spent the better part of an afternoon chasing it through logs before I found the orphan launch agent. The fix in the end was to revoke the token and start over. The lesson sat with me longer than the fix did. I'd been running infrastructure I couldn't have drawn from memory. That's not infrastructure, that's luck.
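For anyone hitting the same thing: long polling with getUpdates assumes one consumer per bot token, so a guard that refuses to start a second poller would have surfaced the orphan immediately. What follows is a minimal sketch, not what I actually ran. The function name acquire_poller_lock and the lock file naming are made up for illustration, and it assumes a Unix-like machine (a Mac Mini qualifies) because it relies on fcntl.

```python
import fcntl
import hashlib
import os
import tempfile


def acquire_poller_lock(bot_token, lock_dir=None):
    """Return a locked file handle, or None if another poller already holds the lock.

    Hypothetical helper: one lock file per bot token, held for as long as
    the returned handle stays open.
    """
    lock_dir = lock_dir or tempfile.gettempdir()
    # Hash the token so the secret never ends up in a filename.
    digest = hashlib.sha256(bot_token.encode("utf-8")).hexdigest()[:16]
    path = os.path.join(lock_dir, f"telegram-poller-{digest}.lock")
    handle = open(path, "a+")
    try:
        # Exclusive, non-blocking: fail straight away instead of waiting.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle
```

A daemon would call this once at startup and exit loudly if it gets None back. The lock disappears when the process dies or the handle is closed, so a crash never leaves a stale lock behind.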

The talk that snapped me out of Phase 2 is below. Karpathy lays out why this paradigm is actually different and what it means to build on top of it. If you only watch one thing while you're stuck in this phase, watch this one. It reframed almost everything I thought I was doing.

Andrej Karpathy, Software Is Changing (Again), YC AI Startup School, June 2025. Watch on YouTube.

Phase 3: Where I am now

Building doesn't give me the rush anymore.

I'm in spec docs. I'm arguing with Claude Code about whether my regex actually catches the edge case I think it catches. I'm standing up local instances because I've finally clocked that uploading every internal email, every client P&L, every personal note to a third-party cloud is a decision, not a default. Thinking about data sovereignty and what a Gatekeeper layer should actually do before any payload leaves the machine. Running local Gemma on a Mac Mini as the redaction layer not because it's clever but because it's the only honest way to keep raw client data out of someone else's training set.
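To make the Gatekeeper idea concrete, here is a rough sketch of the shape of a redaction pass. It is deliberately not the Gemma setup: it swaps in two plain regexes as a stand-in, and the patterns, the placeholder strings, and the function name redact are all assumptions for illustration. The point is the contract, text in, redacted text plus a count out, and the habit of writing the edge cases down.

```python
import re

# Illustrative patterns only; real client data needs far more than two regexes.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
AMOUNT = re.compile(r"[£$€]\s?\d[\d,]*(?:\.\d+)?")


def redact(text):
    """Replace emails and currency amounts with placeholders.

    Returns the redacted text and how many of each kind were replaced,
    so the caller can log what was caught without logging the raw values.
    """
    text, n_email = EMAIL.subn("[EMAIL]", text)
    text, n_amount = AMOUNT.subn("[AMOUNT]", text)
    return text, {"email": n_email, "amount": n_amount}


clean, counts = redact("Send the P&L to jo@client.co.uk, net £1,204.50.")
print(clean)   # Send the P&L to [EMAIL], net [AMOUNT].

# The edge case a regex will not catch: an address spelled out in words.
missed, missed_counts = redact("reach me on jo at client dot com")
print(missed_counts)   # {'email': 0, 'amount': 0}
```

That second call is the argument for putting a model in the redaction seat in the first place: pattern matching only finds what you already thought to look for.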

I read the news differently now. The "60% of white-collar work evaporates in the next 12 to 24 months" line stopped sounding like clickbait and started sounding like a forecast I'm personally exposed to. I've stopped arguing about whether it's true and started asking whether we'd still be employable on the other side of it.

So we're doing the only sane thing. Going back through the entire business. Every workflow. Every approval. Every human-in-the-loop decision. Asking which of these actually needs a human and which exists because that's how we've always done it. Sitting with the answers, which are uncomfortable.

We're rebuilding the company while running the company. Trying not to lose our hair while operating the day-to-day. I'm having to explain to the team that this isn't about replacing people, it's about not getting replaced as a company. Some of them get it. Some don't. That gap matters.

This is the boring phase. Probably the only one that counts.

I've stopped wanting to build in public and started wanting to build properly. Stopped reaching for new tools and started staying with the boring ones long enough to actually understand them. Stopped thinking about the demo and started thinking about the failure mode when the demo runs in production at 3am on a Tuesday with a real client on the other end.

If we don't move fast and move well, we're dead in the water.

The clearest example of this for me was the CrewAI pivot. Three weeks before writing this I was deep in CrewAI. Six named agents, a multi-agent orchestration layer, a whole choreography I'd specced out over a weekend and was genuinely excited about. Then I sat with the spec and asked, for each agent, what it was actually doing that one well-prompted Edwardo couldn't. Honest answer for most of them was nothing. So I tore the framework out. Went back to a single agent on a multi-provider cloud router with local Gemma sitting in front as the redaction layer, so the raw stuff never leaves the machine. Build got smaller. Team got more confused. System got more honest.
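The shape that's left is small enough to sketch in a few lines. This is an illustration under assumptions, not the real build: the function name route, the (name, callable) provider pairs, and the fall-through-on-error policy are all invented here, and the redaction step is passed in as a plain function so it could be a local model or anything else.

```python
from typing import Callable, Sequence, Tuple

Provider = Tuple[str, Callable[[str], str]]


def route(prompt: str,
          redact: Callable[[str], str],
          providers: Sequence[Provider]) -> Tuple[str, str]:
    """Redact locally, then try each cloud provider in order.

    Returns (provider_name, reply) from the first provider that answers.
    Only the redacted prompt is ever handed to a provider.
    """
    safe_prompt = redact(prompt)
    errors = []
    for name, call in providers:
        try:
            return name, call(safe_prompt)
        except Exception as exc:  # hypothetical policy: any failure falls through
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

One agent, one ordered list of providers, one local step in front. If I can't explain what an extra named agent adds to that, it probably doesn't belong in the system.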

Closing

That's where I am four months in.

I'll keep documenting it as it happens. Where I'm landing on data sovereignty. The local agent stack. What's actually working in production versus what just looked good in a demo. What it does to a business when you take it apart and put it back together while it's still trading.

That's it for now.