You still need to know what you're doing
Over the last few months, I've had the opportunity to revisit some AI coding tools and see how they have evolved. My first impression was that they are now much more capable than they were even just a few months ago.
My experience is rather bimodal: it's magic when it works, and outright disillusionment when it doesn't. The good news: if you know what you're doing, you don't need it to work 100% of the time.
The patterns that I've seen emerge have been:
- Tiny targeted changes, especially tab-complete, really speed up typing. Sometimes it gets in the way and you just want to shoo the suggestion off the screen, but for the most part it's really good! Since you're right in there, you also have a good sense of what you want accomplished, so steering is easy.
- Getting small suggestions in agent mode (I've been using Cursor) is getting better. This might be a function definition, or maybe a small UI component. Success is still about 50/50, so I review every line before accepting a diff.
- Big 'implement this feature' requests are... okay.
This is where I'll pop out of the list to expand into Story Time.
The anecdote prompting this post: we were implementing Speech-To-Text with Deepgram for voice input in our app. This was new territory for me, something I'd never done before.
So I loaded up the docs into context, asked the agent 'Build it, please', and figured I'd give it a shot.
Off it went, churning away. The file explorer was filling up with new files, and when I clicked into one of them, it was just a wall of green-highlighted text.
Of course, I wasn't reading all that. Accept.
I panned over to the app - everything compiled. I was impressed nothing broke!
I clicked on the mic button, accepted browser permissions, and I saw the live transcription kick off. Now, I was blown away.
hold on
Then, I noticed something curious... the transcription was only ever one word long. It happened every time, and I knew I'd have to dive into the massive diff to figure out what was going on. Doing that required two things I 'needed to know':
- How to debug
Problem-solving is a skill, and bug-hunting in a foreign codebase (which an agent-written diff might as well be) is a niche application of it. In this case, I could see from the network requests that I was looking for some kind of websocket implementation... I wasn't sure if the bug was happening on the frontend or the backend, so I liberally applied logging to try to trace my way through the code flow. Eventually I settled on it being an issue with the websockets setup in the backend.
- Some very loose systems design knowledge
I hadn't implemented websockets in a project myself at that point. Whenever I'd been on a project that called for them, they'd already been working for some time, and I was just building on top. Moreover, the logging I'd added said that pretty much everything was working: spinning up, authentication, relaying audio to the provider, and text coming back... This was confirmed by what I saw on the frontend: the single-word transcription was evidence that it was working, at least a little bit.
Then, I noticed something: the socket server initialization was happening in the same block as the message processing, which meant that with every audio chunk that came through, the sockets would reconnect with a new session and orphan the frontend app (though not before relaying that first successful chunk, hence the single word)! Again, I'm not a websockets expert, but I'm pretty sure the whole point is that the backend maintains connections rather than spinning them up and down constantly.
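In code terms, the bug had roughly this shape. Below is a minimal sketch using Node's `ws` library; the port, URL, auth header, and handler names are my stand-ins for illustration, not the actual diff:

```typescript
import { WebSocketServer, WebSocket } from "ws";

// Stand-in config; the real app pulled these from its own settings.
const PROVIDER_URL = "wss://api.deepgram.com/v1/listen";
const AUTH = { Authorization: `Token ${process.env.DEEPGRAM_API_KEY}` };

const wss = new WebSocketServer({ port: 8080 });

wss.on("connection", (client) => {
  client.on("message", (chunk) => {
    // BUG: a brand-new provider session is opened for EVERY audio chunk.
    // The first chunk gets transcribed, then each reconnect orphans the
    // session the frontend was listening to.
    const upstream = new WebSocket(PROVIDER_URL, { headers: AUTH });
    upstream.on("open", () => upstream.send(chunk));
    upstream.on("message", (text) => client.send(text.toString()));
  });
});
```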
From here, I was able to guide the code agent towards a solution that got the system working.
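For the curious, the fix amounted to hoisting the provider connection out of the message handler, so one session lives for the whole client connection. Again a rough sketch under the same stand-in names as above, not the actual code:

```typescript
import { WebSocketServer, WebSocket } from "ws";

const PROVIDER_URL = "wss://api.deepgram.com/v1/listen"; // stand-in config
const AUTH = { Authorization: `Token ${process.env.DEEPGRAM_API_KEY}` };

const wss = new WebSocketServer({ port: 8080 });

wss.on("connection", (client) => {
  // FIX: one provider session per browser connection, created once up front.
  const upstream = new WebSocket(PROVIDER_URL, { headers: AUTH });

  // Relay each audio chunk over the existing session.
  client.on("message", (chunk) => {
    if (upstream.readyState === WebSocket.OPEN) upstream.send(chunk);
  });

  // Relay transcription results back to the browser.
  upstream.on("message", (text) => client.send(text.toString()));

  // Tear down together so neither side leaks connections.
  client.on("close", () => upstream.close());
  upstream.on("close", () => client.close());
});
```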
My main takeaway
Thinking about this experience, I find it hard to reconcile it with the hype around "anyone coding anything" with these agents. I can't imagine someone with zero coding experience debugging the situation above. For a start, it's a lot easier to drill down to the problem area if you're comfortable spelunking around a codebase. And on the surface, everything pointed towards the code working as intended: auth was succeeding and connections were being made. Without knowing how to nudge the agent in a particular direction, "Just fix it" as a prompt probably wouldn't go very well, or very far.
It also makes me think about if/when I'll ask an agent to design something for which I have zero domain knowledge... How will I know if the solution is a good one? How will I correct it if it isn't?
These coding agents are really good, but for now, you still need to know what you're doing.