An engineering manager asked me, "What do you do when you notice that your team is shipping more and more bugs?"

"More and more of the work that's going out is coming back with regressions, with user complaints, with crashes, and the like. What do you do in that situation?"

As with most advice, the answer is: it depends. You have to figure out what the underlying reasons could be, as well as the context in which they are happening. What we do know is that humans are going to make mistakes. Programmers are going to ship bugs.

The first thing that's notable to me about the scenario is that there is likely a perception from leadership (or anyone above our protagonist) that bugs are increasing, that they're more severe, and that people are complaining. So now it becomes a critical issue that the engineering manager needs to investigate. Fix your team!

Let's dive in.

Where do we start?

So if we take it as understood that people are going to make mistakes, and that bugs are currently in your production code, then where can we look to minimize the rate of occurrence of these bugs? Here are a few places I can think of:

  • Gaps in testing and review: Most companies have automated tests, staging environments, QA, and so on. So these bugs are getting through each step!
  • Shipping culture shift: I've found that teams tend to ebb and flow between "just ship it" and "everything must be perfect". Maybe the team (and company) is in the former mode.

Process

Most companies have a code review process. At places I've worked, it looks fairly standard. Someone makes a change, someone different reviews it, approves it, and then it can go into main if the test suite passes. Usually it then gets deployed to a preview environment, where folks poke at it until it's deemed fit for production. At this point, take a step back and consider: if a bug is getting through all of those layers, I don't think you can blame any one person or step.

note

Well... As the EM, it's your responsibility to own it and to figure out how to improve the situation. So you might end up taking the heat here. ...Congrats?

Anyways, tactically it is very tempting to focus on each piece of the process. You might actually have major strides to gain there! Don't have a test suite at all? Adding one will help. Have one, but it's always flaky/red? Getting it to green will help. Etc. So definitely evaluate the process.
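
As one example of evaluating the process, here's a rough sketch of how you might spot flaky tests from CI history. It assumes you can export results as `(commit, test_name, passed)` tuples; that shape, the function name, and the sample data are all hypothetical, not something your CI provider gives you out of the box. The idea is that a test which both passed and failed on the same commit is probably flaky rather than catching a real regression.

```python
from collections import defaultdict


def find_flaky_tests(runs):
    """Return the tests that both passed and failed on the same commit.

    `runs` is an iterable of (commit, test_name, passed) tuples. This is a
    hypothetical export format, so adapt it to whatever your CI provides.
    """
    outcomes = defaultdict(set)
    for commit, test_name, passed in runs:
        outcomes[(commit, test_name)].add(passed)

    # Flag a test if any single commit saw both a pass and a failure.
    flaky = {test for (_, test), seen in outcomes.items() if seen == {True, False}}
    return sorted(flaky)


# Made-up sample data for illustration only.
runs = [
    ("abc123", "test_login", True),
    ("abc123", "test_login", False),
    ("abc123", "test_checkout", True),
    ("def456", "test_checkout", False),
]

print(find_flaky_tests(runs))  # ['test_login']
```

Note that `test_checkout` isn't flagged: it passed on one commit and failed on a different one, which looks more like a genuine regression than flakiness. That's the distinction worth making before you decide where to spend the team's time.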

People

Another aspect to investigate is the culture, pressures and health of the team. The human side of our development equation. This can be overshadowed by the clear-cut fixes available on the process side!

I'll start with pressures, since they might be the easiest to identify. You may have experienced some flavour of this: "Yeah, ordinarily we would [fix this first | test it more | etc] but we're shipping this because [of our deadline | we're going to land a client | etc]. Let's call it a 'known issue', but we'll come back to it later". Most often this is culture set from the top down during sales pushes.

Aside

This behaviour accrues known technical debt, and sometimes, that's super valid. You don't need to be perfect to roll something out. That doesn't make sense - you'll never get anywhere. On the flipside... going too far on the other end might lead to the titular situation.

Now, sometimes this "ship now, worry later" vibe is actually the culture of the team. You might find this in younger companies, where iteration speed and a short feedback loop are paramount. For a company trying to mature as an engineering organization, this mentality can be hard to shake. Determine if your team's current culture is a holdover from an environment they've been used to, and if a tone shift is required.

Finally, reflect on the team's interpersonal relationships and role dynamics. Perhaps a senior engineer is getting their changes greenlit by juniors who are afraid to bother or challenge them. Someone could be letting a team priority slide while they work on an initiative that benefits them personally. In the worst (beyond rare, in my experience) case, someone could be actively trying to drive another person off the team by letting their bugs through.

Summary

Ultimately, to generalize the problem, as an engineering manager you're trying to figure out:

  • what the culture and health of your team is, and
  • what processes exist to support them and what can be improved

in this case, within the context of code quality and delivery.

From there, you can form a plan of action, and do the necessary expectation management and communication in every direction. This is a process that you can (and should) repeat over and over!