Oh God, It Has Two Shields

Not as terrifying as some enemies, but very appropriate for this post. All credit to From Software

I am firmly of the opinion that a software engineering team that builds a service or product should also be responsible for operating that service or product in production.

But supporting a piece of software and creating a piece of software are two entirely different skillsets, so it can make sense to separate the responsibilities, with some people in the team favouring one or the other.

At Atlassian that's exactly what some teams do, with the Shield Engineer role.

But sometimes you need more than just one.

Hey Look, A Wall

Back in September last year I had hired my first Shield Engineer and I was waiting with bated breath for them to start.

Their main objective was to take responsibility for a sizable chunk of the disturbed load, which would free up the developers to be able to focus on project delivery.

It's been over three months now and I can honestly say that I have no regrets at all. The Shield Engineer has done an amazing job of learning a complicated domain, they were a key player in an operationally focused project, and they have made a meaningful difference in the health and wellbeing of the developers in the team by ablating the majority of the disturbed load.

The best part is that I don't even think any of the risks that I identified were realised:

  • I don't see the developers throwing things over the fence. The Shield Engineer is embedded directly within the team, attending all of the rituals, so any pain they experience because developers are shirking their responsibilities is immediately apparent
  • I don't see any gatekeeping. The team works together to understand when and how things should happen, and that includes the Shield Engineer. They offer a consistent operational perspective to things and the developers respect that
  • I don't see any overload. We've very specifically not made the Shield Engineer 100% responsible for all disturbed. There is still a roster, and the developers still participate in the roster, just at about 1/2 the rate that they used to. Both parties are happy with this arrangement
  • I don't see any lack of empathy. Like I said in the first point, the Shield Engineer is right there, every day, sharing their experiences and reflections. There is no way to ignore the suffering of another person if it's right there in your face

Great success, congratulations to all involved, etc, etc.

But we have more disturbed load than can be handled by a single person, no matter how awesome they are.

Did That Wall Just Move?

I work within a small part of Atlassian known as Shard Management. We're all about helping service owners run their services in production, specifically from the perspective of capacity management.

I share the responsibility for leading Shard Management with another Engineering Manager and it contains ~26 software engineers of various levels of seniority.

In order to minimise cognitive load, we've broken those engineers down into three distinct squads:

  • One focused on the migrations domain (i.e. moving tenants around)
  • One on the capacity domain (i.e. making sure there is enough space for tenants)
  • One on the collections domain (i.e. collecting, aggregating and synthesising relevant tenant metrics)

Our very first Shield Engineer joined the squad responsible for the capacity domain, which accounts for ~50% of all disturbed in Shard Management. In turn, they've managed to take on a bit over half of that load themselves, with the rest being distributed amongst the developers in that squad.

That still leaves a lot of disturbed being done by developers though, especially in the migrations squad.

So, I hired another Shield Engineer.

Well, technically, having two Shield Engineers was always the plan, but it took me about three months to hire the second one, because good Shield Engineers are rare. It's not a common skillset, nor is it a well-understood role within the market.

Our second Shield Engineer just started, literally a week or two ago, but the plan is for them to follow a similar journey to the first one, except in a different squad.

It's Coming Right For Me!

If I just left it there, with two Shield Engineers, each focused on their own squads, with developers covering the balance, that would still be a good outcome.

But it's still less than ideal, because:

  • It means developers are spending time on disturbed, which most of them hate with the fire of a thousand suns. Also, see previous comments about it being a different skillset that not every engineer has
  • It creates kind of a single point of failure within those squads, especially as the Shield Engineer becomes more and more capable of doing disturbed in isolation. Which they will do even if I tell them not to because they want to be helpful
  • It means different squads might develop different disturbed experiences, which makes it harder to get overarching intelligence and also may mean different customer experiences. This has already happened a bit, even prior to the second Shield Engineer, so I want to rein it in before it diverges too much more

To solve all of those issues, the plan is to create a small team of Shield Engineers who will own disturbed for the entirety of Shard Management.

Such a team should:

  • Be responsible for 90%+ of the disturbed load, escalating to developers as necessary via an on-call roster (which is still needed for other reasons anyway)
  • Be robust in the face of a single Shield Engineer getting sick or going on holidays (though still reliant on developers to fill gaps)
  • Be able to create a consistent disturbed experience across the mini-org by offering a single entry point for the internal customers to engage with

Two should be enough, assuming that the developers maintain the capability to fill in as necessary if the situation warrants it, which it will, because we don't want the Shield Engineers to be doing nothing but the operational stuff. They should also have time to analyse patterns, propose improvements and make their own lives better.

So, maybe a third Shield Engineer might be warranted after all. To flesh out the team and create additional redundancy and space.

Only time will tell.

Two Shields Are A Force To Be Reckoned With

Given my experience with Shield Engineers to date, I'm optimistic about the future. Perhaps I just happened to find a really good one the first time around and it will all come crashing down, but I very much doubt that.

It really does reinforce that finding people who want to do a role instead of forcing people to do a role is a much better way of actually getting the outcomes that you want.

Funny that.