The Never-Ending P1

Don’t worry, this diagram is explained in thousands of pages of documentation

As a computer systems administrator at a large organization, I was introduced to ITIL, a library of procedures and best practices for managing technology services that originated in the mainframe era. ITIL is huge, complex, unwieldy, and bureaucratic. It is also a brilliant and incredibly useful resource for understanding how to manage and optimize large systems. And I am very much a fan of optimizing large systems. It’s sort of my jam.

I mention this because at a contentious and stressful meeting a few days ago I was accused of being a “process person,” the implication being that I was out of touch with the reality of the situation. The situation being how to make reasonable business decisions during a global pandemic of a scale unprecedented in the last century.

Our business is keeping our community healthy and safe through effective cleaning, and that has never been more important than right now. There has been a huge surge in demand for a scarce pool of cleaning and personal protection supplies. Every day we are overcoming countless challenges to source and stock products, to manufacture disinfectants and other chemicals, to take orders, manage expectations, and to make deliveries, all while ensuring that our staff are kept safe from unnecessary risk.

In the ITIL model, incidents are classified on a scale based on urgency and impact. Our business (and our society as a whole) is facing a priority 1 incident, and, as a senior Systems Engineer, I am very familiar with P1s.

In ITIL, service requests are prioritized based on urgency and impact
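
For readers who have not seen a priority matrix before, here is a minimal sketch of the idea. It assumes a three-level scale for both impact and urgency and a commonly used mapping onto priorities 1 through 5; the names `Level` and `priority` are made up for illustration, and real organizations tune their own matrix.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical three-step scale used for both impact and urgency."""
    HIGH = 1
    MEDIUM = 2
    LOW = 3


def priority(impact: Level, urgency: Level) -> int:
    """Map impact and urgency to a priority from 1 (critical) to 5 (lowest).

    Assumption: the simple additive matrix, where high impact plus high
    urgency is a P1 and low impact plus low urgency is a P5.
    """
    return int(impact) + int(urgency) - 1


if __name__ == "__main__":
    # Print the full 3x3 matrix, one row per impact level.
    for impact in Level:
        row = [f"P{priority(impact, urgency)}" for urgency in Level]
        print(f"{impact.name:<6} impact: {' '.join(row)}")
```

In this sketch, only an incident that is both high impact and high urgency comes out as a P1.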

How do you typically handle a P1? First you wake up, because it is probably the middle of the night. You push away the grogginess and embrace the adrenaline rush. You assess the situation, triage as best you can, then figure out who else needs to be woken up and how to brief them. Next you work to analyze the problem, determine the corrective action, and implement the fix. Then you monitor the result.

It is rare in the sysadmin world for a P1 to last more than a few hours, or at most a few days. But dealing with the business impacts of coronavirus feels like a P1 that never ends. I’ve been on that groggy/adrenaline combination for a couple weeks now. I can’t sleep at night, and I can’t stop thinking about all the things we need to do.

When I moved to Maintex three years ago and started learning about the many aspects of running a business, it quickly became clear to me that process could solve a lot of problems, but also that my biggest problem was understanding all the things that cannot be easily formalized into a process. The physical world is rife with variables a technologist would not expect.

Fifty thousand bottles from a trusted supplier are fine, then suddenly a few hundred start leaking inexplicably. A step is missed in a sanitizing procedure and an entire 5,000-gallon batch of product is contaminated. A forklift driver accidentally smashes a fire sprinkler, flooding the loading dock. A shipping company returns an entire trailer full of product because a labeling machine with a dirty optical sensor placed a regulatory label two inches askew. An inspector with an imperfect understanding of building codes delays a project by three months before approving it without explanation.

ITIL is a helpful framework for me because most other business processes are designed for incremental improvement, not crisis management. But a process is only as good as the people who implement it.

In the physical world, and in the world of people, the unexpected is routine, and no amount of checklists or procedures can account for every possible variation. So I have spent three years implementing systems and analyzing data, sure, but also learning what it means to manage an organization made up of people.

Attempting to maintain business continuity during a pandemic is like a P1 incident that never ends. And that means it is a problem I cannot just solve and then go back to sleep. Every day we need to make decisions and trade-offs that are uncomfortable and might not be the right ones, simply because a decision has to be made so we can keep moving forward.

But most importantly for me as a manager, this P1 requires stepping back. Taking a breath. Checking in with staff to make sure they are okay. Listening to their concerns and figuring out how to help them clear obstacles. A lot of my job is to make space so that the smart, hard-working people who work for me can do their jobs.

And that is decidedly outside my comfort zone.

But I’m finding it just as exhilarating as, and even more challenging than, doing things on my own. Most of all, dealing with this crisis and working with this team feel more consequential and important to me than any IT incident I’ve ever participated in. The coronavirus is an incredibly daunting challenge for everyone, and the effects are rippling across society and business in ways that are scary and uncertain. The best we can hope to do in our little corner of the economy is continue to conscientiously perform the valuable service of delivering critical cleaning and safety supplies where they are most needed.

I’m proud of Maintex, and of our team. I’m proud of the work we are doing to help keep society resilient in the face of an unprecedented crisis. And I’m more sure than ever that leaving my former career to start over was the right decision.