This article:

Introduces the Four Grand Challenges of IT:

  1. Failure Modes,
  2. Threat Risk,
  3. Technology Obsolescence,
  4. Backlog Management.

1,700 words, estimated reading time 16 minutes.

Recently, I was describing the technical architecture for a new IoT system. This was one of those “let’s completely rethink this so it can last for a generation” projects, where you think not just about project goals, technologies, and scope, but also about squishier things like evolving user experience and technical architecture principles. And like any good architecture, it had to consider not only “what” would be built but “how” it would be built and operated.

It was during this process that it dawned on me that I had seen the big themes I needed to address dozens of times over decades of work. These themes are never really “solved”; they are “addressed”. This is where the CIO spends all of his or her time, and where the IT organization is judged a success or failure.
These are the “grand challenges” of IT.
With such a name, the bar has to be pretty high. I would define an IT Grand Challenge as having some specific characteristics.

An IT Grand Challenge is a problem that is:

  • Difficult to solve,
  • Persistent, and
  • Important.

Based on my experience, these are the Four Grand Challenges of IT:

  1. Failure modes
  2. Threat risk
  3. Technology obsolescence
  4. Backlog management

In this article, I will opine on what each Challenge is about and some ways of addressing it, but since these are “grand”, this article will be only a gloss. (Maybe there’s a book idea here.)

Failure modes

As I’m sure you’re aware, stuff breaks. And as much as I’d like to have a philosophical discussion as to why that is (increasing entropy of the universe and all that), Stephen Hawking summed it up quite nicely:

“For any system, there are more failure states than functional states.”

So, the odds are not in your favor.

And I have some more bad news: you are under-counting your failures. Hard failures are easy to count: system crashes reported in your operations center, problems reported by users and tracked in your system, etc.

What you are under-counting is “soft failures”. For example, that one conference room that gives users WiFi issues 15% of the time. Or that application that confuses many users and slows down their work. Or information that is incorrect on a recurring basis. So failure modes span hard failures, unreported soft failures and, beyond both, undetected failures.

So, what to do?

First of all, good architecture and design matter. More realistically, what you need is a “re-architecture”, since you already have something in place, plus a transition plan you are committed to.

Next, do less. This one is tricky. Apple does less by providing fewer customization possibilities. That’s doing less. But behind the scenes it is actually more, because pulling that off means hiding the complexity. What “doing less” does provide is more predictability, which brings me to my next principle.

Humans make messes. Create a culture that rewards identifying and logging soft failures. That poor WiFi coverage may not be a hard failure, but it drives IT’s reputation and people’s perception that the technology is failing.

Lastly, think in terms of resiliency, not just availability. As enterprise architects, we are trained to create higher availability through redundancy and other approaches. Your border router can go down? Fine, just add a second one in case the first one fails. That’s redundancy for availability. Resiliency is the idea that some key things need to be done in completely different ways. Need to get critical data from Point A to Point B? Make a RESTful call and also FTP the file over. If one approach fails, the other will likely not be impacted.
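A minimal Python sketch of that idea, under stated assumptions: `deliver_resiliently`, `send_via_rest` and `send_via_ftp` are hypothetical names, and the two senders are stubs standing in for a real REST client (for example one built on `urllib.request`) and a real FTP upload (for example `ftplib.FTP.storbinary`). The point is that the same payload travels over two independent paths, and one path failing does not stop the other.

```python
from typing import Callable, Dict

# A delivery channel sends the payload and raises an exception on failure.
Channel = Callable[[bytes], None]


def deliver_resiliently(payload: bytes, channels: Dict[str, Channel]) -> Dict[str, str]:
    """Send the same payload over every independent channel.

    Returns "ok" or the error text per channel, and raises only if
    every channel failed.
    """
    results: Dict[str, str] = {}
    for name, send in channels.items():
        try:
            send(payload)
            results[name] = "ok"
        except Exception as exc:  # one channel failing must not stop the others
            results[name] = f"failed: {exc}"
    if "ok" not in results.values():
        raise RuntimeError(f"all channels failed: {results}")
    return results


# Hypothetical stubs standing in for a real REST call and a real FTP upload.
def send_via_rest(payload: bytes) -> None:
    raise ConnectionError("HTTP 503 from Point B")  # simulate an outage


def send_via_ftp(payload: bytes) -> None:
    pass  # simulate the file landing in Point B's drop directory


status = deliver_resiliently(b"critical data", {"rest": send_via_rest, "ftp": send_via_ftp})
print(status)  # {'rest': 'failed: HTTP 503 from Point B', 'ftp': 'ok'}
```

Sending over both paths every time, rather than only falling back after a failure, is a deliberate choice in this sketch: a path that is used routinely is less likely to be quietly broken on the day you need it.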

Failure is endemic in any system. How you prevent it when possible, and how you address it when necessary, determines IT’s success or failure.

Threat risk

The previous section was about how technology fails. This section is about how humans fail.

When humans and technology meet, it is usually a beautiful thing. However, sometimes – through malicious intent or negligence – humans can make technology fail to live up to our expectations. A network team leaves a port open, a tech support representative divulges a key sequence, a social AI connects bad actors together; these and a whole lot of other things could happen, and that is what keeps the CIO, and the CEO, up at night.

The problem, however, is that the CIO is worried about the “black swan” event – high impact, low probability – when in fact it is the more pedestrian events, like your intellectual property walking out the door with an ex-employee, that pose the bigger threat.

So, my first suggestion is to understand the difference between “theater” and practical measures. My favorite piece of security theater is requiring elaborate passwords. The good news is that no human or robot alive will guess the elaborate password in the five attempts the system allows before the system locks them out anyway. The bad news is that because the password is now so elaborate and must change every 60 days, the user writes it on a Post-it and sticks it under the keyboard!

To take a more personal example, I wanted to change the address on a bank account. After I verified my Social Security number, validated two recent transactions and answered five security questions, the bank changed my address. Then it sent an email to verify that it had done so. Seriously? The threat surface actually increased.

You now have a lot of personal information about me that you don’t need, increasing your exposure, and mine, if it gets into the wrong hands. And you recorded the conversation, which could also get into the wrong hands. Good theater, though!

Lots of suggestions here.

  • Don’t get information you don’t need.
  • Store sensitive information in one place and tokenize.
  • Be aware of backups and recordings (and, yikes!, Hadoop).
  • Compartmentalize storage and transmission of data and what roles handle it.
  • Treat your “knowledge” data (email, PowerPoint decks, etc.) like customer data.
  • Do periodic threat assessments like penetration tests of your networks (plural) but also do behavioral assessments of key parts of your organization. (Yes, think of yourself as the NSA.)
  • And most importantly, design security and privacy in from the beginning of a new system or its redesign.
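As a rough illustration of “store sensitive information in one place and tokenize”, here is a Python sketch of a token vault. `TokenVault` is a hypothetical in-memory class, not a product API; a real vault would persist and encrypt values, audit every lookup, and restrict `detokenize` to a few roles, in line with the compartmentalization point above. Every other system keeps only the token.

```python
import secrets
from typing import Dict


class TokenVault:
    """Hypothetical single store for sensitive values (in-memory, illustration only)."""

    def __init__(self) -> None:
        self._token_to_value: Dict[str, str] = {}
        self._value_to_token: Dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        # The same value always maps to the same token.
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = "tok_" + secrets.token_hex(8)  # random, so it reveals nothing about the value
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str) -> str:
        # In practice only a few authorized roles should be able to call this.
        return self._token_to_value[token]


vault = TokenVault()
# Downstream systems (CRM, reports, backups) store only the token.
customer_record = {"name": "A. Customer", "ssn": vault.tokenize("123-45-6789")}
print(customer_record)
```

If a backup or recording of the CRM leaks, it contains tokens rather than Social Security numbers; the exposure is concentrated in the one vault you actually guard.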

Technology obsolescence

For a lot of CIOs I know, technology obsolescence looms large in their set of concerns. These CIOs have moved from a vendor focus (“keep the number of different tech vendors manageable so the vendors can manage the technology life-cycle”), to a platform focus (“keep the number of different technologies manageable and pray the vendor supports future needs”), to a “best-of-breed” focus (“get the best technology for the current requirement and don’t worry about the vendors”).

The current favorite approach changes every few years. I think we’re still in the platform phase of the cycle.

So, what’s the right answer? All of the above.

Unfortunately, this is one of the reasons this counts as a grand challenge. It becomes even more difficult when the entire question turns out to be the wrong question. In the 2000s, CIOs were asking about Windows vs. Linux. Wrong question. On-prem vs. cloud was much more important. And it still is.

Some suggestions here.

Register all technology on a “technology life-cycle” – from introduction to retirement – and follow that life-cycle. Don’t bring in new technology unless necessary. Just because something performs 10% better doesn’t mean you need it! It’s okay to sandbox new technologies so you learn as an organization, but don’t put them into production until they are necessary.
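One way to make “register all technology and follow your life-cycle” concrete is a small registry that only allows forward moves between stages and refuses to promote a sandboxed technology to production without a stated necessity. This Python sketch is illustrative: the stage names, the transition table and the `Registry` class are assumptions, not a standard.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Set


class Stage(Enum):
    SANDBOX = "sandbox"        # learning only, never production
    PRODUCTION = "production"
    RETIRING = "retiring"
    RETIRED = "retired"


# Allowed moves: forward only, plus dropping an experiment straight from the sandbox.
ALLOWED: Dict[Stage, Set[Stage]] = {
    Stage.SANDBOX: {Stage.PRODUCTION, Stage.RETIRED},
    Stage.PRODUCTION: {Stage.RETIRING},
    Stage.RETIRING: {Stage.RETIRED},
    Stage.RETIRED: set(),
}


@dataclass
class Technology:
    name: str
    stage: Stage = Stage.SANDBOX
    history: List[str] = field(default_factory=list)


class Registry:
    def __init__(self) -> None:
        self._techs: Dict[str, Technology] = {}

    def register(self, name: str) -> Technology:
        # Everything enters the life-cycle in the sandbox.
        tech = Technology(name)
        self._techs[name] = tech
        return tech

    def get(self, name: str) -> Technology:
        return self._techs[name]  # KeyError if it was never registered

    def move(self, name: str, target: Stage, justification: str = "") -> None:
        tech = self.get(name)
        if target not in ALLOWED[tech.stage]:
            raise ValueError(f"{name}: {tech.stage.value} -> {target.value} is not allowed")
        if target is Stage.PRODUCTION and not justification.strip():
            raise ValueError(f"{name}: production requires a stated necessity")
        tech.history.append(f"{tech.stage.value} -> {target.value}: {justification}")
        tech.stage = target


registry = Registry()
registry.register("message-broker-x")  # hypothetical product name
registry.move("message-broker-x", Stage.PRODUCTION, "current queue cannot handle order volume")
print(registry.get("message-broker-x").stage)  # Stage.PRODUCTION
```

The justification string is only a prompt for the “is this necessary?” conversation, not a substitute for it; the history list gives you the audit trail when someone later asks why a technology is in production.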

Do a “future technology” plan. And please don’t assign it to yourself or your senior staff; give it to whoever is most qualified. Vendors can provide some data inputs, but don’t let them drive. Since you have a resilient architecture (see above), embrace two very different technologies sitting next to each other. Grandma can sit next to the kids. Really, it’s okay.

Lastly, have a well-thought-out buy vs. build strategy. This is a more complicated set of questions and beyond the scope of this article, but you need to have a strategy.

Backlog Management

Ah, the good news. Lots of work. And some bad news. Lots of work. In fact, more work than can ever get done.

And more bad news. The backlog of work continues to increase every day and in every way. That’s the real problem.

Some people would see that as a good thing. Having too much to do means you have to focus on what’s really important. That’s the beauty of methodologies like Agile. You focus on priorities and what is necessary.

What can you do? If you’re an IT executive, your job is, among other things, to manage the backlog. However, the more important duty is managing the size of the backlog.

Ideally, the backlog gets smaller but it’s okay if it stays the same. The problem is when it continues to increase in size, which is the natural order of things. At some point, you can no longer do the “important” priorities and only have resources for the “very important”.

What happens to the merely important? It doesn’t get done and the business suffers or, commonly, the business creates a shadow IT group to get what it wants, and then hands you the technology to maintain anyway.

Other suggestions: make it look worse before it looks better. Specifically, make sure everything that really is backlog is recorded in the backlog. A common omission, for example, is the effort and money needed to retire technologies or re-platform.

Next, draw a hard line separating items that move the business forward (like supporting a new product) from those that maintain your current estate. Then, within each of those two categories, prioritize, but with the participation and buy-in of the impacted parties. Wear an apron because it’s going to get messy. Start at the top of the list – the most important – and go down the list until you no longer have any hope of delivering, and draw a line there. Above the line, you can and will (eventually) deliver. Below it, nope. That’ll be a great conversation to have with the CFO and/or CEO.
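The “draw a line” exercise is essentially a cumulative-capacity cut over a prioritized list. Here is a minimal Python sketch, assuming effort is estimated in person-weeks and that each category (moving forward vs. maintaining the estate) gets its own capacity budget; `Item`, `draw_the_line` and the example items are hypothetical. It stops at the first item that no longer fits rather than skipping ahead to smaller ones, so the line stays a single cut through the agreed priority order.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Item:
    name: str
    priority: int   # 1 = most important, agreed with the impacted parties
    effort: float   # estimated person-weeks (assumed unit)


def draw_the_line(items: List[Item], capacity: float) -> Tuple[List[Item], List[Item]]:
    """Go down the prioritized list until the next item no longer fits, and cut there."""
    ranked = sorted(items, key=lambda item: item.priority)
    used = 0.0
    for index, item in enumerate(ranked):
        if used + item.effort > capacity:
            return ranked[:index], ranked[index:]  # (above the line, below the line)
        used += item.effort
    return ranked, []


# Separate lists, separate budgets: moving forward vs. maintaining the estate.
forward = [
    Item("Support new product line", priority=1, effort=12),
    Item("Partner portal", priority=2, effort=8),
    Item("Mobile self-service", priority=3, effort=10),
]
maintain = [
    Item("Retire legacy reporting server", priority=1, effort=6),
    Item("Re-platform billing", priority=2, effort=15),
]

for label, items, budget in [("forward", forward, 22), ("maintain", maintain, 10)]:
    above, below = draw_the_line(items, budget)
    print(label, "above:", [i.name for i in above], "below:", [i.name for i in below])
```

Here “Mobile self-service” and “Re-platform billing” land below the line; that list is exactly what goes into the conversation with the CFO and/or CEO.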

What do you do next?

My humble suggestion is that if you are a CIO or CTO, stop solving problems. You heard right. If problems are solvable, I’d suggest your staff will do a fine job. Instead, address the unsolvables. Start with the Grand Challenges and add one or two of your own. You’ll earn that CxO slot and the sense of satisfaction that comes with climbing a mountain.

About the Author

Tahl Milburn

Consulting CTO

Tahl has cultivated a diverse career in technology ranging from CTO of a Fortune 500 company (Providian Financial) to founder, CTO and hands-on developer for IoT healthcare startup Lifestate.io.

As a Managing Director of a consulting practice at Cisco, Tahl delivered results across nine countries and multiple industries, including financial services, healthcare, energy, federal clients and others. His deep expertise includes Enterprise Architecture (VP at Visa), IT governance and resourcing, organizational change management and technical strategy; he has, for example, delivered information security strategies to three Fortune 500 companies.

His expertise and passions include advising on emerging technologies such as IoT, low-code, and affective computing.