How Can You Protect Your Business From IT Outages?

Recently, I read a customer's well-defined analysis of a potentially critical IT problem: one that is regularly overlooked, difficult to describe and, even when it is described, often denied by many organisations. Here is a precis; it outlines the issues perfectly.

“XYZ company has identified a need to review and document existing IT infrastructure and processes and to produce a review document. They do not have a unified detailed complete picture, or documentation of their entire IT estate. Whilst a small number of people understand individual elements of the IT Infrastructure, and external partners understand their portion of the infrastructure, there is no one unified detailed solution design that captures and documents the entire IT estate. Every aspect is just a partial view”

In other words, no one has a complete picture of the environment.

It is worth noting, however, that even if they did have this total view, the picture of their entire IT estate would likely be unmanageable and simply too complex to understand. So just having the information is not enough on its own.

However, the problem analysis is well put, and it continues along these lines, eventually explaining why this is so important and why it is worth going through the process of creating that complete picture.

“Certain users may have knowledge on how an application is configured, some internal users may have admin rights on some systems. External partners offer support on these, but there is no unified document, individual or group, that contains or holds all the details and information on the infrastructure, processes and maintenance required to keep the infrastructure running” ... and, by extension, to support the critical business services.

Here we have it.

If you don’t have good documentation, you are sitting on unknown risks to your business and your customers. We see examples of this every month in the media: preventable systems failures. Sometimes these outages are just a nuisance, such as a cash machine being out of use for an hour or two, or Barclays customer accounts showing 00.00 as the balance.

But for many critical systems it could be very dangerous. Think nuclear power stations, utilities, transport systems, telecoms, hospitals.

This is why we do what we do: we know it is important work.

Returning to our customer's analysis:

“Collectively there may be some knowledge captured on what exists, but there is little or no detail or knowledge capturing “why” parts of the infrastructure was deployed and developed and what roles and functions (if any) they now perform. We realise there are important dependencies and relationships between certain parts of the IT infrastructure and our business which need to be identified and mapped to ensure continuity, mitigate against risk from unplanned and even planned changes and unforeseen outages. With a complete picture we may also be able to reduce costs by identifying and safely decommissioning equipment and connectivity costs.”

So, having articulated and defined the issues and the advantages of tackling them, how do you go about creating this mythical single source of truth?

How do you create a baseline to describe your infrastructure, so you can then work out potential impacts and dependencies?

What information do you need?

What data do you already have that you can really trust, given that a lot of information resides with individuals, separate teams, multiple spreadsheets and (if you are lucky) DCIM and CMDB tools?

Most of these sources have not necessarily been updated, version controlled or even audited to ensure a consistent approach to naming and designating entities. For us, here at AssetGen, accurate data must be the starting point. Discovery tools can be useful, but they are limited in how much of the necessary detail they can provide. At some point a manual audit is unavoidable, but it is well worth doing.

But what is the real aim here, the crucial objective? As we have indicated, obtaining all the data, although necessary, is not sufficient to be genuinely useful. Why not? Because, as previously mentioned, it would be far too complex to comprehend. It follows that, once gathered, the data needs to be in a place where it can be organised and filtered to produce specific views of particular areas, connections, dependencies and potential impacts.

This is what we do. This is our objective.

Our approach at Square Mile Systems is to place the data into our own software which runs an SQL database. This database can then be filtered to output reports, plus diagrams that can represent the data visually using Microsoft Visio.

The crucial point is that users can now generate precisely the view they require. This can be from a technical point of view, showing servers, IP addresses and network routing, but importantly it can also be from a business point of view. We call these the bottom-up approach (base infrastructure looking up to business services and applications) and the top-down approach.

The top-down, or high-level, view maps the business services, systems and functions, looking down through the infrastructure in fine detail to the:

  • Software applications / databases / servers / virtual servers / clusters / blade servers
  • Storage / switches / routers / patch panels / equipment cabinets

Then we capture the data centre infrastructure, including the location of the DC, individual switch rooms, floor tile references of cabinets, backbone connections, power and cooling, that actually supports these high-level business services. All parts of this infrastructure, if undocumented, pose potential risks to the continuity of business services.
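To make the two directions concrete, here is a minimal illustrative sketch in Python. It is not a description of any particular product; the component names (such as "online-banking" or "cab-A12") and the helper functions are hypothetical, chosen only to show the idea. Each entry records what a component directly depends on. Following the links downwards gives the top-down view of a business service, and following them upwards gives the bottom-up impact of a failing piece of infrastructure.

```python
from collections import deque

# Hypothetical example data: each component maps to what it directly depends on.
DEPENDS_ON = {
    "online-banking": ["payments-app"],
    "payments-app": ["payments-db", "app-server-01"],
    "payments-db": ["db-server-01"],
    "app-server-01": ["switch-01", "cab-A12"],
    "db-server-01": ["switch-01", "cab-A12"],
    "switch-01": ["cab-A12"],
    "cab-A12": [],
}


def reachable(start, edges):
    """Return every node reachable from start by following edges (breadth-first)."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def invert(edges):
    """Reverse the links so each component maps to what depends on it."""
    reverse = {node: [] for node in edges}
    for node, deps in edges.items():
        for dep in deps:
            reverse.setdefault(dep, []).append(node)
    return reverse


def top_down(service):
    """Everything a business service ultimately relies on."""
    return reachable(service, DEPENDS_ON)


def bottom_up(component):
    """Everything that would be affected if this component failed."""
    return reachable(component, invert(DEPENDS_ON))


if __name__ == "__main__":
    print(sorted(top_down("online-banking")))
    print(sorted(bottom_up("switch-01")))
```

In this toy data set, asking for the bottom-up view of "switch-01" shows that a single switch sits underneath both servers, the application, its database and ultimately the business service itself. That is exactly the kind of hidden dependency that undocumented infrastructure conceals.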

In fairness, this approach seems obvious enough, but this level of accurate detail is very hard to obtain in practice. Developing a holistic, top-down, service-oriented view is a challenge. One of the problems is that the same thing can be described in multiple, inconsistent ways.

For a database to be able to look up, associate and output any object, it needs to know where the object is and what it is called. It needs to know the attributes or properties that uniquely identify the object and relate it to other objects. Essentially, it requires a well-thought-out, structured naming convention. This must be deliberately worked on and agreed before any data will be searchable within a database.
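As a rough illustration of why an agreed convention matters, the Python sketch below checks a list of device names against a made-up rule. The convention itself (a three-letter site code, a device type and a three-digit number, as in "LON-SRV-001"), the type codes and the function names are all assumptions for the sake of the example, not a recommended standard.

```python
import re

# Hypothetical convention: <site>-<type>-<3-digit number>, e.g. "LON-SRV-001".
NAME_PATTERN = re.compile(
    r"^(?P<site>[A-Z]{3})-(?P<type>SRV|SWT|RTR|CAB)-(?P<num>\d{3})$"
)


def parse_name(name):
    """Return the parts of a conforming name, or None if it breaks the convention."""
    match = NAME_PATTERN.match(name.strip().upper())
    return match.groupdict() if match else None


def audit_names(names):
    """Split names into conforming, non-conforming and duplicate entries."""
    valid, invalid, duplicates = [], [], []
    seen = set()
    for raw in names:
        canonical = raw.strip().upper()
        if parse_name(raw) is None:
            invalid.append(raw)        # cannot be matched to the convention
        elif canonical in seen:
            duplicates.append(raw)     # same object recorded twice
        else:
            seen.add(canonical)
            valid.append(canonical)
    return valid, invalid, duplicates
```

Run against a merged spreadsheet export, a check like this would separate entries such as "London Server 1", which the database cannot reliably relate to anything, from those that follow the agreed pattern, and would flag the same device recorded twice with different capitalisation or spacing.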

None of this is rocket science. It is common sense based around ITIL and other best practice methods which are recognised by mature IT professionals. Using, mandating and updating the database is a strategic decision and not a magic wand approach.

We help our customers achieve this, but it is often a long-term collaboration and partnership involving education, the transfer of key skills and methods and, of course, trust. The process itself requires sponsorship, executive buy-in, ownership, leadership, regular communication with all stakeholders and an unwavering commitment to maintain it. It is possible to do this. It is important. This is why we do what we do.

Jonathan Phillips 

Square Mile Systems

