Image by Author using Midjourney

Tending towards entropy — keeping track of tech (or why your startup should have a service catalogue)

Mark Ridley
13 min read · Oct 5, 2023


This is part of my CTO-Ops series; articles written to help CTOs and techies in leadership positions address already solved but recurring challenges.

OK, OK. I know a ‘service catalogue’ doesn’t sound like much fun. It’s not as catchy as an ‘MVP’ or ‘continuous discovery’. It’s not serverless or machine learning. But a service catalogue is a simple thing that really helps to keep your tech estate healthy as you grow.

Service Catalogues (yes, ‘catalogs’ for my American English speaking friends) aren’t just useful for service desks in big companies. If you’re in a small team, perhaps a tech founder or an engineer in a startup, starting your first service catalogue is pretty quick and effortless.

I’m going to explain why you need one, how to build it from scratch and how you can evolve it even further as you scale. You’ll find out how a quick and simple spreadsheet can help you avoid a lot of future stress.

Forgotten systems, unowned code and unpaid bills

In each of the last three businesses I worked with, we ended up creating a service catalogue from scratch. Even though each had slightly different challenges, getting visibility of our tech estate helped us plan and prioritise the big jobs we needed to take care of.

In one company we had to undertake a huge tech migration, but found that there was a massive amount of legacy tech that nobody was responsible for. The top priority in this huge, urgent project was to find the right people to take ownership of poorly managed, often forgotten bits of tech. After years of sustained growth, the original knowledge that had been locked away in a close-knit founding tech team had evaporated, leaving the company with code, services and systems that had no owners and little documentation, and it all had to be moved. For the teams that inherited this sprawling legacy, it was hard to know where to start. If you’ve ever inherited old code, I’m sure you can imagine how it feels to be handed someone else’s old, broken stuff and told to look after it.

In another business, we needed to try to separate out what jobs were done by which teams. As a relatively young startup, growth had been organic and it really wasn’t clear exactly who was responsible for which bit of tech. Who gave access to the cloud via the VPN? Who created Office 365 accounts? Who removed permissions from different internal tools when someone left the business? It just about worked, but only because of the good will and best endeavours of a very dedicated team.

In yet another business, we needed to put in place the first ever out-of-hours on-call rota — deciding who it was that got woken up at 3am when an alert was fired (or at least stayed sober over the weekend). But to decide who responds to an urgent incident we needed to know clearly who was responsible for the thing that broke. Previously, all out of hours support had been carried out by one, impressively dedicated, techy who had finally moved on from the business. Without them, it wasn’t clear who could, and even who should be fixing these overnight problems.

In each of these situations, a simple exercise to create a living document, the service catalogue, was the first step in getting control of the disorder and smoothing our future operations.

Just what is a service catalogue?

A service catalogue is basically a big list of the technical systems and services in your organisation. That’s it.

The purpose of the service catalogue is to provide one view of all the technical systems and services in our organisation (even if they’re not owned by ‘tech’), and serve as one source of truth that we can consult to find out who to contact if something goes wrong. While this is easy when we have 10 or even 20 systems, it’s scary just how quickly even brand new startups can accumulate 50 or a hundred systems that need managing. The service catalogue helps us manage this as we grow.

Service Catalogues have their origins in what you might think of as old-fashioned IT management and frameworks like ITIL, but don’t let that put you off. Like a lot of these serious frameworks, ITIL contains a lot of goodness, but it can be a bit overwhelming for small teams so we’re going to start with the basics. It’s enough for now that you know that we’re simplifying from a much bigger and more rigorous definition of service management.

There are lots of vendors who sell clever service management solutions, but as always we’re going to follow the golden rule. We want to create things that are “simple and working”, and we will only add complexity when we need to. That means that all we need for our first catalogue is a spreadsheet. Excel or Google Sheets, choose your poison.

How do you build a Service Catalogue?

We’re going to start small. First, create a new sheet, or copy my Google Sheets template from here.

We’re going to start with just a few columns:

Service Name | Supplier | Category | Business Owner | Technical Owner | Service Tier | Link to docs

That’s it. We can add more later, but this will be fine to get started with.

Let’s break down what each of these columns is there to do, and what you should fill it with.

Service Name

A short and descriptive name for the service that is unique and understandable.

Supplier

The company that is selling you the product. This is important because you may have multiple services from one supplier — think Microsoft 365 and Azure.

Category

We’re going to make some basic categories to batch up similar services for analysis and reporting. Categories help us answer questions like “What do we spend all our money on?” and “Why do we need so many people in tech?”. You should pick categories that align with your business, but keep them broad and don’t have too many; you can always add more later if you need them. Here are some examples to help get you started:

Basic categories include Collaboration Tools, Infrastructure, End User Experience and Security Services.

Business Owner

This is the person you can chase to find out why we even need the service. They’re probably the person that wanted it in the first place (but doesn’t manage any of the support having bought it). They’re also the person you need to tell if it stops working and you’re the technical owner. The business owner is going to take responsibility for defending the business case for a service, for explaining what it’s used for, why it’s important and why we should continue paying for it.

Technical Owner / Team

Ideally we want to subscribe to an ‘if you build it, you run it’ philosophy, so if this is something home-built you will name the team who owns this service. If it’s bought externally, for instance a SaaS offering that is part of your tech stack or business tools, it should still have an internal technical owner: usually a team, sometimes a role (e.g. the CTO). The technical owner is the team or individual who gets alerts if the service is broken and responds to issues.

Service Tier

We’ll take some time to talk about the service tier later, but I usually recommend that for first time service catalogues we start with two simple tiers — Gold and Bronze. In simple terms, Gold tier systems are systems that we’ll get up to fix in the middle of the night, and Bronze systems are fixed in normal business hours. You may well also have a ‘do nothing’ tier, which is exactly that — if it breaks, no-one will do anything about it.
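If it helps to see the catalogue as data rather than as a sheet, the columns above can be sketched as a small record type. This is only a minimal sketch, not a prescribed schema: the class name `ServiceEntry`, the snake_case field names and the example row are illustrative assumptions, and the output is plain CSV that either Excel or Google Sheets can import.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class ServiceEntry:
    """One row of the catalogue; fields mirror the spreadsheet columns."""
    service_name: str
    supplier: str = ""
    category: str = ""
    business_owner: str = ""
    technical_owner: str = ""
    service_tier: str = ""  # e.g. "Gold", "Bronze" or a 'do nothing' tier
    link_to_docs: str = ""


def to_csv(entries):
    """Render entries as CSV text that can be imported into a spreadsheet."""
    buffer = io.StringIO()
    column_names = [field.name for field in fields(ServiceEntry)]
    writer = csv.DictWriter(buffer, fieldnames=column_names)
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))
    return buffer.getvalue()


# Hypothetical example row, for illustration only.
example = ServiceEntry(
    service_name="Email and documents",
    supplier="Microsoft",
    category="Collaboration Tools",
    business_owner="CTO",
    technical_owner="Tech Support team",
    service_tier="Bronze",
)
print(to_csv([example]))
```

Only the service name is required here, which matches the advice to list names first and fill in the other columns later.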

What goes into the catalogue?

Now you’ve got your spreadsheet and you know what’s going in it, let’s start populating it.

The first thing we want to answer is “what systems should I be listing?”. There’s no perfect answer to this question, but I’d suggest checking each system against the statements below. If any of them is true, you should probably have it in the catalogue:

  • Someone is going to shout if this breaks
  • We pay for this
  • We will be asked about this in a penetration test or due diligence
  • We built this ourselves and it’s quite important

It’s easier to think about this when you have some examples. My typical list would probably have some or all of these on:

Google Workspace, GitHub, Jira, Slack, HubSpot, CircleCI, SonarQube, Snowflake, Amazon Web Services (AWS), Looker, An internal dashboard, An internal tool to manage customers…

Populating the catalogue

Now, knowing what type of things we’re looking for, let’s start populating the catalogue. The easiest place to start is just the top of your head. In the ‘Service Name’ column, start listing out as many services as you can think of. Don’t worry about the other columns yet, just try to get down a good list of the tools you use.

Once you’ve done this, another good place to look for inspiration is the company credit card bill. Ask for a list of card charges, or suppliers whose invoices are paid on the finance system or, if you’re lucky enough to have a procurement system, ask for a report on ‘tech’ systems. Don’t forget to include other systems which are important to the business, like HR tools and finance systems, even if they’re not specifically within the tech domain. If you’re likely to get told about them breaking, get them into the catalogue.
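If finance can export card charges as a CSV file, a few lines of code can turn that export into a first list of suppliers to check. The sketch below assumes a hypothetical export with `merchant` and `amount` columns; real exports will use different headings, so treat the column names and the sample data as placeholders.

```python
import csv
import io


def candidate_suppliers(charges_csv, merchant_column="merchant", amount_column="amount"):
    """Total the charges per merchant so recurring suppliers stand out."""
    totals = {}
    display_names = {}
    for row in csv.DictReader(io.StringIO(charges_csv)):
        raw_name = row[merchant_column].strip()
        # Compare names case-insensitively so "SLACK " and "Slack" are counted together.
        key = raw_name.casefold()
        display_names.setdefault(key, raw_name)
        totals[key] = totals.get(key, 0.0) + float(row[amount_column])
    # Largest spend first, as a rough guide to what is worth cataloguing early.
    return sorted(
        ((display_names[key], total) for key, total in totals.items()),
        key=lambda item: item[1],
        reverse=True,
    )


# Made-up sample data, not real charges.
sample = """merchant,amount
Slack,120.00
GitHub,84.00
SLACK ,120.00
Snowflake,950.50
"""
for merchant, total in candidate_suppliers(sample):
    print(f"{merchant}: {total:.2f}")
```

The result is only a prompt for the catalogue, not the catalogue itself: each supplier still needs a human to decide which services it actually provides.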

Deciding on owners

Figuring out owners can be a tricky exercise. Getting people to agree to be owners is an even trickier one, but it’s one of the most important parts of this process.

Remember, we have business owners and technical owners. With your big list of systems, start going through the list and do a first pass on both business and technical owners. For some, this will be easy: maybe HubSpot is owned by the Head of Marketing. Perhaps the business owner of Microsoft 365 is the CTO, and the technical owner is the Tech Support team. Fill in obvious answers, and leave blanks where you have questions.

Pretty soon, you’ll have a list that is starting to resemble a real service catalogue, but the easy parts are done. Now you need to fill in every blank owner field: who decided that some SaaS tool was vital for the business? They’re probably the business owner, but you have to take time to communicate that to them. You will have to ensure that they know which systems they are the business owner for, and that they have a responsibility to be able to explain why that system is important to the business.

You will probably also find a number of systems that have no obvious tech owner and here’s your biggest challenge. Every system needs a named tech owner — whether it’s a role or a team — and that owner is responsible if ever that service breaks.
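Once the sheet grows past a screenful, it can be handy to list the gaps automatically. Here is a minimal sketch, assuming the rows have been read into dictionaries keyed by the column headings used earlier; the function name `missing_owners` and the sample rows are hypothetical.

```python
def missing_owners(rows):
    """Return (service name, missing columns) for every row with a blank owner."""
    gaps = []
    for row in rows:
        missing = [
            column
            for column in ("Business Owner", "Technical Owner")
            if not row.get(column, "").strip()
        ]
        if missing:
            gaps.append((row["Service Name"], missing))
    return gaps


# Hypothetical rows, for illustration only.
rows = [
    {"Service Name": "HubSpot", "Business Owner": "Head of Marketing", "Technical Owner": ""},
    {"Service Name": "Microsoft 365", "Business Owner": "CTO", "Technical Owner": "Tech Support"},
    {"Service Name": "Internal dashboard", "Business Owner": "", "Technical Owner": ""},
]
for name, missing in missing_owners(rows):
    print(f"{name}: needs {', '.join(missing)}")
```

The output is a to-do list of conversations to have, which is the hard part that no script can do for you.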

This stage of building your service catalogue will probably bring to light a lot of concerning oversights. It’s likely that there are important systems that haven’t had any real support for quite some time. There are probably unimportant systems that are still being paid for, but no-one knew about. It’s this part of the process — deciding on and making ownership public — that really drains the swamp.

Service Tiers

One of the benefits of building your service catalogue is being able to better understand and demonstrate which services need special support. It can be very helpful to use a simplified model for availability and support to help your business understand not only which services are important, but also what your commitment to support those services is.

In ‘proper’ IT Service Management, there are a lot of acronyms which have specific definitions that express how critical, or how much support a service needs. Let’s take a quick look at what these are to give us a grounding in the discipline.

SLAs and OLAs: Service Level Agreements are usually contractual obligations to clients that define the expected availability of a service. An SLA might contain metrics like uptime, response time and performance targets, as well as explaining escalation processes if the service breaks. An Operational Level Agreement is similar, but usually relates to services provided to internal teams.

RTO and RPO: stalwarts of disaster recovery and business continuity metrics, RTO and RPO are much less familiar to teams these days, but are very useful expressions of availability. The Recovery Time Objective is the target time within which a service should be restored if it fails. It essentially answers the question, “How long can our system or application be down without causing significant harm to the business?”. An RTO might be an hour or a week, depending on the system.

Recovery Point Objective relates to data loss. The RPO is a target for how much data it is acceptable to lose — typically defined by the length of time between your backups. It answers the question, “How much data can we afford to lose before it becomes detrimental to our business?”
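To make the two targets concrete, here is a small worked check. It is a simplified sketch under stated assumptions: the worst-case data loss is taken to be one full backup interval (ignoring how long a backup takes to run), and the target values are made up for illustration.

```python
from datetime import timedelta


def meets_rpo(backup_interval, rpo):
    """A failure just before the next backup loses up to one full interval of data."""
    return backup_interval <= rpo


def meets_rto(outage_duration, rto):
    """An outage meets the RTO if service was restored within the target time."""
    return outage_duration <= rto


# Illustrative targets only; pick values that suit the system.
rpo = timedelta(hours=4)
rto = timedelta(hours=1)

print(meets_rpo(timedelta(hours=24), rpo))    # nightly backups against a 4 hour RPO
print(meets_rpo(timedelta(hours=1), rpo))     # hourly backups against a 4 hour RPO
print(meets_rto(timedelta(minutes=45), rto))  # a 45 minute outage against a 1 hour RTO
```

Under these assumptions, nightly backups cannot satisfy a four hour RPO, which is the kind of mismatch these targets are meant to expose.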

There are many more formal measures of availability and performance which likely won’t be defined for all of your systems. Instead, we’re going to use some simple rule based categories that will lump similarly important services together.

The reason that we do this is that it can be a very time consuming and detailed process to define full availability targets for all of your systems. Instead, we’re going to focus on ‘simple and working’ and we’ll add that complexity later if we need to.

I typically recommend that teams building their first catalogue define only two formal categories for support, Bronze and Gold. There are no set definitions for these categories, but here’s my usual starting point:

Gold Systems: systems which, if broken, will materially impact the business. If we determine that we have any Gold systems, we will often ensure that we have 24x7, 365-day support for them. If one of these systems breaks, we will ensure that a monitor picks it up, an alert is sent and somebody wakes up to fix it (no matter whether it’s just after lunch, or Sunday morning).

Bronze Systems: if we’re working with our two tier system, Bronze systems are any that are supported during normal working hours. We may set an expectation of how quickly we’ll respond, but if it’s after business hours it will be fixed the next day.

Do nothing tier: you might pick a friendlier name, but it’s important to highlight that there may be systems which will have no support at all — not even reasonable endeavours. This clearly classifies unimportant systems and sends a message to business owners that they cannot expect any support at all. It’s likely, but not always true, that these systems are waiting to be decommissioned.

These are just very broad, example tiers. You should think about what is appropriate for your business and vary them accordingly. However, the strongest recommendation that I can make is that you start with a very small number of tiers and only add more when there is a real need. It might feel more complete to have a Silver and Platinum tier, but there’s no point adding them unless the response to an issue with a service in one of these tiers has a meaningfully different process.
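The tiers described above amount to a small decision rule, which can be sketched in code. This is an illustration rather than a recommended policy: the working hours (09:00 to 17:30, Monday to Friday), the function names and the response strings are all assumptions.

```python
from datetime import datetime, time

# Assumed working hours; adjust to match your business.
BUSINESS_START = time(9, 0)
BUSINESS_END = time(17, 30)


def in_business_hours(moment):
    """True on weekdays between the assumed start and end of the working day."""
    is_weekday = moment.weekday() < 5  # Monday is 0, Sunday is 6
    return is_weekday and BUSINESS_START <= moment.time() < BUSINESS_END


def response_for(tier, moment):
    """Map a service tier and the time of failure to the expected response."""
    if tier == "Gold":
        return "page on-call now"
    if tier == "Bronze":
        return "fix now" if in_business_hours(moment) else "fix next working day"
    # Anything else is treated as the 'do nothing' tier.
    return "no support"


sunday_3am = datetime(2023, 10, 8, 3, 0)
tuesday_2pm = datetime(2023, 10, 10, 14, 0)
print(response_for("Gold", sunday_3am))
print(response_for("Bronze", sunday_3am))
print(response_for("Bronze", tuesday_2pm))
print(response_for("Do nothing", tuesday_2pm))
```

If adding a Silver or Platinum tier would not add a new branch to a function like this, that is a hint the extra tier has no meaningfully different process behind it.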

Working with your Service Catalogue

Once you have your draft of the service catalogue, with a number of systems with business and technical owners and service tiers defined, it’s time to review your catalogue and share it with relevant people in your organisation.

You should share often and early for review, particularly with tech leaders, business owners and members of the exec team.

During discussions, it’s likely that there will be raised eyebrows about the service tiers, leading to questions about why there aren’t more systems in the Gold tier. There’s often a latent assumption in the business that techies will respond at any time, and enjoy working without getting paid overtime. The service catalogue is your vehicle to drive a conversation about the business case for paying your team fairly for working during anti-social hours.

Once you have addressed the strongest and most vocal challenges, and have a document that represents most of your important services, the service catalogue becomes a living document that needs constant revision and attention.

Expanding your Service Catalogue

We’ve discussed the simplest and quickest service catalogue to create. The idea is that you now have the foundation of a management system which tracks your important services. As you scale, you may find that there are increasingly impressive (and expensive) tools to manage elements of the service catalogue, which may justify their complexity and expense in the future.

But, whatever system you choose, remember that the core of the service catalogue is a desire to maintain an awareness of the services used by the business and to ensure that there are clear owners who know that they will be held to account.

In time, it’s possible that you will add more capabilities to your service catalogue. Here are some common evolutions:

Expanding service tiers and on-call support

After the initial service tiers have been created, it’s likely that you will want to start work on more clearly defining your tiers, and also reflecting on which teams are responsible for operational maintenance. You may choose to build out more fine grained service tiers, but you can also start to use the catalogue with a tool like PagerDuty to describe business services and technical services. You can then link these services with observability and monitoring tools, which can also link directly to your on-call rota.

Handling Procurement basics

If your organisation doesn’t have a procurement system in place, it’s a simple matter to add some procurement basics to the sheet we’ve created. Simply add columns to reflect data such as:

  • Renewal term (e.g. monthly/annual)
  • Renewal date
  • Link to the contract
  • Expected or actual cost

You can then use this data to set calendar events, inform the business owner of renewals and even run quick reports or forecasts on tech spends.
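As a rough sketch of what those quick reports might look like, the snippet below lists renewals due soon and normalises costs to a yearly figure. The dictionary keys, the 60 day window and all of the figures are made-up assumptions for illustration.

```python
from datetime import date, timedelta


def upcoming_renewals(rows, today, within_days=60):
    """Return rows whose renewal date falls within the next `within_days` days."""
    cutoff = today + timedelta(days=within_days)
    due = [row for row in rows if today <= row["renewal_date"] <= cutoff]
    return sorted(due, key=lambda row: row["renewal_date"])


def annual_cost(row):
    """Normalise monthly and annual terms to a yearly figure for forecasting."""
    multiplier = 12 if row["renewal_term"] == "monthly" else 1
    return row["cost"] * multiplier


# Made-up figures, for illustration only.
rows = [
    {"service": "Slack", "renewal_term": "monthly", "renewal_date": date(2023, 11, 1), "cost": 100},
    {"service": "Snowflake", "renewal_term": "annual", "renewal_date": date(2024, 6, 1), "cost": 12000},
    {"service": "Jira", "renewal_term": "annual", "renewal_date": date(2023, 10, 20), "cost": 3000},
]
today = date(2023, 10, 5)

for row in upcoming_renewals(rows, today):
    print(row["service"], row["renewal_date"].isoformat())
print("Forecast annual spend:", sum(annual_cost(row) for row in rows))
```

A spreadsheet formula can do the same job; the point is only that the four extra columns are enough to answer renewal and forecasting questions.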

Linking the Service Catalogue to a Developer Portal

The service catalogue in this form is just a spreadsheet, and only really contains larger systems. A new class of tools known as ‘Developer Portals’, like Spotify Backstage and OpsLevel, now tries to manage the microservices, serverless and SOA applications which are created by your development team.

Use the Service Catalogue as a starting point for larger aggregations of customer facing services in your applications — think things like ‘Checkout process’, ‘Registration’ or ‘Search’ — and track the services that make up those complex systems in a developer portal.

Simple and working first

This was a quick intro to the basics of a service catalogue, and it’s possible that even this simple implementation will see you through the first couple of years of growth. This catalogue is useful to demonstrate just how many different systems make up your overall tech ecosystem, to answer and deflect questions about risk or support that come from due diligence or security audits, to ensure that no systems fall into disrepair without owners, and to ensure that your organisation really lives up to the philosophy of ‘if you build it, you run it’.

You just need a simple spreadsheet to get started. What’s stopping you?


Mark Ridley

Technologist, lean evangelist, chaos monkey and Chief Technology Prevention Officer. Loves good coffee, hanging around on ropes and driving about in cars