Case study

Observability for a Fund Manager

A UK-based private fund manager with ~40 people in IT and 300+ people managing a fund of ~£22bn had developed a core platform to support its business. The platform had monitoring in place, but the client was experiencing recurring incidents and its mean time to resolve (MTTR) was high.


The situation

Catapult's consultants reviewed the existing monitoring and support processes and identified two main problems.

  1. The monitoring solution (Elastic / ELK stack) was running an outdated version of the products. It collected only a limited amount of data and had no dashboards. With no infrastructure or middleware data, it was very difficult to identify the causes of errors.
  2. The support team did not have enough data to identify and diagnose the root causes of problems. This led to the same issues recurring, causing frustration between the development and support teams.

The solution

Catapult recommended moving to an observability model built on getting logs, metrics and traces into the Elastic stack. Our engineers upgraded the tools and added collectors to bring infrastructure, database and queue log data into the monitoring tool. We worked with the development and support teams to identify key metrics and Service Level Agreements (SLAs), then set up rules and alerts that fire when these are breached. Dashboards were created to visualise service health and performance.
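As an illustration only, the idea of rules that fire when an agreed metric breaches its SLA can be sketched in a few lines of Python. This is not the configuration used on the engagement, where such rules would normally be defined in the Elastic stack's own alerting features rather than in custom code. The rule names, metric names and thresholds below are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class SlaRule:
    """A hypothetical alert rule: a metric, a threshold, and a breach direction."""
    name: str
    metric: str
    threshold: float
    breached_when: Callable[[float, float], bool]


def above(value: float, threshold: float) -> bool:
    """Breach when the observed value exceeds the threshold (e.g. latency)."""
    return value > threshold


def below(value: float, threshold: float) -> bool:
    """Breach when the observed value drops under the threshold (e.g. availability)."""
    return value < threshold


def evaluate(rules: List[SlaRule], metrics: Dict[str, float]) -> List[str]:
    """Return one alert message per rule that is breached or has no data."""
    alerts = []
    for rule in rules:
        value = metrics.get(rule.metric)
        if value is None:
            # A metric that has stopped reporting is flagged rather than ignored.
            alerts.append(f"{rule.name}: no data for {rule.metric}")
        elif rule.breached_when(value, rule.threshold):
            alerts.append(
                f"{rule.name}: {rule.metric}={value} breaches {rule.threshold}"
            )
    return alerts


# Hypothetical rules and thresholds, chosen purely for illustration.
RULES = [
    SlaRule("API latency SLA", "p95_latency_ms", 500.0, above),
    SlaRule("Queue backlog", "queue_depth", 1000.0, above),
    SlaRule("Availability SLA", "availability_pct", 99.9, below),
]

if __name__ == "__main__":
    sample = {
        "p95_latency_ms": 620.0,
        "queue_depth": 120.0,
        "availability_pct": 99.95,
    }
    for alert in evaluate(RULES, sample):
        print(alert)
```

In this sketch each rule carries its own breach direction, so "higher is worse" metrics such as latency and "lower is worse" metrics such as availability can share one evaluation loop.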

We also set up a runbook framework for the support team to use whenever an alert fired or an incident occurred. The runbooks provided steps to fix problems, along with processes to prevent recurrence.

The results

There was an immediate impact once the additional monitoring was in place:

  • Fewer incidents due to proactive alerting
  • Faster Root Cause Analysis (RCA) of problems through additional data
  • Lower MTTR due to runbooks
  • Fewer recurring issues due to better RCA and closer collaboration between the development and support teams
