Job Title: Operations Manager (Service Management & Site Reliability)
Location: Office-based, with optional hybrid working
Department: Engineering / Platform & Service Operations
Reports to: Head of Engineering / Platform Services
Salary: Competitive (market-aligned for Manchester)
Must have full right to work in the UK
Must be located within 45 minutes of Knutsford
Company Overview
We build next-generation solutions for capturing and archiving regulated data from voice, messaging (e.g., WhatsApp, Teams), and operational technology systems. Our platform supports mission-critical compliance workloads for customers in financial services and other regulated industries, where availability, integrity, and auditability are essential.
Role Summary
We are seeking an experienced Operations Manager to act as the operational owner and Service Manager for a business-critical platform, while also managing the Customer Service (Level 1) function.
This role combines ITIL-based service management discipline, Site Reliability Engineering (SRE) principles, and people leadership to ensure high service availability, effective incident response, and continuous improvement across both customer-facing support and backend service operations.
You will have end-to-end accountability for live service operations, leading both the Service Engineering team and the Customer Service (L1) team, and owning service performance, platform reliability, operational risk, and financial stewardship.
Hands-on experience managing cloud platforms in critical, always-on environments is mandatory for this role.
Key Responsibilities
Service & Operational Ownership
- Act as the named Service Manager for the platform, with full accountability for service performance, stability, and customer impact.
- Own the service lifecycle, from operational readiness and go-live through live service management and continual improvement.
- Define, own, and report against SLAs, SLOs, and operational KPIs across both customer service and service engineering functions.
- Serve as the primary operational escalation point for internal stakeholders and key customers.
Customer Service (L1) Management
- Lead and manage the Customer Service (Level 1) team, ensuring consistent, high-quality first-line support for customers.
- Ensure effective triage, prioritisation, and escalation of incidents from L1 to Service Engineering.
- Drive customer-focused service metrics, including response times, resolution quality, and customer satisfaction.
- Establish training, coaching, and quality assurance processes to continually improve L1 service delivery.
Reliability, Availability & Incident Management
- Own the end-to-end reliability and availability of a mission-critical, compliance-focused platform.
- Apply SRE principles to reduce incidents, manage operational risk, and balance reliability with delivery velocity.
- Lead major incident management, ensuring effective coordination, clear communication, and rapid service restoration.
ITIL-Aligned Service Operations
- Lead Incident, Problem, Change, and Release Management in line with ITIL best practices.
- Plan and execute on-premises software upgrades and platform changes, ensuring controlled delivery and minimal disruption.
- Drive thorough root cause analysis (RCA) and ensure corrective actions are implemented and tracked to completion.
- Maintain audit-ready service documentation, runbooks, and operational procedures.
Cloud Platform & Engineering Collaboration
- Own operational oversight of cloud and hybrid platforms supporting critical customer services.
- Work closely with Engineering, Product, and Security teams to ensure platforms are operationally ready, resilient, observable, and secure.
- Ensure appropriate monitoring, alerting, capacity planning, and resilience controls are in place across Azure and hybrid environments.
- Champion automation and Infrastructure-as-Code to reduce operational toil and improve reliability.
Budget & Cost Management
- Own and manage the operations and service engineering budget, ensuring spend is forecast, controlled, and aligned to service outcomes.
- Manage costs related to cloud infrastructure, on-premises upgrades, tooling, licensing, and third-party services.
- Partner with Finance and Procurement to justify investment and identify cost-optimisation opportunities without compromising service reliability or compliance.
Leadership & Team Development
- Lead, coach, and develop both the Customer Service (L1) and Service Engineering teams.
- Establish structured onboarding, training, and progression paths to build resilient, high-performing teams.
- Foster a culture of accountability, service excellence, and continuous improvement across operations.
What You Bring
Essential (Must-Have):
- Proven experience managing cloud platforms in critical, always-on production environments.
- Demonstrable experience owning or operating services where uptime, data integrity, and regulatory compliance are critical.
Required Experience & Skills:
- Microsoft Azure certification.
- Proven experience as an Operations Manager, Service Manager, or equivalent, with ownership of live services.
- Strong hands-on experience with ITIL service management practices, particularly Incident, Problem, Change, and Continual Improvement.
- Experience managing Customer Service / L1 support teams in a production environment.
- Working knowledge of Site Reliability Engineering (SRE) principles and operational risk management.
- Strong technical foundation across Azure, Windows Server, Linux (Red Hat), Active Directory, networking, and scripting (PowerShell, Bash, or Python).
- Experience delivering platform upgrades and managing production change in cloud and hybrid environments.
- Experience owning operational budgets and cost centres.
- Calm, structured leadership style with a strong focus on uptime, customer impact, deadlines, and service quality.
- A genuine commitment to training, mentoring, and building high-performing operational teams.
Job Types: Full-time, Permanent
Pay: £55,000.00-£60,000.00 per year
Benefits:
- Company pension
- Free parking
- Gym membership
- Health & wellbeing programme
- On-site gym
- On-site parking
- Private dental insurance
- Private medical insurance
- Sick pay
Ability to commute/relocate:
- Knutsford WA16: reliably commute or plan to relocate before starting work (required)
Application question(s):
- Proven experience managing cloud platforms in critical, always-on production environments.
- Demonstrable experience owning or operating services where uptime, data integrity, and regulatory compliance are critical.
Work authorisation:
- United Kingdom (required)
Work Location: In person