Lead the design, build, and operation of cloud infrastructure, platform services, and ops tooling that underpin our live service and empower engineering teams.
Own and evolve monitoring and alerting across production systems, ensuring dashboards and alerts are high-signal and actionable.
Lead incident response during production issues. Coordinate across teams, drive resolution, and ensure thorough post-incident reviews with meaningful follow-up actions.
Investigate and resolve complex technical issues across application, infrastructure, data, and networking layers, performing root cause analysis and implementing durable fixes, not just patches.
Maintain, evolve, and champion our Infrastructure-as-Code (Terraform) to ensure environments are reproducible, auditable, and version-controlled.
Improve CI/CD pipelines and deployment practices to increase engineering velocity while maintaining deployment safety and rollback capability.
Proactively manage cloud capacity and cost — monitoring spend, identifying optimisation opportunities, and contributing to FinOps practices including tagging, budgeting, and cost allocation.
Ensure infrastructure and platform services meet security, regulatory, and compliance requirements. Implement and maintain controls around network segmentation, access management, secrets handling, and vulnerability patching.
Support audit and governance processes with evidence and documentation.
Drive automation of operational toil and repetitive tasks — if you're doing it more than twice, automate it.
Contribute to production-readiness standards — every change should be observable, reversible, and well-tested before it reaches customers.
Foster a knowledge-sharing environment with thorough documentation, runbooks, and a teamwork-oriented culture.
Support the wider business in meeting its goals where major infrastructure or service change is required.
Support, coach, and mentor other team members. Raise the technical bar through pairing, code review, and leading by example.
Stay abreast of emerging technologies relevant to systems engineering and cloud infrastructure, and apply them where necessary.