Senior Software Engineer / SRE - Electronic Trading

Location

London

Business Area

Engineering and CTO

Ref #

10050148

Description & Requirements

About Observability Engineering

Senior Software Engineers - SRE in Electronic Trading (ET) ensure that our global enterprise products, spanning fixed income, equities, and derivatives, are resilient and observable. This role focuses on building a culture of observability and resilience, along with the platforms that support it, to prevent market disruptions for global traders.

We specialize in proactive anomaly detection, advanced performance insights, and best-practice guidance. Our team collaborates with application developers to define meaningful SLOs, implement chaos engineering, and build diagnostic tools that mitigate architectural risks as our platforms scale.

What’s in it for you?

You will have the autonomy to drive reliability initiatives end-to-end, influencing the reliability strategy for critical global trading systems. By championing modern SRE practices and automation, you will fundamentally transform how we manage system stability.

In your day-to-day, you’ll develop frameworks for tracking reliability metrics, collaborate on system health reports, and build libraries that standardize alerting and incident response. You will also use failure injection and chaos testing to validate system performance under real-world stress. Our teams primarily build software using Python.

We’ll trust you to:

Define and promote standards for observability, alerting, and incident response.
Develop self-maintaining tools using statistical analysis, health metrics, and distributed tracing.
Embed resiliency best practices into the full software development lifecycle.
Lead initiatives to mitigate risks related to performance, capacity, and scale.
Translate technical findings into actionable insights for engineers and stakeholders.
Automate operational tasks to enhance the safety and scalability of our infrastructure.

You’ll need to have:

Professional experience with Python or C++.
Strong collaboration and communication skills.
An understanding of distributed systems and system reliability.
Familiarity with SLOs, SLIs, and SLAs.
A degree in Computer Science, Engineering, or equivalent practical experience.

We’d love to see:

Experience in an SRE, Reliability or Production Engineering role.
Deep knowledge of system health assessment and building effective alerting.
Hands-on experience with monitoring tools (e.g., Grafana, Humio) and chaos engineering.
Familiarity with leveraging Generative AI (e.g., GitHub Copilot, Gemini) to accelerate development.
Experience with big data technologies like Apache Spark or Amazon S3.

If indicated, please note that years of experience are a guide; we will consider applications from all candidates who can demonstrate the skills necessary for the role.

Discover what makes Bloomberg unique - watch our podcast series for an inside look at our culture, values, and the people behind our success.