Description & Requirements
At Bloomberg, our software is a major contributor to financial markets. To make this possible, we ingest real-time market data, such as trades, quotes, orders and news, from hundreds of exchanges and thousands of newswires around the world, to the tune of 400 billion ticks of data per day. Our users and downstream applications rely on us to always be available, and this is where our SREs come in. Our Reliability Engineers help ensure our market data is distributed quickly, reliably and at the scale necessary to handle the ever-growing demand for data.
Our Team:
Our team is part of the Realtime Market Data Organization and is new in the region, giving you the opportunity to shape its vision as it grows. As a Reliability Engineer on our team, your mission will be to drive Incident Handling operations, covering everything from recovery and monitoring to resilient designs, testing, and quality checks. Our systems connect to lossy external vendors and telecommunications systems, which will bring exciting challenges in putting these reliability concepts into practice. As our group will lead the reliability initiatives in our organization, you will work alongside many groups in different time zones to influence this vision and its best practices and policies.
We will trust you to:
- Create and maintain monitoring solutions for production monitoring, capacity management, and incident detection and response
- Help us establish SLOs and SLIs we can use to measure our quality as an organization, and contribute to engineering projects aimed at ensuring we meet those standards
- Develop policies and runbooks for investigating, triaging, and troubleshooting production problems for yourself, our team, and the organization
- Develop and maintain tools used in investigating production problems
- Build automation for manual processes to increase reliability while reducing time to market and cost
- Help improve development and operational standards within the Realtime Market Data Organization, working with your business partners and software engineers
- Ensure that support documentation is produced, maintained, and improved
You’ll need to have:
- Demonstrated experience programming with Python
- Experience programming in an object-oriented language such as C++, Java, C#, Rust, or Go
- A degree in Computer Science, Engineering, Mathematics, or a similar field of study, or equivalent work experience
- An understanding and appreciation of SRE concepts: observability, production monitoring, capacity management, automated deployment, orchestration, configuration management, etc.
- Experience with Linux environments, operating systems, tooling, and scripting
- Fluency in both written and spoken English
- Proven ability to communicate and collaborate closely with internal stakeholders such as Engineering and Operations teams
We would love to see:
- Experience supporting, monitoring and debugging large-scale distributed systems
- Demonstrated interest in working with both hardware and software infrastructure
- Knowledge of Financial Market Data
- Experience with networking protocols such as TCP, UDP, and IP
- Experience with web applications (big plus for React, FastAPI, Postgres)
- Experience working in close collaboration with multiple teams across multiple locations and time zones
If this sounds like you, apply and we will get in touch to let you know what the next steps are.