Description & Requirements
As a Network Reliability Engineer, you will work within a team of software engineers who are responsible for the tooling, automation & stability of the Global Network Infrastructure that supports Bloomberg products and services. Our network links many large-scale Data Centers and over a hundred edge sites, which connect Bloomberg to our hundreds of thousands of clients around the world and to over 1,500 global exchanges and trading venues over private connectivity, Internet and Public Cloud.
You will be part of a team that builds automations to support the complete lifecycle of the network and network security infrastructure, including provisioning, configuration management, observability & capacity planning. You will also work closely with other Network Engineers & the CTO Office on the planning & implementation of new technology and architecture to ensure we have the proper scalability, efficiency & tooling for our Operations teams.
Our Mission:
Ensure the stability of the Network Infrastructure while working on the next generation of infrastructure as a service
Provide tooling & visibility for our operations teams to manage the Network
Improve observability into Network Performance to spot trends and anomalous behavior
Work with heterogeneous systems including but not limited to routers, switches, firewalls and load balancers to ensure high Network availability for application connectivity and infrastructure as a service
Build automated deployment pipelines, monitoring and failover
Continuously re-evaluate, automate and modernize tooling and infrastructure to meet the latest industry and company-wide standards
Support custom in-house tools, open source and commercial tools
We'll trust you to:
Help apply SRE best practices to our solutions
Engineer solutions to monitor the health, availability, and capacity of our Network Infrastructure
Use automation to bring scalability and efficiency to our systems
Maintain the monitoring of our systems and provide solutions that can react to the alarms it raises to minimize client impact and manual intervention
Provide tooling for and visibility into new Network architectures and technologies
Define service level objectives and appropriate metrics to measure our performance against those objectives
Automate the provisioning, configuration and management of infrastructure and applications with modern orchestration tools
Troubleshoot applications, networks, and operating systems
Write software in languages such as Python to automate tasks and interact with APIs
You’ll need to have:
Experience as an SRE, Network, DevOps, or Software engineer
Experience with building, maintaining and continuously enhancing automations needed for scalability & efficiency in running the Network Infrastructure
Experience with Orchestration, Automation Frameworks & Infrastructure as Code technologies: Ansible, Terraform, Chef, Salt, etc.
Experience with object-oriented programming languages, preferably Python
Experience engineering solutions to monitor the health, availability and capacity of a network
A degree in Computer Science, Engineering or similar field of study or equivalent work experience
We'd love to see:
Experience with continuous integration and deployment tools
Strong understanding of various Network architectures across Internet, Public Cloud, Private Networks, etc.
Experience managing and automating network and network security devices such as Juniper, Nokia, Arista, Cisco, Palo Alto, F5, Symantec web gateways, etc.
Experience with telemetry and observability tools such as Splunk, Grafana and Humio
Eagerness to learn new technologies and mentor others