Description & Requirements
Bloomberg’s data-driven products depend on fast, relevant, and secure access to petabytes of structured and unstructured data. The BBDS (Bloomberg Big Data Services) platform powers this scale with distributed systems built on Apache Kafka, MySQL, Vitess, Apache Solr, and other cutting-edge technologies. We use clusters that index and serve millions of documents daily, making financial data easily discoverable across the firm.
Our Team
The DataHub Engineering team provides a distributed platform for hosting datasets, complete with managed data stores, search, discovery, lakehouse, and real-time stream processing capabilities. The platform offers a single place within Bloomberg to discover, access, publish, and subscribe to data.
You’ll join the team that introduced the abstraction of a “dataset” and invented a schema language to formally define all data at Bloomberg, complete with schema evolution, versioning, and true point-in-time semantics.
We’re the team that first brought Kafka, Avro, Dataset Schema Registry, Mesos, Clustered MySQL, Vitess, and Spark into the ecosystem to power a new data-intensive platform that is the hub for financial datasets.
The DataHub’s Search and Discovery Infrastructure, built on Apache Solr, powers the discoverability of those datasets, making Bloomberg’s financial data easy to search, index, and explore. Our systems serve millions of queries daily across hundreds of datasets, driving everything from analytics to real-time data products.
We'll trust you to:
Build tools and automation in Java or Python for indexing, reindexing, and performance tuning
Design and enhance indexing and query pipelines for performance, scalability, and reliability
Debug complex issues involving query latency, indexing pipelines, and distributed systems behavior
Collaborate with engineers across BBDS to enhance data discoverability, security, and scalability
Contribute upstream to open-source search technologies and improve internal frameworks for observability and resilience
Drive initiatives around Vector Indices and Hybrid Search capabilities
Apply performance engineering techniques using tools like eBPF to profile and optimize low-latency systems
You'll need to have:
4+ years of software development experience using Java
Deep systems knowledge of JVM internals, Java, Linux, networking, and distributed systems
Familiarity with low-latency systems and performance tuning using eBPF or similar tools
A degree in Computer Science, Engineering, Mathematics, or equivalent practical experience
We'd love to see:
Experience with Python and/or Go
A passion for scalable, resilient, secure, and observable distributed systems
Expertise in Lucene, Apache Solr, or Elasticsearch (indexing, sharding, scaling, query tuning)
We offer one of the most comprehensive and generous benefits plans available, along with a range of total rewards that may include merit increases, incentive compensation (exempt roles only), paid holidays, paid time off, medical, dental, vision, short- and long-term disability benefits, 401(k) + match, life insurance, and various wellness programs, among others. The Company does not provide benefits directly to contingent workers/contractors and interns.