Artem Krylysov


Posts tagged with rocksdb

Timeseries Indexing at Scale

Datadog collects billions of events from millions of hosts every minute and that number keeps growing and fast. Our data volumes grew 30x between 2017 and 2022. On top of that, the kind of queries we receive from our users has changed significantly. Why? Because our customers have grown in sophistication: they run more complex stacks, want to monitor more data, and run more complex analyses. That, in turn, puts pressure on our timeseries data store.

Data stores have a number of tricks in their bag to offer good performance. One of the most critical ones is the judicious use of indices, a key data structure that can make queries fast and efficient, or unbearably slow. Over the years, our homegrown indices that were put in place in 2016 became a performance bottleneck for queries and a source of increased maintenance. We knew that we had to learn from these challenges and come up with something better.

This blog post provides an overview of the Datadog timeseries database and the challenges of timeseries indexing at scale. We’ll compare the performance and reliability of two generations of indexing services.


How RocksDB works

Introduction #

Over the past years, the adoption of RocksDB increased dramatically. It became a standard for embeddable key-value stores.

Today RocksDB runs in production at Meta, Microsoft, Netflix, Uber. At Meta RocksDB serves as a storage engine for the MySQL deployment powering the distributed graph database.

Big tech companies are not the only RocksDB users. Several startups were built around RocksDB - CockroachDB, Yugabyte, PingCAP, Rockset.

I spent the past 4 years at Datadog building and running services on top of RocksDB in production. In this post, I'll give a high-level overview of how RocksDB works.