Artem Krylysov


Posts tagged with dev

Timeseries Indexing at Scale

Datadog collects billions of events from millions of hosts every minute, and that number keeps growing fast. Our data volumes grew 30x between 2017 and 2022. On top of that, the kinds of queries we receive from our users have changed significantly. Why? Because our customers have grown in sophistication: they run more complex stacks, want to monitor more data, and run more complex analyses. That, in turn, puts pressure on our timeseries data store.

Data stores have a number of tricks in their bag to offer good performance. One of the most critical ones is the judicious use of indices, a key data structure that can make queries fast and efficient, or unbearably slow. Over the years, our homegrown indices that were put in place in 2016 became a performance bottleneck for queries and a source of increased maintenance. We knew that we had to learn from these challenges and come up with something better.

This blog post provides an overview of the Datadog timeseries database and the challenges of timeseries indexing at scale. We’ll compare the performance and reliability of two generations of indexing services.


How RocksDB works

Introduction

Over the past few years, the adoption of RocksDB has increased dramatically. It has become a standard for embeddable key-value stores.

Today RocksDB runs in production at Meta, Microsoft, Netflix, and Uber. At Meta, RocksDB serves as a storage engine for the MySQL deployment powering the distributed graph database.

Big tech companies are not the only RocksDB users. Several startups were built around RocksDB - CockroachDB, Yugabyte, PingCAP, Rockset.

I spent the past 4 years at Datadog building and running services on top of RocksDB in production. In this post, I'll give a high-level overview of how RocksDB works.


Let's build a Full-Text Search engine

Full-Text Search is one of those tools people use every day without realizing it. If you've ever googled "golang coverage report" or tried to find "indoor wireless camera" on an e-commerce website, you've used some kind of full-text search.

Full-Text Search (FTS) is a technique for searching text in a collection of documents. A document can refer to a web page, a newspaper article, an email message, or any structured text.

Today we are going to build our own FTS engine. By the end of this post, we'll be able to search across millions of documents in less than a millisecond. We'll start with simple search queries like "give me all documents that contain the word cat" and we'll extend the engine to support more sophisticated boolean queries.
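To make the "all documents that contain the word cat" query concrete, here is a minimal sketch of one common approach, an inverted index that maps each word to the IDs of the documents containing it. The excerpt does not say which data structure the post uses, so the `Index` type and its `Add` and `Search` methods are hypothetical names chosen for illustration, and the tokenizer (lowercasing and splitting on whitespace) is a deliberate simplification.

```go
package main

import (
	"fmt"
	"strings"
)

// Index is a hypothetical inverted index: it maps a lowercased token
// to the IDs of the documents that contain it.
type Index map[string][]int

// Add tokenizes text naively (lowercase, split on whitespace) and records
// the document ID under every token. It assumes each document is added once.
func (idx Index) Add(id int, text string) {
	for _, tok := range strings.Fields(strings.ToLower(text)) {
		ids := idx[tok]
		// Skip the token if this document is already listed for it.
		if len(ids) > 0 && ids[len(ids)-1] == id {
			continue
		}
		idx[tok] = append(ids, id)
	}
}

// Search returns the IDs of the documents that contain the given word.
func (idx Index) Search(word string) []int {
	return idx[strings.ToLower(word)]
}

func main() {
	idx := Index{}
	idx.Add(1, "A cat sat on the mat")
	idx.Add(2, "Dogs chase the cat")
	idx.Add(3, "Birds sing")

	fmt.Println(idx.Search("cat")) // [1 2]
}
```

A lookup is a single map access, which is why this kind of structure can stay fast as the collection grows. A real engine would also need punctuation handling, stemming, and ways to combine results for boolean queries.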


String interning in Go

String interning is a technique of storing only one copy of each unique string in memory. It can significantly reduce memory usage for applications that store many duplicated strings.
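As a rough illustration, interning can be sketched with a map that hands back the first copy it saw of each string. The `Interner` type and its methods are hypothetical names for this example, not an API from the post or the standard library, and the sketch is not safe for concurrent use.

```go
package main

import "fmt"

// Interner is a hypothetical helper that keeps one canonical copy
// of every distinct string it has seen.
type Interner struct {
	pool map[string]string
}

// NewInterner returns an empty interner.
func NewInterner() *Interner {
	return &Interner{pool: make(map[string]string)}
}

// Intern returns the canonical copy of s, storing s if it is new.
func (in *Interner) Intern(s string) string {
	if v, ok := in.pool[s]; ok {
		return v
	}
	in.pool[s] = s
	return s
}

// Len reports how many unique strings are stored.
func (in *Interner) Len() int {
	return len(in.pool)
}

func main() {
	in := NewInterner()

	// Simulate strings built at runtime, e.g. decoded from a network payload.
	raw := [][]byte{[]byte("env:prod"), []byte("env:prod"), []byte("host:a"), []byte("env:prod")}

	tags := make([]string, 0, len(raw))
	for _, b := range raw {
		tags = append(tags, in.Intern(string(b)))
	}

	fmt.Println(len(tags), in.Len()) // 4 2
}
```

After interning, the four tags refer to only two stored strings, so the duplicate copies created during decoding can be garbage collected.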


Handling C++ exceptions in Go

Cgo is a mechanism that allows Go packages to call C code. The Go compiler enables cgo for every .go source file that imports the special pseudo-package "C". The text in the comment before the import "C" line is treated as C code. You can include headers and define functions, types and variables, everything normal C code can do:

package main

/*
#include <stdio.h>

void foo(int x) {
    printf("x: %d\n", x);
}
 */
import "C"

func main() {
    C.foo(C.int(123)) // x: 123
}

Profiling and optimizing Go web applications

Go has a powerful built-in profiler that supports CPU, memory, goroutine and block (contention) profiling.

Enabling the profiler

Go provides a low-level profiling API runtime/pprof, but if you are developing a long-running service, it's more convenient to work with a high-level net/http/pprof package.

All you need to enable the profiler is to import net/http/pprof and it will automatically register the required HTTP handlers:

package main

import (
    "net/http"
    _ "net/http/pprof"
)

func hiHandler(w http.ResponseWriter, r *http.Request) {
    w.Write([]byte("hi"))
}

func main() {
    http.HandleFunc("/", hiHandler)
    http.ListenAndServe(":8080", nil)
}
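For comparison, the low-level runtime/pprof API mentioned above can be used directly when there is no HTTP server to attach to. The following is a minimal sketch, assuming we only want to capture a CPU profile around one piece of work; `profileCPU` is a hypothetical helper name, and a real program would more likely write the profile to a file than to an in-memory buffer.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime/pprof"
)

// profileCPU is a hypothetical helper: it runs work under the CPU profiler
// and returns the raw profile data.
func profileCPU(work func()) ([]byte, error) {
	var buf bytes.Buffer
	// StartCPUProfile fails if CPU profiling is already enabled.
	if err := pprof.StartCPUProfile(&buf); err != nil {
		return nil, err
	}
	work()
	// StopCPUProfile returns only after the profile has been fully written.
	pprof.StopCPUProfile()
	return buf.Bytes(), nil
}

func main() {
	data, err := profileCPU(func() {
		// Placeholder CPU-bound work to give the profiler something to sample.
		sum := 0
		for i := 0; i < 50_000_000; i++ {
			sum += i
		}
		_ = sum
	})
	if err != nil {
		fmt.Println("profiling failed:", err)
		return
	}
	fmt.Printf("captured %d bytes of profile data\n", len(data))
}
```

The captured bytes are in the format that go tool pprof reads, so saving them to a file is enough to inspect the profile later.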

Benchmark of Python JSON libraries

A couple of weeks ago, after spending some time with the Python profiler, I discovered that Python’s json module is not as fast as I expected. I decided to benchmark alternative JSON libraries.


C++ STL regex performance

I recently ran into a simple task: I needed to find the position of the opening <body> tag in an HTML page. Without much thought I decided to use regular expressions, and a minute later I had the regexp <body[^>]*>. Everything worked well until it came to testing on large volumes of data.


Automating browser extension development

Anyone who has developed a browser extension at least once knows that it is a real pain.

In most cases, the problem of developing and building extensions for all popular browsers can be solved with, for example, Kango framework (for those who don't know, Kango lets you build extensions for Chrome, Firefox, Safari and Internet Explorer from shared JavaScript code).

There is very little information about how best to set up a development environment for browser extensions, so I want to share my experience.