<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <title>Artem Krylysov</title>
    <link href="https://artem.krylysov.com/atom.xml" rel="self"/>
    <link href="https://artem.krylysov.com/"/>
    <updated>2026-02-01T11:04:28.603314Z</updated>
    <id>https://artem.krylysov.com/</id>
    <author>
        <name>Artem Krylysov</name>
    </author>

    
    <entry>
        <title><![CDATA[Timeseries Indexing at Scale]]></title>
        <link href="https://artem.krylysov.com/blog/2024/06/28/timeseries-indexing-at-scale/"/>
        <updated>2024-06-28T00:00:00Z</updated>
        <id>https://artem.krylysov.com/blog/2024/06/28/timeseries-indexing-at-scale/</id>
        <content type="html">
            <![CDATA[<aside class="admonition note">
<p class="admonition-title">Note</p>
<p>This blog post was co-authored with May Lee and is cross-posted on the <a class="reference external" href="https://www.datadoghq.com/blog/engineering/timeseries-indexing-at-scale/" target="_blank">Datadog blog</a>.</p>
</aside>
<p>Datadog collects billions of events from millions of hosts every minute, and that number keeps growing fast. Our data volumes grew 30x between 2017 and 2022. On top of that, the kind of queries we receive from our users has changed significantly. Why? Because our customers have grown in sophistication: they run more complex stacks, want to monitor more data, and run more complex analyses. That, in turn, puts pressure on our timeseries data store.</p>
<p>Data stores have a number of tricks in their bag to offer good performance. One of the most critical ones is the judicious use of indices, a key data structure that can make queries fast and efficient, or unbearably slow. Over the years, our homegrown indices that were put in place in 2016 became a performance bottleneck for queries and a source of increased maintenance. We knew that we had to learn from these challenges and come up with something better.</p>
<p>This blog post provides an overview of the Datadog timeseries database and the challenges of timeseries indexing at scale. We’ll compare the performance and reliability of two generations of indexing services.</p>
<section id="metrics-platform-overview">
<h3>Metrics platform overview<a class="headerlink" href="#metrics-platform-overview" title="Permalink to this headline"> #</a></h3>
<p>From a high level, the metrics platform consists of three major components: intake, storage, and query. As shown in the image below, the Datadog Agent receives data and sends it to the load balancer. The data then gets ingested by metrics intake and written to the message broker. The metrics storage component then reads the data from the message broker and stores it on a disk in the timeseries database. When a Datadog user sends a query (see the example below), the query is sent to the API gateway and gets executed by the metrics query engine.</p>
<figure>
<img alt="" src="https://artem.krylysov.com/images/2024-timeseries-indexing-at-scale/architecture.png" style="width: 323px;" />
<figcaption>
<p>A high-level overview of the metrics platform architecture</p>
</figcaption>
</figure>
<section id="intake">
<h4>Intake<a class="headerlink" href="#intake" title="Permalink to this headline"> #</a></h4>
<p>Metrics intake is responsible for ingesting a stream of data points that consist of a metric name, zero or more tags, a timestamp, and a value. Tags are a way of adding dimensions to Datadog metrics so they can be filtered, aggregated, and compared. A tag can be in the format of <span class="docutils literal">value</span> or <span class="docutils literal">key:value</span>. Commonly used tag keys are <span class="docutils literal">env</span>, <span class="docutils literal">host</span>, and <span class="docutils literal">service</span>. Additionally, Datadog allows users to submit custom tags. The timestamp records the time when the data point was generated. Finally, the value is a numerical value that can track anything about the monitored environment, from CPU usage and error rates to the number of user sign-ups.</p>
<p>This is what a typical ingested data point looks like:</p>
<figure>
<img alt="" src="https://artem.krylysov.com/images/2024-timeseries-indexing-at-scale/point.png" style="width: 611px;" />
<figcaption>
<p>A metric named containerd.cpu.total, with the tags env:prod, service:event-consumer, and host:I-ABC, a timestamp, and a value</p>
</figcaption>
</figure>
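<p>As a rough illustration of the shape of a data point, here is a minimal Python sketch. The <span class="docutils literal">DataPoint</span> class, the <span class="docutils literal">split_tag</span> helper, and the timestamp and value are hypothetical and only mirror the fields described above; this is not the format used by the intake service.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataPoint:
    """Hypothetical in-memory shape of one ingested point."""
    metric: str
    tags: tuple       # each tag is either "value" or "key:value"
    timestamp: int    # seconds since the Unix epoch
    value: float


def split_tag(tag):
    """Split a tag into (key, value); a bare tag has no key."""
    key, sep, value = tag.partition(":")
    return (key, value) if sep else (None, tag)


point = DataPoint(
    metric="containerd.cpu.total",
    tags=("env:prod", "service:event-consumer", "host:I-ABC"),
    timestamp=1718000000,  # made-up timestamp
    value=42.0,            # made-up value
)
parsed = [split_tag(t) for t in point.tags]
```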
<p>The ingested points are processed and written into <a class="reference external" href="https://www.youtube.com/watch?v=J7RRJ1iBeAg" target="_blank">Kafka</a>, a durable message broker. Kafka allows multiple services and teams to consume the same data stream independently and for different purposes, such as analysis, indexing, and archiving for long-term storage.</p>
</section>
<section id="storage">
<h4>Storage<a class="headerlink" href="#storage" title="Permalink to this headline"> #</a></h4>
<p>One of these Kafka consumers is the short-term metrics storage layer, which is split into two individually deployed services. The first one is the Timeseries Database service, which stores the timeseries identifiers, timestamps, and values as tuples of <span class="docutils literal">&lt;timeseries_id, timestamp, float64&gt;</span>. The second service is responsible for indexing the identifiers and tags associated with them and stores them as tuples of <span class="docutils literal">&lt;timeseries_id, tags&gt;</span>. This is the Timeseries Index service, which is a custom database built on top of RocksDB and is responsible for filtering and grouping timeseries points during query execution.</p>
<figure>
<img alt="" src="https://artem.krylysov.com/images/2024-timeseries-indexing-at-scale/metrics-storage.png" style="width: 381px;" />
<figcaption>
<p>The metrics storage layer consists of the Timeseries Index and Timeseries Database (DB)</p>
</figcaption>
</figure>
</section>
<section id="query">
<h4>Query<a class="headerlink" href="#query" title="Permalink to this headline"> #</a></h4>
<p>The distributed query layer connects to the individual timeseries index nodes, fetches intermediate query results from the timeseries database, and combines them.</p>
<figure>
<img alt="" src="https://artem.krylysov.com/images/2024-timeseries-indexing-at-scale/metrics-storage-query.png" style="width: 501px;" />
<figcaption>
<p>The metrics query service communicating with the Timeseries Index and Timeseries DB</p>
</figcaption>
</figure>
<p>This is what a typical query looks like:</p>
<figure>
<img alt="" src="https://artem.krylysov.com/images/2024-timeseries-indexing-at-scale/query.png" style="width: 592px;" />
<figcaption>
<p>A metric name with tag filters, grouped by host and aggregated using the average function</p>
</figcaption>
</figure>
<p>Filters are used to narrow down a queried metric to a specific subset of data points, based on their tags. They are particularly useful when the same metric is submitted by many hosts and services, but you need to look at a specific one. In this particular example, the <span class="docutils literal">env:prod AND <span class="pre">service:event-consumer</span></span> filter tells the query engine to include only data points that come from the service called <span class="docutils literal"><span class="pre">event-consumer</span></span> that is running in the production environment.</p>
<p>The groups, which are also based on tags, drive the query results. The grouping process produces a single timeseries, or a single line in a line graph, for each unique group. For example, if you have hundreds of hosts spread across four services, grouping by <span class="docutils literal">service</span> allows you to graph one line for each service.</p>
<p>The data points within each group are then combined according to the aggregator function. In the following example, the <span class="docutils literal">avg</span> aggregator computes an average value across all hosts for a specific service, grouped by environment:</p>
<figure>
<img alt="" src="https://artem.krylysov.com/images/2024-timeseries-indexing-at-scale/metrics-graph.png" style="width: 715px;" />
<figcaption>
<p>A metrics query result visualized in a Datadog graph</p>
</figcaption>
</figure>
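<p>To make the filter, group, and aggregate steps concrete, here is a small Python sketch over made-up in-memory points. The <span class="docutils literal">query_avg</span> function and the sample data are assumptions for illustration; the real query engine is distributed and works on stored timeseries.</p>

```python
from collections import defaultdict

# Made-up sample: (tags, value) pairs for one metric at one timestamp.
points = [
    ({"env": "prod", "service": "event-consumer", "host": "i-1"}, 10.0),
    ({"env": "prod", "service": "event-consumer", "host": "i-2"}, 30.0),
    ({"env": "staging", "service": "event-consumer", "host": "i-3"}, 50.0),
    ({"env": "prod", "service": "web", "host": "i-4"}, 70.0),
]


def query_avg(points, filters, group_by):
    """Keep points matching every filter, group by tag keys, average each group."""
    groups = defaultdict(list)
    for tags, value in points:
        if all(tags.get(k) == v for k, v in filters.items()):
            group = tuple(tags.get(k) for k in group_by)
            groups[group].append(value)
    return {group: sum(vals) / len(vals) for group, vals in groups.items()}


filters = {"env": "prod", "service": "event-consumer"}
by_host = query_avg(points, filters, ["host"])  # one line per host
by_env = query_avg(points, filters, ["env"])    # one line per environment
```

<p>Grouping by <span class="docutils literal">host</span> keeps the two production hosts as separate series, while grouping by <span class="docutils literal">env</span> averages them into a single series.</p>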
</section>
</section>
<section id="original-indexing-service">
<h3>Original indexing service<a class="headerlink" href="#original-indexing-service" title="Permalink to this headline"> #</a></h3>
<p>Now that you know where the timeseries database fits into the architecture, let’s get back to indexing. We index timeseries points by their tags to make query execution more efficient and avoid scanning the data for the entire metric when only a small subset is requested. Scanning the data for the entire metric would be similar to a full table scan in SQL databases.</p>
<p>Datadog’s original indexing strategy relied heavily on automatically generating indexes based on the query log of a live system. Every time the system encountered a slow or resource-consuming query, it would record the information about the received query in a log that was periodically analyzed by a background process. The process looked at the number of queries, the query execution time, the number of input (scanned) timeseries identifiers, and the number of output (returned) identifiers. Based on these parameters, the process would then find and create indexes for highly selective queries, meaning queries with high input-to-output ratios.</p>
<p>Additionally, the indexes that became obsolete and stopped receiving queries would get removed from the system. Automatically generated indexes were highly effective in reducing the amount of CPU and memory resources spent on repetitive queries. The indexes would create materialized views of resource-consuming queries, turning a slow full table scan into a single key-value lookup.</p>
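<p>The selection step could be sketched roughly as follows. The thresholds, the row layout, and the <span class="docutils literal">index_candidates</span> function are hypothetical; the post does not state how the background process weighed query count, duration, and selectivity.</p>

```python
from collections import defaultdict

# Hypothetical thresholds, chosen only for this example.
MIN_QUERIES = 3
MIN_RATIO = 10.0


def index_candidates(query_log):
    """Pick (metric, filters) pairs that are queried often and are highly selective.

    Each row is (metric, filter_tags, inputs, outputs).
    """
    stats = defaultdict(lambda: [0, 0, 0])  # count, total inputs, total outputs
    for metric, filters, inputs, outputs in query_log:
        entry = stats[(metric, filters)]
        entry[0] += 1
        entry[1] += inputs
        entry[2] += outputs
    candidates = []
    for key, (count, inputs, outputs) in stats.items():
        ratio = inputs / max(outputs, 1)  # input-to-output ratio
        if count >= MIN_QUERIES and ratio >= MIN_RATIO:
            candidates.append(key)
    return sorted(candidates)


log = (
    [("cpu.total", "host:i-187", 1000, 1)] * 3    # scans 1000 IDs, returns 1
    + [("cpu.total", "env:prod", 1000, 900)] * 5  # returns most of what it scans
)
candidates = index_candidates(log)
```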
<section id="design">
<h4>Design<a class="headerlink" href="#design" title="Permalink to this headline"> #</a></h4>
<p>The original timeseries index service was implemented in Go with the help of embedded data stores: <a class="reference external" href="https://www.sqlite.org/" target="_blank">SQLite</a> and <a class="reference external" href="https://rocksdb.org/" target="_blank">RocksDB</a>. Embedded means these data stores are not running on a separate server or a standalone process, and instead are integrated directly into an application. We used SQLite, the <a class="reference external" href="https://www.sqlite.org/mostdeployed.html" target="_blank">most widely</a> deployed SQL engine, to store metadata such as the index definitions and the query log:</p>
<pre class="code sql literal-block"><code><span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">index_definitions</span><span class="w"> </span><span class="p">(</span><span class="w">
    </span><span class="n">metric_name</span><span class="w"> </span><span class="nb">TEXT</span><span class="p">,</span><span class="w">
    </span><span class="n">filter_tags</span><span class="w"> </span><span class="nb">TEXT</span><span class="p">,</span><span class="w">
    </span><span class="n">query_count</span><span class="w"> </span><span class="nb">INTEGER</span><span class="p">,</span><span class="w">    </span><span class="c1">-- number of times the index was queried
</span><span class="w">    </span><span class="k">timestamp</span><span class="w"> </span><span class="nb">INTEGER</span><span class="p">,</span><span class="w">
    </span><span class="k">PRIMARY</span><span class="w"> </span><span class="k">KEY</span><span class="w"> </span><span class="p">(</span><span class="n">metric_name</span><span class="p">,</span><span class="w"> </span><span class="n">filter_tags</span><span class="p">)</span><span class="w">
</span><span class="p">);</span><span class="w">

</span><span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">query_log</span><span class="w"> </span><span class="p">(</span><span class="w">
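    </span><span class="n">query_id</span><span class="w"> </span><span class="nb">INTEGER</span><span class="p">,</span><span class="w">
    </span><span class="n">metric_name</span><span class="w"> </span><span class="nb">TEXT</span><span class="p">,</span><span class="w">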
    </span><span class="n">filter_tags</span><span class="w"> </span><span class="nb">TEXT</span><span class="p">,</span><span class="w">
    </span><span class="n">inputs</span><span class="w"> </span><span class="nb">INTEGER</span><span class="p">,</span><span class="w">     </span><span class="c1">-- number of input (scanned) ids
</span><span class="w">    </span><span class="n">outputs</span><span class="w"> </span><span class="nb">INTEGER</span><span class="p">,</span><span class="w">    </span><span class="c1">-- number of output (returned) ids
</span><span class="w">    </span><span class="n">duration_msec</span><span class="w"> </span><span class="nb">INTEGER</span><span class="p">,</span><span class="w">
    </span><span class="k">timestamp</span><span class="w"> </span><span class="nb">INTEGER</span><span class="p">,</span><span class="w">
    </span><span class="k">PRIMARY</span><span class="w"> </span><span class="k">KEY</span><span class="w"> </span><span class="p">(</span><span class="n">query_id</span><span class="p">)</span><span class="w">
</span><span class="p">);</span></code></pre>
<p>The index definition table was read-heavy, updated infrequently and cached entirely in memory. The query log was bulk updated in the background outside of the ingest and query paths. The flexibility of SQL was convenient for debugging because we could easily inspect and modify the tables using the sqlite3 CLI.</p>
<p>The heavy lifting, handling all the writes required to index trillions of events per day, was done using RocksDB. RocksDB is a key-value store that powers production databases and indexing services at big tech companies such as <a class="reference external" href="https://engineering.fb.com/2021/07/22/data-infrastructure/mysql/" target="_blank">Meta</a>, <a class="reference external" href="https://blogs.bing.com/Engineering-Blog/october-2021/RocksDB-in-Microsoft-Bing" target="_blank">Microsoft</a>, and <a class="reference external" href="https://netflixtechblog.com/application-data-caching-using-ssds-5bf25df851ef" target="_blank">Netflix</a>. The Datadog timeseries index service maintained three RocksDB databases per node: Tagsets, Metrics, and Indexes.</p>
<figure>
<img alt="" src="https://artem.krylysov.com/images/2024-timeseries-indexing-at-scale/original-dbs.png" style="width: 431px;" />
<figcaption>
<p>The data stores used by the Timeseries Index</p>
</figcaption>
</figure>
<p>The Tagsets database stored a mapping of timeseries IDs to tags associated with them. The key was the timeseries ID and the value was the set of tags. If you consider these six data points:</p>
<table>
<thead>
<tr><th class="head"><p>Metric Name</p></th>
<th class="head"><p>Timeseries ID</p></th>
<th class="head"><p>Tags</p></th>
</tr>
</thead>
<tbody>
<tr><td><p>cpu.total</p></td>
<td><p>1</p></td>
<td><p>env:prod,service:web,host:i-187</p></td>
</tr>
<tr><td><p>cpu.total</p></td>
<td><p>2</p></td>
<td><p>env:prod,service:web,host:i-223</p></td>
</tr>
<tr><td><p>cpu.total</p></td>
<td><p>3</p></td>
<td><p>env:staging,service:web,host:i-398</p></td>
</tr>
<tr><td><p>cpu.total</p></td>
<td><p>7</p></td>
<td><p>env:prod,service:db,host:i-409</p></td>
</tr>
<tr><td><p>cpu.total</p></td>
<td><p>8</p></td>
<td><p>env:prod,service:db,host:i-543</p></td>
</tr>
<tr><td><p>cpu.total</p></td>
<td><p>9</p></td>
<td><p>env:staging,service:db,host:i-681</p></td>
</tr>
</tbody>
</table>
<p>This is how the data was stored in the Tagsets database:</p>
<table>
<thead>
<tr><th class="head"><p>Key (timeseries ID)</p></th>
<th class="head"><p>Value (tags)</p></th>
</tr>
</thead>
<tbody>
<tr><td><p>1</p></td>
<td><p>env:prod,service:web,host:i-187</p></td>
</tr>
<tr><td><p>2</p></td>
<td><p>env:prod,service:web,host:i-223</p></td>
</tr>
<tr><td><p>3</p></td>
<td><p>env:staging,service:web,host:i-398</p></td>
</tr>
<tr><td><p>7</p></td>
<td><p>env:prod,service:db,host:i-409</p></td>
</tr>
<tr><td><p>8</p></td>
<td><p>env:prod,service:db,host:i-543</p></td>
</tr>
<tr><td><p>9</p></td>
<td><p>env:staging,service:db,host:i-681</p></td>
</tr>
</tbody>
</table>
<p>And the Metrics database contained a list of timeseries IDs per metric:</p>
<table>
<thead>
<tr><th class="head"><p>Key (metric)</p></th>
<th class="head"><p>Value (timeseries ID)</p></th>
</tr>
</thead>
<tbody>
<tr><td><p>cpu.total</p></td>
<td><p>1,2,3,7,8,9</p></td>
</tr>
</tbody>
</table>
<p>The Tagsets and Metrics databases alone were enough to run queries. Consider the query <span class="docutils literal">cpu.total{service:web AND <span class="pre">host:i-187}</span></span>. To execute it, first you need to get all the timeseries IDs for the metric <span class="docutils literal">cpu.total</span> from the Metrics database. This then translates into a single RocksDB key-value lookup for the key <span class="docutils literal">cpu.total</span>, which returns the values: <span class="docutils literal">1,2,3,7,8,9</span>.</p>
<p>After getting all the timeseries IDs, we looked up each ID in the Tagsets database to check whether the associated tags matched the filters. The approach was similar to a full table scan in SQL databases: we had to look at every tag set for the metric. Its main downside was that the number of Tagsets lookups grew linearly with the number of timeseries per metric, which was a challenge for high-cardinality metrics. In this particular example, we needed to look up seven keys in total: one key from the Metrics database and six keys from the Tagsets database, one for each timeseries ID.</p>
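<p>The scan can be sketched in Python with two dictionaries standing in for the Metrics and Tagsets RocksDB databases. The <span class="docutils literal">scan_query</span> function is a hypothetical illustration, not the production code; it also counts key-value lookups to show where the seven comes from.</p>

```python
# Plain dictionaries stand in for the Metrics and Tagsets databases.
metrics = {"cpu.total": [1, 2, 3, 7, 8, 9]}
tagsets = {
    1: {"env:prod", "service:web", "host:i-187"},
    2: {"env:prod", "service:web", "host:i-223"},
    3: {"env:staging", "service:web", "host:i-398"},
    7: {"env:prod", "service:db", "host:i-409"},
    8: {"env:prod", "service:db", "host:i-543"},
    9: {"env:staging", "service:db", "host:i-681"},
}


def scan_query(metric, filters):
    """Return the matching timeseries IDs and the number of key-value lookups."""
    lookups = 1  # one Metrics lookup for the list of IDs
    matched = []
    for ts_id in metrics.get(metric, []):
        lookups += 1  # one Tagsets lookup per timeseries ID
        if filters.issubset(tagsets[ts_id]):
            matched.append(ts_id)
    return matched, lookups


ids, lookups = scan_query("cpu.total", {"service:web", "host:i-187"})
```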
<p>To avoid full scans, resource-consuming queries were indexed in the Indexes RocksDB database. Every resource-consuming query was logged in the query_log SQLite table. Periodically, a background process queried the table and would create a new index in the index_definitions table for those resource-consuming queries. The ingestion path checked whether the ingested timeseries belonged to an index, and if it did, the ID would be written to the Indexes database in addition to the Tagsets and Metrics databases.</p>
<figure>
<img alt="" src="https://artem.krylysov.com/images/2024-timeseries-indexing-at-scale/original-components.png" style="width: 431px;" />
<figcaption>
<p>How the data stores were used by the index query, ingestion, and generator</p>
</figcaption>
</figure>
<p>This is what the Indexes database would look like if we created two indexes for the metric <span class="docutils literal">cpu.total</span>: one index for the <span class="docutils literal">service:web</span>, <span class="docutils literal"><span class="pre">host:i-187</span></span> filters and another for the <span class="docutils literal">service:db</span> filter.</p>
<table>
<thead>
<tr><th class="head"><p>Key (metric;tags)</p></th>
<th class="head"><p>Value (timeseries IDs)</p></th>
</tr>
</thead>
<tbody>
<tr><td><p>cpu.total;service:web,host:i-187</p></td>
<td><p>1</p></td>
</tr>
<tr><td><p>cpu.total;service:db</p></td>
<td><p>7,8,9</p></td>
</tr>
</tbody>
</table>
<p>Now, if someone queried <span class="docutils literal">cpu.total{service:web AND <span class="pre">host:i-187}</span></span>, the query planner would try to match the metric and the filters against the index definitions. Because there was an index for the exact filters the query was asking for (the tags <span class="docutils literal">service:web</span> and <span class="docutils literal"><span class="pre">host:i-187</span></span>), the query would get its results directly from the Indexes database, without having to access the Tagsets and Metrics databases. The query that previously required scanning Tagsets and making seven RocksDB lookups now needed only a single lookup.</p>
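<p>A minimal sketch of that planning decision, assuming an exact match on the metric and the full set of filters: if an index exists, the query is a single lookup, otherwise it falls back to the scan. The <span class="docutils literal">run_query</span> function and the dictionary layout are hypothetical.</p>

```python
metrics = {"cpu.total": [1, 2, 3, 7, 8, 9]}
tagsets = {
    1: {"env:prod", "service:web", "host:i-187"},
    2: {"env:prod", "service:web", "host:i-223"},
    3: {"env:staging", "service:web", "host:i-398"},
    7: {"env:prod", "service:db", "host:i-409"},
    8: {"env:prod", "service:db", "host:i-543"},
    9: {"env:staging", "service:db", "host:i-681"},
}
# Materialized results for the two indexed queries.
indexes = {
    ("cpu.total", frozenset({"service:web", "host:i-187"})): [1],
    ("cpu.total", frozenset({"service:db"})): [7, 8, 9],
}


def run_query(metric, filters):
    """Return (matching IDs, number of key-value lookups)."""
    key = (metric, frozenset(filters))
    if key in indexes:
        return indexes[key], 1  # indexed: a single lookup
    ids = metrics.get(metric, [])
    wanted = set(filters)
    matched = [ts_id for ts_id in ids if wanted.issubset(tagsets[ts_id])]
    return matched, 1 + len(ids)  # fallback: one Metrics lookup plus a scan


indexed = run_query("cpu.total", ["service:web", "host:i-187"])
fallback = run_query("cpu.total", ["env:staging"])
```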
<section id="advantages">
<h5>Advantages<a class="headerlink" href="#advantages" title="Permalink to this headline"> #</a></h5>
<p>For Datadog Metrics, like most monitoring systems, the ratio of queried-to-written timeseries data points is typically low. On average, we see roughly only 30% of the written data being consistently queried, making this indexing strategy space efficient. We had to cover only a subset of the ingested data with indexes to make most queries perform well.</p>
</section>
<section id="challenges">
<h5>Challenges<a class="headerlink" href="#challenges" title="Permalink to this headline"> #</a></h5>
<p>Automatically generated indexes worked well for programmatic query sources such as periodic jobs or alerting. However, user-facing queries are less predictable, and they often fell back to full table scans, leading to query timeouts and poor user experiences. Additionally, even programmatic query sources occasionally change their query patterns significantly, making the existing indexes inefficient and overloading the database with many new resource-consuming queries. It wasn’t uncommon for such incidents to require manual intervention where an engineer would remove or create indexes by hand.</p>
</section>
</section>
</section>
<section id="next-gen-indexing-service">
<h3>Next-gen indexing service<a class="headerlink" href="#next-gen-indexing-service" title="Permalink to this headline"> #</a></h3>
<p>The manual operational toil was growing and the metrics queries were slowly getting less performant. It was time to rethink how we index timeseries data at Datadog. The new indexing strategy we came up with was inspired by the core data structure behind search engines, the inverted index. In search engines, the inverted index associates every word in a document with document identifiers that contain the word.</p>
<p>For example:</p>
<pre class="code python literal-block"><code><span class="n">documents</span> <span class="o">=</span> <span class="p">{</span><span class="w">
</span>    <span class="mi">1</span><span class="p">:</span> <span class="s2">&quot;a donut on a glass plate&quot;</span><span class="p">,</span><span class="w">
</span>    <span class="mi">2</span><span class="p">:</span> <span class="s2">&quot;only the donut&quot;</span><span class="p">,</span><span class="w">
</span>    <span class="mi">3</span><span class="p">:</span> <span class="s2">&quot;listen to the drum machine&quot;</span><span class="p">,</span><span class="w">
</span><span class="p">}</span><span class="w">

</span><span class="n">index</span> <span class="o">=</span> <span class="p">{</span><span class="w">
</span>    <span class="s2">&quot;a&quot;</span><span class="p">:</span> <span class="p">[</span><span class="mi">1</span><span class="p">],</span><span class="w">
</span>    <span class="s2">&quot;donut&quot;</span><span class="p">:</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">],</span><span class="w">
</span>    <span class="s2">&quot;on&quot;</span><span class="p">:</span> <span class="p">[</span><span class="mi">1</span><span class="p">],</span><span class="w">
</span>    <span class="s2">&quot;glass&quot;</span><span class="p">:</span> <span class="p">[</span><span class="mi">1</span><span class="p">],</span><span class="w">
</span>    <span class="s2">&quot;plate&quot;</span><span class="p">:</span> <span class="p">[</span><span class="mi">1</span><span class="p">],</span><span class="w">
</span>    <span class="s2">&quot;only&quot;</span><span class="p">:</span> <span class="p">[</span><span class="mi">2</span><span class="p">],</span><span class="w">
</span>    <span class="s2">&quot;the&quot;</span><span class="p">:</span> <span class="p">[</span><span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">],</span><span class="w">
</span>    <span class="s2">&quot;listen&quot;</span><span class="p">:</span> <span class="p">[</span><span class="mi">3</span><span class="p">],</span><span class="w">
</span>    <span class="s2">&quot;to&quot;</span><span class="p">:</span> <span class="p">[</span><span class="mi">3</span><span class="p">],</span><span class="w">
</span>    <span class="s2">&quot;drum&quot;</span><span class="p">:</span> <span class="p">[</span><span class="mi">3</span><span class="p">],</span><span class="w">
</span>    <span class="s2">&quot;machine&quot;</span><span class="p">:</span> <span class="p">[</span><span class="mi">3</span><span class="p">],</span><span class="w">
</span><span class="p">}</span></code></pre>
<p>A real-world example of the inverted index is an index in a book where a term references a page number:</p>
<figure>
<img alt="" src="https://artem.krylysov.com/images/2020-fts/book-index.png" style="width: 592px;" />
<figcaption>
<p>From Designing Data-Intensive Applications by Martin Kleppmann, 2017</p>
</figcaption>
</figure>
<aside class="admonition note">
<p class="admonition-title">Note</p>
<p>Learn how to build a full-text search engine in my <a class="reference external" href="/blog/2020/07/28/lets-build-a-full-text-search-engine/">previous</a> blog post.</p>
</aside>
<p>Taking the definition of an inverted index for search engines, if we replace the term &quot;document identifier&quot; with &quot;timeseries identifier&quot; and &quot;word&quot; with &quot;tag&quot;, we have a definition for an inverted index for timeseries data. The inverted index associates every tag in a timeseries with the identifiers of timeseries that contain the tag.</p>
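<p>The substitution can be seen in a short Python sketch: the same hypothetical <span class="docutils literal">build_inverted_index</span> function works for words in documents and for tags in timeseries. The two timeseries below are made up for brevity.</p>

```python
from collections import defaultdict


def build_inverted_index(items):
    """Map each term to the sorted IDs of the items that contain it."""
    index = defaultdict(set)
    for item_id, terms in items.items():
        for term in terms:
            index[term].add(item_id)
    return {term: sorted(ids) for term, ids in index.items()}


# Search engine: terms are words, items are documents.
documents = {
    1: "a donut on a glass plate",
    2: "only the donut",
    3: "listen to the drum machine",
}
words = build_inverted_index({i: text.split() for i, text in documents.items()})

# Timeseries index: terms are tags, items are timeseries.
timeseries = {
    1: ["env:prod", "service:web"],
    2: ["env:prod", "service:db"],
}
tags = build_inverted_index(timeseries)
```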
<section id="design-1">
<h4>Design<a class="headerlink" href="#design-1" title="Permalink to this headline"> #</a></h4>
<p>The new design required rewriting almost the entire timeseries index service. It gave us an opportunity to reevaluate the tech stack we used. The new approach didn’t require maintaining any metadata, so that removed the SQLite dependency. RocksDB for the original implementation was a solid choice—we didn’t have any issues with it in production—so we kept it as the key-value store for indexes.</p>
<figure>
<img alt="" src="https://artem.krylysov.com/images/2024-timeseries-indexing-at-scale/next-components.png" style="width: 261px;" />
<figcaption>
<p>The next-gen architecture is simpler, with fewer components inside a single node</p>
</figcaption>
</figure>
<p>We’ll use the same six data points from the previous example to see how the next-gen strategy works. The Tagsets and Metrics databases look almost the same as in the original implementation so we won’t go over them again. The major difference between the original and next-gen implementation is the Indexes database, where we now unconditionally index every ingested tag, similarly to what search engines do with the inverted index:</p>
<table>
<thead>
<tr><th class="head"><p>Key (metric;tag)</p></th>
<th class="head"><p>Value (timeseries IDs)</p></th>
</tr>
</thead>
<tbody>
<tr><td><p>cpu.total;env:prod</p></td>
<td><p>1,2,7,8</p></td>
</tr>
<tr><td><p>cpu.total;env:staging</p></td>
<td><p>3,9</p></td>
</tr>
<tr><td><p>cpu.total;service:web</p></td>
<td><p>1,2,3</p></td>
</tr>
<tr><td><p>cpu.total;service:db</p></td>
<td><p>7,8,9</p></td>
</tr>
<tr><td><p>cpu.total;host:i-187</p></td>
<td><p>1</p></td>
</tr>
<tr><td><p>cpu.total;host:i-223</p></td>
<td><p>2</p></td>
</tr>
<tr><td><p>cpu.total;host:i-398</p></td>
<td><p>3</p></td>
</tr>
<tr><td><p>cpu.total;host:i-409</p></td>
<td><p>7</p></td>
</tr>
<tr><td><p>cpu.total;host:i-543</p></td>
<td><p>8</p></td>
</tr>
<tr><td><p>cpu.total;host:i-681</p></td>
<td><p>9</p></td>
</tr>
</tbody>
</table>
<p>The timeseries queries map well to the inverted index. To execute a query, the query engine makes a single key-value lookup for each queried tag and retrieves a set of relevant timeseries identifiers. Then, depending on the query, the sets are combined either by computing a set intersection or a set union.</p>
<p>For example, for the query <span class="docutils literal">cpu.total{service:web AND <span class="pre">host:i-187}</span></span>, we do two key-value lookups from the Indexes databases. The first is to retrieve the <span class="docutils literal">cpu.total;service:web</span> key, which returns the values: <span class="docutils literal">1,2,3</span>. The second key-value lookup is for the key <span class="docutils literal"><span class="pre">cpu.total;host:i-187</span></span>, which returns the value: <span class="docutils literal">1</span>.</p>
<p>To get the final result, we compute the set intersection between the two returned values, <span class="docutils literal">1,2,3</span> and <span class="docutils literal">1</span>, which is <span class="docutils literal">1</span>. For the same query, the previous indexing strategy required a single lookup when the index existed, and seven lookups when there was no index. With the new indexing strategy we get a consistent query performance because the query always requires exactly two lookups.</p>
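<p>Here is a minimal Python sketch of this strategy, with a dictionary standing in for the Indexes database. The <span class="docutils literal">build_indexes</span>, <span class="docutils literal">query_and</span>, and <span class="docutils literal">query_or</span> names are hypothetical; the sketch only shows one lookup per queried tag followed by a set intersection or union.</p>

```python
tagsets = {
    1: ["env:prod", "service:web", "host:i-187"],
    2: ["env:prod", "service:web", "host:i-223"],
    3: ["env:staging", "service:web", "host:i-398"],
    7: ["env:prod", "service:db", "host:i-409"],
    8: ["env:prod", "service:db", "host:i-543"],
    9: ["env:staging", "service:db", "host:i-681"],
}


def build_indexes(metric, tagsets):
    """Index every tag unconditionally: "metric;tag" -> set of timeseries IDs."""
    indexes = {}
    for ts_id, tags in tagsets.items():
        for tag in tags:
            indexes.setdefault(f"{metric};{tag}", set()).add(ts_id)
    return indexes


indexes = build_indexes("cpu.total", tagsets)


def lookup(metric, filters):
    # One key-value lookup per queried tag.
    return [indexes.get(f"{metric};{tag}", set()) for tag in filters]


def query_and(metric, filters):
    sets = lookup(metric, filters)
    return sorted(set.intersection(*sets)) if sets else []


def query_or(metric, filters):
    return sorted(set().union(*lookup(metric, filters)))


both = query_and("cpu.total", ["service:web", "host:i-187"])
either = query_or("cpu.total", ["host:i-187", "host:i-409"])
```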
<section id="challenges-1">
<h5>Challenges<a class="headerlink" href="#challenges-1" title="Permalink to this headline"> #</a></h5>
<p>One downside of this strategy is the write and space amplification. Every unique timeseries identifier has to be stored multiple times, once for each tag. With tens of tags per single timeseries, we have to store each timeseries identifier more than 10 times. Our early tests confirmed the concern that the timeseries index had to write and store noticeably more data on the disk. However, previously the timeseries index barely utilized the disk and was strictly CPU bound, so the disk space utilization wouldn’t become a problem even if we had to start storing an order of magnitude more data.</p>
<p>The write amplification and the increased CPU utilization the new strategy could bring was still a concern. However, it turned out not to be an issue, as we will see.</p>
</section>
<section id="advantages-1">
<h5>Advantages<a class="headerlink" href="#advantages-1" title="Permalink to this headline"> #</a></h5>
<p>The new timeseries index doesn’t rely on the query log or a list of automatically generated indexes. It simplifies the ingestion path and makes it less CPU-intensive because it doesn’t have to match every ingested timeseries against the list of indexes to find which index it belongs to. The same is true for queries: we no longer need to do any index matching because we know that an index for every tag always exists. It also removes the need for running several CPU-consuming background jobs responsible for maintaining the indexes. The timeseries index doesn’t have to scan the query log anymore and we never need to backfill newly created indexes. Overall, while we have to write more data, we now spend less CPU time doing it.</p>
<p>On the query side, on average, indexed queries become slightly more expensive because some queries, which previously required a single key-value lookup, now require multiple lookups. On the other hand, every possible query now always has a partial index available. Since we don’t have to ever fall back to a full table scan, the worst case scenario for query performance improved and became more predictable.</p>
</section>
</section>
<section id="intranode-sharding">
<h4>Intranode Sharding<a class="headerlink" href="#intranode-sharding" title="Permalink to this headline"> #</a></h4>
<p>Another performance issue with the original implementation was that the query path didn’t scale with CPU cores available on a node. No matter how many CPU cores a node had, a single query couldn’t use more than a single core. At some point, the single-core performance would always become a bottleneck for how quickly the service could execute a query. One of the goals of designing the new service was being able to scale ingest and queries with CPU cores. We accomplished this by making each node split RocksDB indexes into multiple isolated instances (shards), each responsible for a subset of timeseries. To ensure the data is distributed evenly across the shards, we hash the ingested timeseries IDs and assign a timeseries to a particular shard based on the hash.</p>
<figure>
<img alt="" src="https://artem.krylysov.com/images/2024-timeseries-indexing-at-scale/db-shards.png" style="width: 501px;" />
<figcaption>
<p>The hashes of four timeseries subsets determine which of the two RocksDB shards the timeseries are stored in</p>
</figcaption>
</figure>
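<p>Hash-based shard assignment can be sketched in a few lines. This is only an illustration: the function names <span class="docutils literal">shard_for</span> and <span class="docutils literal">partition</span> are hypothetical, and the standard library’s <span class="docutils literal">DefaultHasher</span> is an assumption standing in for whatever hash function a real service would use.</p>

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Maps a timeseries ID to one of `num_shards` shards.
/// DefaultHasher is a stand-in chosen for this sketch only.
fn shard_for(timeseries_id: u64, num_shards: usize) -> usize {
    assert!(num_shards > 0, "need at least one shard");
    let mut hasher = DefaultHasher::new();
    timeseries_id.hash(&mut hasher);
    (hasher.finish() % num_shards as u64) as usize
}

/// Splits a batch of ingested IDs into per-shard buckets.
fn partition(ids: &[u64], num_shards: usize) -> Vec<Vec<u64>> {
    let mut shards = vec![Vec::new(); num_shards];
    for &id in ids {
        shards[shard_for(id, num_shards)].push(id);
    }
    shards
}
```

<p>The same ID always lands in the same shard within one build of the program, which is what lets queries and ingest agree on where a timeseries lives. The algorithm behind <span class="docutils literal">DefaultHasher</span> is not guaranteed to stay the same across Rust releases, so a store that persists data would need a hash function with a fixed, documented output.</p>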
<p>To execute a single query, the service fetches data from each RocksDB shard in parallel and then merges the results.</p>
<figure>
<img alt="" src="https://artem.krylysov.com/images/2024-timeseries-indexing-at-scale/db-shards-query.png" style="width: 221px;" />
<figcaption>
<p>The index query fetches data from two RocksDB shards</p>
</figcaption>
</figure>
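<p>The fan-out and merge step might look roughly like the sketch below. It makes several simplifying assumptions: the <span class="docutils literal">Shard</span> struct is a hypothetical in-memory stand-in for a RocksDB instance, the query is reduced to a filter over IDs, one OS thread is spawned per shard using <span class="docutils literal"><span class="pre">std::thread::scope</span></span>, and the partial results are merged by concatenating and sorting.</p>

```rust
use std::thread;

/// Hypothetical shard: a sorted list of the timeseries IDs it owns.
struct Shard {
    ids: Vec<u64>,
}

impl Shard {
    /// Stand-in for an index lookup: returns the IDs accepted by `filter`.
    fn query<F: Fn(u64) -> bool>(&self, filter: &F) -> Vec<u64> {
        self.ids.iter().copied().filter(|&id| filter(id)).collect()
    }
}

/// Queries every shard on its own thread and merges the partial results.
fn query_all_shards<F>(shards: &[Shard], filter: &F) -> Vec<u64>
where
    F: Fn(u64) -> bool + Sync,
{
    let partials: Vec<Vec<u64>> = thread::scope(|scope| {
        // Spawn all threads first so the shards run concurrently, then join.
        let handles: Vec<_> = shards
            .iter()
            .map(|shard| scope.spawn(move || shard.query(filter)))
            .collect();
        handles
            .into_iter()
            .map(|handle| handle.join().expect("shard query panicked"))
            .collect()
    });
    // Shards own disjoint IDs, so concatenating and sorting is a valid merge.
    let mut merged: Vec<u64> = partials.into_iter().flatten().collect();
    merged.sort_unstable();
    merged
}
```

<p>Spawning a thread per shard per query keeps the example short. A long-running service would more likely reuse a fixed pool of worker threads, and could merge the already sorted partial results with a k-way merge instead of sorting again.</p>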
<p>After running experiments in production, we settled on creating eight shards on a node with 32 CPU cores. It gave us a nearly 8x performance boost without adding too much overhead from splitting and having to merge the timeseries back. Additionally, the intranode sharding allowed us to switch to larger cloud node types with more CPU cores, reducing the total number of nodes we had to run.</p>
</section>
<section id="switching-to-rust">
<h4>Switching to Rust<a class="headerlink" href="#switching-to-rust" title="Permalink to this headline"> #</a></h4>
<p>While the Go language worked well for most services at Datadog, it wasn’t the best fit for our resource-intensive use case. The service spent nearly 30% of its CPU resources on garbage collection, and we reached the point where implementing further performance optimizations was very time-consuming. We needed a compiled language with no garbage collector. We decided to give Rust a chance, and it turned out to be the right choice. To illustrate the performance differences, we’ll compare Go and Rust on two CPU-demanding operations that the indexing service executes.</p>
<p>When grouping timeseries, the service needs to extract tags for relevant tag keys. For example, grouping <span class="docutils literal"><span class="pre">env:prod,service:web,host:i-187</span></span> by <span class="docutils literal">env</span> and <span class="docutils literal">host</span> is expected to produce a group <span class="docutils literal"><span class="pre">env:prod,host:i-187</span></span>. Here is a simplified version of what we run in production:</p>
<pre class="code rust literal-block"><code><span class="k">fn</span><span class="w"> </span><span class="nf">has_group</span><span class="p">(</span><span class="n">tag</span><span class="p">:</span><span class="w"> </span><span class="kp">&amp;</span><span class="kt">str</span><span class="p">,</span><span class="w"> </span><span class="n">group</span><span class="p">:</span><span class="w"> </span><span class="kp">&amp;</span><span class="kt">str</span><span class="p">)</span><span class="w"> </span><span class="p">-&gt;</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="n">tag</span><span class="p">.</span><span class="n">len</span><span class="p">()</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span class="n">group</span><span class="p">.</span><span class="n">len</span><span class="p">()</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="n">tag</span><span class="p">.</span><span class="n">as_bytes</span><span class="p">()[</span><span class="n">group</span><span class="p">.</span><span class="n">len</span><span class="p">()]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">b':'</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="n">tag</span><span class="p">.</span><span class="n">starts_with</span><span class="p">(</span><span class="n">group</span><span class="p">)</span><span class="w">
</span><span class="p">}</span><span class="w">

</span><span class="k">fn</span><span class="w"> </span><span class="nf">group_key</span><span class="p">(</span><span class="n">tags</span><span class="p">:</span><span class="w"> </span><span class="kp">&amp;</span><span class="p">[</span><span class="o">&amp;</span><span class="kt">str</span><span class="p">],</span><span class="w"> </span><span class="n">groups</span><span class="p">:</span><span class="w"> </span><span class="kp">&amp;</span><span class="p">[</span><span class="o">&amp;</span><span class="kt">str</span><span class="p">])</span><span class="w"> </span><span class="p">-&gt;</span><span class="w"> </span><span class="nb">String</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">key_tags</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">Vec</span><span class="p">::</span><span class="n">with_capacity</span><span class="p">(</span><span class="n">groups</span><span class="p">.</span><span class="n">len</span><span class="p">());</span><span class="w">
    </span><span class="k">for</span><span class="w"> </span><span class="n">tag</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="n">tags</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="k">for</span><span class="w"> </span><span class="n">group</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="n">groups</span><span class="w"> </span><span class="p">{</span><span class="w">
            </span><span class="k">if</span><span class="w"> </span><span class="n">has_group</span><span class="p">(</span><span class="n">tag</span><span class="p">,</span><span class="w"> </span><span class="n">group</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
                </span><span class="n">key_tags</span><span class="p">.</span><span class="n">push</span><span class="p">(</span><span class="o">*</span><span class="n">tag</span><span class="p">);</span><span class="w">
            </span><span class="p">}</span><span class="w">
        </span><span class="p">}</span><span class="w">
    </span><span class="p">}</span><span class="w">
    </span><span class="n">key_tags</span><span class="p">.</span><span class="n">sort_unstable</span><span class="p">();</span><span class="w">
    </span><span class="n">key_tags</span><span class="p">.</span><span class="n">dedup</span><span class="p">();</span><span class="w">
    </span><span class="n">key_tags</span><span class="p">.</span><span class="n">join</span><span class="p">(</span><span class="s">&quot;,&quot;</span><span class="p">)</span><span class="w">
</span><span class="p">}</span></code></pre>
<p>And here is a one-to-one translation to Go:</p>
<pre class="code go literal-block"><code><span class="kd">func</span><span class="w"> </span><span class="nx">hasKey</span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">group</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="k">return</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">s</span><span class="p">)</span><span class="w"> </span><span class="p">&gt;</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">group</span><span class="p">)</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nx">s</span><span class="p">[</span><span class="nb">len</span><span class="p">(</span><span class="nx">group</span><span class="p">)]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">':'</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nx">strings</span><span class="p">.</span><span class="nx">HasPrefix</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="nx">group</span><span class="p">)</span><span class="w">
</span><span class="p">}</span><span class="w">

</span><span class="kd">func</span><span class="w"> </span><span class="nx">groupKey</span><span class="p">(</span><span class="nx">tags</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">groups</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="kt">string</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nx">keyTags</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">make</span><span class="p">([]</span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">groups</span><span class="p">))</span><span class="w">
    </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">tag</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">tags</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">group</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">groups</span><span class="w"> </span><span class="p">{</span><span class="w">
            </span><span class="k">if</span><span class="w"> </span><span class="nx">hasKey</span><span class="p">(</span><span class="nx">tag</span><span class="p">,</span><span class="w"> </span><span class="nx">group</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
                </span><span class="nx">keyTags</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">keyTags</span><span class="p">,</span><span class="w"> </span><span class="nx">tag</span><span class="p">)</span><span class="w">
            </span><span class="p">}</span><span class="w">
        </span><span class="p">}</span><span class="w">
    </span><span class="p">}</span><span class="w">
    </span><span class="nx">sort</span><span class="p">.</span><span class="nx">Strings</span><span class="p">(</span><span class="nx">keyTags</span><span class="p">)</span><span class="w">
    </span><span class="nx">keyTags</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">slices</span><span class="p">.</span><span class="nx">Compact</span><span class="p">(</span><span class="nx">keyTags</span><span class="p">)</span><span class="w">
    </span><span class="k">return</span><span class="w"> </span><span class="nx">strings</span><span class="p">.</span><span class="nx">Join</span><span class="p">(</span><span class="nx">keyTags</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;,&quot;</span><span class="p">)</span><span class="w">
</span><span class="p">}</span></code></pre>
<p>While the functions look very similar, our benchmarks on production data on an AWS c7i.xlarge instance (Intel Xeon Platinum 8488C) showed that the Rust version is three times faster than the Go version.</p>
<p>As a part of the ingest and the query paths, the indexing service needs to merge many timeseries IDs together. The IDs are integers, stored as sorted arrays. For example, merging three arrays of IDs <span class="docutils literal">[3,6], [4,5], [1,2]</span> is expected to produce a single sorted array containing the union of all IDs: <span class="docutils literal">[1,2,3,4,5,6]</span>. The problem can be solved with the <a class="reference external" href="https://en.wikipedia.org/wiki/K-way_merge_algorithm" target="_blank">k-way merge</a> algorithm using a <a class="reference external" href="https://en.wikipedia.org/wiki/Heap_(data_structure)" target="_blank">min-heap</a>. Luckily, both the Rust and the Go standard libraries come with a heap implementation we can use for this. Let’s start with the Rust version, as the implementation is slightly more straightforward:</p>
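<pre class="code rust literal-block"><code><span class="k">use</span><span class="w"> </span><span class="n">std</span><span class="p">::</span><span class="n">cmp</span><span class="p">::</span><span class="n">Ordering</span><span class="p">;</span><span class="w">
</span><span class="k">use</span><span class="w"> </span><span class="n">std</span><span class="p">::</span><span class="n">collections</span><span class="p">::</span><span class="n">binary_heap</span><span class="p">::{</span><span class="n">BinaryHeap</span><span class="p">,</span><span class="w"> </span><span class="n">PeekMut</span><span class="p">};</span><span class="w">

</span><span class="cp">#[derive(Eq)]</span><span class="w">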
</span><span class="k">struct</span><span class="w"> </span><span class="nc">Item</span><span class="o">&lt;'</span><span class="na">a</span><span class="o">&gt;</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="n">first</span><span class="p">:</span><span class="w"> </span><span class="kt">u64</span><span class="p">,</span><span class="w">
    </span><span class="n">remainder</span><span class="p">:</span><span class="w"> </span><span class="kp">&amp;</span><span class="o">'</span><span class="na">a</span><span class="w"> </span><span class="p">[</span><span class="kt">u64</span><span class="p">],</span><span class="w">
</span><span class="p">}</span><span class="w">

</span><span class="k">impl</span><span class="w"> </span><span class="nb">Ord</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">Item</span><span class="o">&lt;'</span><span class="nb">_</span><span class="o">&gt;</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="k">fn</span><span class="w"> </span><span class="nf">cmp</span><span class="p">(</span><span class="o">&amp;</span><span class="bp">self</span><span class="p">,</span><span class="w"> </span><span class="n">other</span><span class="p">:</span><span class="w"> </span><span class="kp">&amp;</span><span class="nc">Self</span><span class="p">)</span><span class="w"> </span><span class="p">-&gt;</span><span class="w"> </span><span class="nc">Ordering</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="n">other</span><span class="p">.</span><span class="n">first</span><span class="p">.</span><span class="n">cmp</span><span class="p">(</span><span class="o">&amp;</span><span class="bp">self</span><span class="p">.</span><span class="n">first</span><span class="p">)</span><span class="w">
    </span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="k">impl</span><span class="w"> </span><span class="nb">PartialOrd</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">Item</span><span class="o">&lt;'</span><span class="nb">_</span><span class="o">&gt;</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="k">fn</span><span class="w"> </span><span class="nf">partial_cmp</span><span class="p">(</span><span class="o">&amp;</span><span class="bp">self</span><span class="p">,</span><span class="w"> </span><span class="n">other</span><span class="p">:</span><span class="w"> </span><span class="kp">&amp;</span><span class="nc">Self</span><span class="p">)</span><span class="w"> </span><span class="p">-&gt;</span><span class="w"> </span><span class="nb">Option</span><span class="o">&lt;</span><span class="n">Ordering</span><span class="o">&gt;</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="nb">Some</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">cmp</span><span class="p">(</span><span class="n">other</span><span class="p">))</span><span class="w">
    </span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="k">impl</span><span class="w"> </span><span class="nb">PartialEq</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">Item</span><span class="o">&lt;'</span><span class="nb">_</span><span class="o">&gt;</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="k">fn</span><span class="w"> </span><span class="nf">eq</span><span class="p">(</span><span class="o">&amp;</span><span class="bp">self</span><span class="p">,</span><span class="w"> </span><span class="n">other</span><span class="p">:</span><span class="w"> </span><span class="kp">&amp;</span><span class="nc">Self</span><span class="p">)</span><span class="w"> </span><span class="p">-&gt;</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="bp">self</span><span class="p">.</span><span class="n">first</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">other</span><span class="p">.</span><span class="n">first</span><span class="w">
    </span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">

</span><span class="k">fn</span><span class="w"> </span><span class="nf">merge_u64s</span><span class="p">(</span><span class="n">sets</span><span class="p">:</span><span class="w"> </span><span class="kp">&amp;</span><span class="p">[</span><span class="nb">Vec</span><span class="o">&lt;</span><span class="kt">u64</span><span class="o">&gt;</span><span class="p">])</span><span class="w"> </span><span class="p">-&gt;</span><span class="w"> </span><span class="nb">Vec</span><span class="o">&lt;</span><span class="kt">u64</span><span class="o">&gt;</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">heap</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sets</span><span class="w">
        </span><span class="p">.</span><span class="n">iter</span><span class="p">()</span><span class="w">
        </span><span class="p">.</span><span class="n">map</span><span class="p">(</span><span class="o">|</span><span class="n">set</span><span class="o">|</span><span class="w"> </span><span class="n">Item</span><span class="w"> </span><span class="p">{</span><span class="w">
            </span><span class="n">first</span><span class="p">:</span><span class="w"> </span><span class="nc">set</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span><span class="w">
            </span><span class="n">remainder</span><span class="p">:</span><span class="w"> </span><span class="kp">&amp;</span><span class="nc">set</span><span class="p">[</span><span class="mi">1</span><span class="o">..</span><span class="p">],</span><span class="w">
        </span><span class="p">})</span><span class="w">
        </span><span class="p">.</span><span class="n">collect</span><span class="p">::</span><span class="o">&lt;</span><span class="n">BinaryHeap</span><span class="o">&lt;</span><span class="n">Item</span><span class="o">&gt;&gt;</span><span class="p">();</span><span class="w">
    </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">result</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">Vec</span><span class="p">::</span><span class="n">new</span><span class="p">();</span><span class="w">
    </span><span class="c1">// Use peek_mut instead of pop + push to avoid sifting the heap twice.
</span><span class="w">    </span><span class="k">while</span><span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nb">Some</span><span class="p">(</span><span class="k">mut</span><span class="w"> </span><span class="n">item</span><span class="p">)</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">heap</span><span class="p">.</span><span class="n">peek_mut</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="kd">let</span><span class="w"> </span><span class="n">Item</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">first</span><span class="p">,</span><span class="w"> </span><span class="n">remainder</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">&amp;*</span><span class="n">item</span><span class="p">;</span><span class="w">
        </span><span class="k">if</span><span class="w"> </span><span class="n">result</span><span class="p">.</span><span class="n">last</span><span class="p">()</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nb">Some</span><span class="p">(</span><span class="n">first</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
            </span><span class="n">result</span><span class="p">.</span><span class="n">push</span><span class="p">(</span><span class="o">*</span><span class="n">first</span><span class="p">);</span><span class="w">
        </span><span class="p">}</span><span class="w">
        </span><span class="k">if</span><span class="w"> </span><span class="o">!</span><span class="n">remainder</span><span class="p">.</span><span class="n">is_empty</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w">
            </span><span class="o">*</span><span class="n">item</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Item</span><span class="w"> </span><span class="p">{</span><span class="w">
                </span><span class="n">first</span><span class="p">:</span><span class="w"> </span><span class="nc">remainder</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span><span class="w">
                </span><span class="n">remainder</span><span class="p">:</span><span class="w"> </span><span class="kp">&amp;</span><span class="nc">remainder</span><span class="p">[</span><span class="mi">1</span><span class="o">..</span><span class="p">],</span><span class="w">
            </span><span class="p">};</span><span class="w">
        </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span><span class="w">
            </span><span class="n">PeekMut</span><span class="p">::</span><span class="n">pop</span><span class="p">(</span><span class="n">item</span><span class="p">);</span><span class="w">
        </span><span class="p">}</span><span class="w">
    </span><span class="p">}</span><span class="w">
    </span><span class="n">result</span><span class="w">
</span><span class="p">}</span></code></pre>
<p>Merging 100 sets of 10K integers in Rust using the code above takes 33ms.</p>
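<p>For comparison, the same k-way merge can be written more compactly, at the cost of some of the optimizations above. The sketch below is a variant for illustration, not the benchmarked code: the name <span class="docutils literal">merge_sorted</span> is hypothetical, it stores <span class="docutils literal">(value, list index, position)</span> tuples wrapped in <span class="docutils literal"><span class="pre">std::cmp::Reverse</span></span> instead of defining a custom ordering, it uses pop followed by push (so the heap is sifted twice per element), and it tolerates empty input lists.</p>

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Merges sorted ID lists into one sorted, deduplicated list.
/// Heap entries are (next value, list index, position in that list);
/// wrapping them in `Reverse` turns the max-heap into a min-heap.
fn merge_sorted(sets: &[Vec<u64>]) -> Vec<u64> {
    let mut heap: BinaryHeap<Reverse<(u64, usize, usize)>> = sets
        .iter()
        .enumerate()
        // Skip empty lists instead of indexing into them.
        .filter_map(|(i, set)| set.first().map(|&v| Reverse((v, i, 0))))
        .collect();
    let mut result = Vec::new();
    while let Some(Reverse((value, i, pos))) = heap.pop() {
        // Inputs may overlap, so only push values not already emitted.
        if result.last() != Some(&value) {
            result.push(value);
        }
        // Re-insert the next element of the list we just consumed from.
        if let Some(&next) = sets[i].get(pos + 1) {
            heap.push(Reverse((next, i, pos + 1)));
        }
    }
    result
}
```

<p>Calling <span class="docutils literal">merge_sorted</span> on the three example arrays <span class="docutils literal">[3,6], [4,5], [1,2]</span> returns <span class="docutils literal">[1,2,3,4,5,6]</span>.</p>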
<p>And here is the Go version:</p>
<pre class="code go literal-block"><code><span class="kd">type</span><span class="w"> </span><span class="nx">Item</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nx">first</span><span class="w">     </span><span class="kt">uint64</span><span class="w">
    </span><span class="nx">remainder</span><span class="w"> </span><span class="p">[]</span><span class="kt">uint64</span><span class="w">
</span><span class="p">}</span><span class="w">

</span><span class="kd">type</span><span class="w"> </span><span class="nx">Uint64SetHeap</span><span class="w"> </span><span class="p">[]</span><span class="nx">Item</span><span class="w">
</span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">h</span><span class="w"> </span><span class="nx">Uint64SetHeap</span><span class="p">)</span><span class="w"> </span><span class="nx">Len</span><span class="p">()</span><span class="w"> </span><span class="kt">int</span><span class="w">           </span><span class="p">{</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">h</span><span class="p">)</span><span class="w"> </span><span class="p">}</span><span class="w">
</span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">h</span><span class="w"> </span><span class="nx">Uint64SetHeap</span><span class="p">)</span><span class="w"> </span><span class="nx">Less</span><span class="p">(</span><span class="nx">i</span><span class="p">,</span><span class="w"> </span><span class="nx">j</span><span class="w"> </span><span class="kt">int</span><span class="p">)</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">h</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">first</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="nx">h</span><span class="p">[</span><span class="nx">j</span><span class="p">].</span><span class="nx">first</span><span class="w"> </span><span class="p">}</span><span class="w">
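</span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">h</span><span class="w"> </span><span class="nx">Uint64SetHeap</span><span class="p">)</span><span class="w"> </span><span class="nx">Swap</span><span class="p">(</span><span class="nx">i</span><span class="p">,</span><span class="w"> </span><span class="nx">j</span><span class="w"> </span><span class="kt">int</span><span class="p">)</span><span class="w">      </span><span class="p">{</span><span class="w"> </span><span class="nx">h</span><span class="p">[</span><span class="nx">i</span><span class="p">],</span><span class="w"> </span><span class="nx">h</span><span class="p">[</span><span class="nx">j</span><span class="p">]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">h</span><span class="p">[</span><span class="nx">j</span><span class="p">],</span><span class="w"> </span><span class="nx">h</span><span class="p">[</span><span class="nx">i</span><span class="p">]</span><span class="w"> </span><span class="p">}</span><span class="w">
</span><span class="c1">// Push and Pop are not called by mergeUint64s, but heap.Interface requires them.</span><span class="w">
</span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">h</span><span class="w"> </span><span class="o">*</span><span class="nx">Uint64SetHeap</span><span class="p">)</span><span class="w"> </span><span class="nx">Push</span><span class="p">(</span><span class="nx">x</span><span class="w"> </span><span class="kt">any</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="o">*</span><span class="nx">h</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="o">*</span><span class="nx">h</span><span class="p">,</span><span class="w"> </span><span class="nx">x</span><span class="p">.(</span><span class="nx">Item</span><span class="p">))</span><span class="w"> </span><span class="p">}</span><span class="w">
</span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">h</span><span class="w"> </span><span class="o">*</span><span class="nx">Uint64SetHeap</span><span class="p">)</span><span class="w"> </span><span class="nx">Pop</span><span class="p">()</span><span class="w"> </span><span class="kt">any</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nx">old</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="o">*</span><span class="nx">h</span><span class="w">
    </span><span class="nx">item</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">old</span><span class="p">[</span><span class="nb">len</span><span class="p">(</span><span class="nx">old</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span><span class="w">
    </span><span class="o">*</span><span class="nx">h</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">old</span><span class="p">[:</span><span class="nb">len</span><span class="p">(</span><span class="nx">old</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span><span class="w">
    </span><span class="k">return</span><span class="w"> </span><span class="nx">item</span><span class="w">
</span><span class="p">}</span><span class="w">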

</span><span class="kd">func</span><span class="w"> </span><span class="nx">mergeUint64s</span><span class="p">(</span><span class="nx">sets</span><span class="w"> </span><span class="p">[][]</span><span class="kt">uint64</span><span class="p">)</span><span class="w"> </span><span class="p">[]</span><span class="kt">uint64</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nx">h</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">make</span><span class="p">(</span><span class="nx">Uint64SetHeap</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">sets</span><span class="p">))</span><span class="w">
    </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">set</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">sets</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="nx">h</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">h</span><span class="p">,</span><span class="w"> </span><span class="nx">Item</span><span class="p">{</span><span class="nx">first</span><span class="p">:</span><span class="w"> </span><span class="nx">set</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span><span class="w"> </span><span class="nx">remainder</span><span class="p">:</span><span class="w"> </span><span class="nx">set</span><span class="p">[</span><span class="mi">1</span><span class="p">:]})</span><span class="w">
    </span><span class="p">}</span><span class="w">
    </span><span class="nx">heap</span><span class="p">.</span><span class="nx">Init</span><span class="p">(</span><span class="o">&amp;</span><span class="nx">h</span><span class="p">)</span><span class="w">
    </span><span class="kd">var</span><span class="w"> </span><span class="nx">result</span><span class="w"> </span><span class="p">[]</span><span class="kt">uint64</span><span class="w">
    </span><span class="k">for</span><span class="w"> </span><span class="nx">h</span><span class="p">.</span><span class="nx">Len</span><span class="p">()</span><span class="w"> </span><span class="p">&gt;</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="nx">item</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">h</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="w">
        </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">result</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">result</span><span class="p">[</span><span class="nb">len</span><span class="p">(</span><span class="nx">result</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">item</span><span class="p">.</span><span class="nx">first</span><span class="w"> </span><span class="p">{</span><span class="w">
            </span><span class="nx">result</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">result</span><span class="p">,</span><span class="w"> </span><span class="nx">item</span><span class="p">.</span><span class="nx">first</span><span class="p">)</span><span class="w">
        </span><span class="p">}</span><span class="w">
        </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">item</span><span class="p">.</span><span class="nx">remainder</span><span class="p">)</span><span class="w"> </span><span class="p">&gt;</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span><span class="w">
            </span><span class="nx">h</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">Item</span><span class="p">{</span><span class="nx">first</span><span class="p">:</span><span class="w"> </span><span class="nx">item</span><span class="p">.</span><span class="nx">remainder</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span><span class="w"> </span><span class="nx">remainder</span><span class="p">:</span><span class="w"> </span><span class="nx">item</span><span class="p">.</span><span class="nx">remainder</span><span class="p">[</span><span class="mi">1</span><span class="p">:]}</span><span class="w">
        </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span><span class="w">
            </span><span class="c1">// No more elements in the set, remove the set from the heap.</span><span class="w">
            </span><span class="nx">n</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">h</span><span class="p">)</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="w">
            </span><span class="nx">h</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">h</span><span class="p">[</span><span class="nx">n</span><span class="p">]</span><span class="w">
            </span><span class="nx">h</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">h</span><span class="p">[:</span><span class="nx">n</span><span class="p">]</span><span class="w">
        </span><span class="p">}</span><span class="w">
        </span><span class="c1">// The value of the head changed. Re-establish the heap ordering.</span><span class="w">
        </span><span class="nx">heap</span><span class="p">.</span><span class="nx">Fix</span><span class="p">(</span><span class="o">&amp;</span><span class="nx">h</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w">
    </span><span class="p">}</span><span class="w">
    </span><span class="k">return</span><span class="w"> </span><span class="nx">result</span><span class="w">
</span><span class="p">}</span></code></pre>
<p>We see a similar result again: in Go, merging the same 100 sets, 10K integers each, takes three times longer: 101ms. To make this benchmark fairer, however, there is one optimization we can make in Go. If we look closer at the heap implementation in Rust, we notice that the <a class="reference external" href="https://doc.rust-lang.org/std/collections/binary_heap/struct.BinaryHeap.html" target="_blank">BinaryHeap</a> structure in Rust is generic, meaning the generic type <span class="docutils literal">T</span> is replaced with our concrete type <span class="docutils literal">Item</span> during compilation. The Go heap implementation does not use generics: instead, it uses the <a class="reference external" href="https://pkg.go.dev/container/heap#Interface" target="_blank">heap.Interface</a> interface. Interfaces in Go come with an additional runtime cost and make some optimizations, such as inlining, impossible. Go has supported generics since version 1.18, but unfortunately there is no generic heap in the standard library yet (see the GitHub <a class="reference external" href="https://github.com/golang/go/issues/47632" target="_blank">issue</a> discussing adding it). Instead of trying to write a generic heap in Go, we can do by hand what the Rust compiler does for us: copy the <span class="docutils literal">container/heap</span> package and manually replace all instances of <span class="docutils literal">heap.Interface</span> with <span class="docutils literal">Item</span>. There is a lot of code to copy and paste, so I won’t include it here. This new version is faster: it takes 76ms to run, but is still more than twice as slow as the Rust version.</p>
<p>These are not isolated cases: we found several other CPU-demanding operations to be faster in Rust. We learned that while in many cases it’s possible to make Go as fast as Rust, writing performance-sensitive code in Go requires a relatively larger time investment and deeper language expertise.</p>
</section>
</section>
<section id="conclusion">
<h3>Conclusion<a class="headerlink" href="#conclusion" title="Permalink to this headline"> #</a></h3>
<p>To summarize the changes we made, we adopted an entirely different indexing strategy: we now always index the timeseries to avoid full scans. We sharded the indexing nodes internally to parallelize query execution and take advantage of larger nodes with more CPU cores. Finally, we rewrote the service from Go to Rust, making CPU-demanding operations up to 6x faster. Combined, these changes allowed us to query 20 times higher cardinality metrics on the same hardware, and we significantly reduced the tail query latency. This resulted in a 99% reduction of query timeouts and made the timeseries index nearly 50% cheaper to run.</p>
<figure>
<img alt="" src="https://artem.krylysov.com/images/2024-timeseries-indexing-at-scale/query-latency.png" style="width: 557px;" />
<figcaption>
<p>The tail query latency graph with the original indexing service in blue and the next-gen service in orange</p>
</figcaption>
</figure>
</section>
]]>
        </content>
    </entry>
    
    <entry>
        <title><![CDATA[How RocksDB works]]></title>
        <link href="https://artem.krylysov.com/blog/2023/04/19/how-rocksdb-works/"/>
        <updated>2023-04-19T00:00:00Z</updated>
        <id>https://artem.krylysov.com/blog/2023/04/19/how-rocksdb-works/</id>
        <content type="html">
            <![CDATA[<section id="introduction">
<h3>Introduction<a class="headerlink" href="#introduction" title="Permalink to this headline"> #</a></h3>
<p>Over the past few years, the adoption of RocksDB has increased dramatically. It has become a standard for embeddable key-value stores.</p>
<p>Today RocksDB runs in production at Meta, <a class="reference external" href="https://blogs.bing.com/Engineering-Blog/october-2021/RocksDB-in-Microsoft-Bing" target="_blank">Microsoft</a>, <a class="reference external" href="https://netflixtechblog.com/application-data-caching-using-ssds-5bf25df851ef" target="_blank">Netflix</a>, and <a class="reference external" href="https://eng.uber.com/cherami-message-queue-system/" target="_blank">Uber</a>. At <a class="reference external" href="https://engineering.fb.com/2021/07/22/data-infrastructure/mysql/" target="_blank">Meta</a> RocksDB serves as a storage engine for the MySQL deployment powering the distributed graph database.</p>
<p>Big tech companies are not the only RocksDB users. Several startups were built around RocksDB - <a class="reference external" href="https://www.cockroachlabs.com/" target="_blank">CockroachDB</a>, <a class="reference external" href="https://www.yugabyte.com/" target="_blank">Yugabyte</a>, <a class="reference external" href="https://www.pingcap.com/" target="_blank">PingCAP</a>, <a class="reference external" href="https://rockset.com/" target="_blank">Rockset</a>.</p>
<p>I spent the past 4 years at Datadog building and running services on top of RocksDB in production. In this post, I'll give a high-level overview of how RocksDB works.</p>
</section>
<section id="what-is-rocksdb">
<h3>What is RocksDB<a class="headerlink" href="#what-is-rocksdb" title="Permalink to this headline"> #</a></h3>
<p>RocksDB is an embeddable persistent key-value store. It's a type of database designed to store large amounts of unique keys associated with values. The simple key-value data model can be used to build search indexes, document-oriented databases, SQL databases, caching systems and message brokers.</p>
<p>RocksDB was forked off Google's <a class="reference external" href="https://github.com/google/leveldb" target="_blank">LevelDB</a> in 2012 and optimized to run on servers with SSD drives.
Currently, RocksDB is <a class="reference external" href="https://github.com/facebook/rocksdb" target="_blank">developed</a> and maintained by Meta.</p>
<p>RocksDB is written in C++. In addition to C and C++, its C bindings allow embedding the library into applications written in other languages such as <a class="reference external" href="https://github.com/rust-rocksdb/rust-rocksdb" target="_blank">Rust</a>, <a class="reference external" href="https://github.com/linxGnu/grocksdb" target="_blank">Go</a> or <a class="reference external" href="https://github.com/facebook/rocksdb/tree/main/java" target="_blank">Java</a>.</p>
<p>If you ever used SQLite, then you already know what an embeddable database is. In the context of databases, and particularly in the context of RocksDB, &quot;embeddable&quot; means:</p>
<ul class="simple">
<li><p>The database doesn't have a standalone process; instead, it's integrated directly into your application as a library, sharing its resources and memory, removing the need for expensive inter-process communication.</p></li>
<li><p>It doesn't come with a built-in server that can be accessed over the network.</p></li>
<li><p>It is not distributed, meaning it does not provide fault tolerance, replication, or sharding mechanisms.</p></li>
</ul>
<p>It is up to the application to implement these features if necessary.</p>
<p>RocksDB stores data as a collection of key-value pairs. Neither keys nor values are typed; they are just arbitrary byte arrays. The database provides a low-level interface with a few functions for modifying the state of the collection:</p>
<ul class="simple">
<li><p><span class="docutils literal">put(key, value)</span>: stores a new key-value pair or updates an existing one</p></li>
<li><p><span class="docutils literal">merge(key, value)</span>: combines the new value with the existing value for a given key</p></li>
<li><p><span class="docutils literal">delete(key)</span>: removes a key-value pair from the collection</p></li>
</ul>
<p>Values can be retrieved with point lookups:</p>
<ul class="simple">
<li><p><span class="docutils literal">get(key)</span></p></li>
</ul>
<p>An iterator enables &quot;range scans&quot; - seeking to a specific key and accessing subsequent key-value pairs in order:</p>
<ul class="simple">
<li><p><span class="docutils literal">iterator.seek(key_prefix); <span class="pre">iterator.value();</span> iterator.next()</span></p></li>
</ul>
</section>
<section id="log-structured-merge-tree">
<h3>Log-structured merge-tree<a class="headerlink" href="#log-structured-merge-tree" title="Permalink to this headline"> #</a></h3>
<p>The core data structure behind RocksDB is called the <em>Log-structured merge-tree</em> (LSM-tree). It's a tree-like structure organized into multiple levels, with data on each level ordered by key. The LSM-tree was primarily designed for write-heavy workloads and was introduced in 1996 in a <a class="reference external" href="http://paperhub.s3.amazonaws.com/18e91eb4db2114a06ea614f0384f2784.pdf" target="_blank">paper</a> of the same name.</p>
<p>The top level of the LSM-tree is kept in memory and contains the most recently inserted data. The lower levels are stored on disk and are numbered from 0 to N. Level 0 (L0) stores data moved from memory to disk, while Level 1 and below store older data. When a level becomes too large, it's merged with the next level, which is typically an order of magnitude larger than the previous one.</p>
<img alt="" src="https://artem.krylysov.com/images/2023-rocksdb/rocksdb-lsm.png" style="width: 421px;" />
<aside class="admonition note">
<p class="admonition-title">Note</p>
<p>I'll be talking specifically about RocksDB, but most of the concepts covered apply to many databases that use LSM-trees under the hood (e.g. Bigtable, HBase, Cassandra, ScyllaDB, LevelDB, MongoDB WiredTiger).</p>
</aside>
<p>To better understand how LSM-trees work, let's take a closer look at the write and read paths.</p>
</section>
<section id="write-path">
<h3>Write path<a class="headerlink" href="#write-path" title="Permalink to this headline"> #</a></h3>
<section id="memtable">
<h4>MemTable<a class="headerlink" href="#memtable" title="Permalink to this headline"> #</a></h4>
<p>The top level of the LSM-tree is known as the <em>MemTable</em>. It's an in-memory buffer that holds keys and values before they are written to disk. All inserts and updates always go through the memtable. This is also true for deletes - rather than modifying key-value pairs in-place, RocksDB marks deleted keys by inserting a tombstone record.</p>
<p>The memtable is configured to have a specific size in bytes. When the memtable becomes full, it is swapped with a new memtable, and the old memtable becomes immutable.</p>
<aside class="admonition note">
<p class="admonition-title">Note</p>
<p>The default size of the memtable is 64MB.</p>
</aside>
<p>Let's start by adding a few keys to the database:</p>
<pre class="code go literal-block"><code><span class="nx">db</span><span class="p">.</span><span class="nx">put</span><span class="p">(</span><span class="s">&quot;chipmunk&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;1&quot;</span><span class="p">)</span><span class="w">
</span><span class="nx">db</span><span class="p">.</span><span class="nx">put</span><span class="p">(</span><span class="s">&quot;cat&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;2&quot;</span><span class="p">)</span><span class="w">
</span><span class="nx">db</span><span class="p">.</span><span class="nx">put</span><span class="p">(</span><span class="s">&quot;raccoon&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;3&quot;</span><span class="p">)</span><span class="w">
</span><span class="nx">db</span><span class="p">.</span><span class="nx">put</span><span class="p">(</span><span class="s">&quot;dog&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;4&quot;</span><span class="p">)</span></code></pre>
<img alt="" src="https://artem.krylysov.com/images/2023-rocksdb/rocksdb-memtable.png" style="width: 181px;" />
<p>As you can see, the key-value pairs in the memtable are ordered by the key. Although <em>chipmunk</em> was inserted first, it comes after <em>cat</em> in the memtable due to the sorted order. The ordering is a requirement for supporting range scans, and it makes some operations, which I will cover later, more efficient.</p>
</section>
<section id="write-ahead-log">
<h4>Write-ahead log<a class="headerlink" href="#write-ahead-log" title="Permalink to this headline"> #</a></h4>
<p>In the event of a process crash or a planned application restart, data stored in the process memory is lost. To prevent data loss and ensure that database updates are durable, RocksDB writes all updates to the <em>Write-ahead log</em> (WAL) on disk, in addition to the memtable. This way the database can replay the log and restore the original state of the memtable on startup.</p>
<p>The WAL is an append-only file, consisting of a sequence of records. Each record contains a key-value pair, a record type (Put/Merge/Delete), and a checksum. The checksum is used to detect data corruptions or partially written records when replaying the log. Unlike in the memtable, records in the WAL are not ordered by key. Instead, they are appended in the order in which they arrive.</p>
<img alt="" src="https://artem.krylysov.com/images/2023-rocksdb/rocksdb-wal.png" style="width: 251px;" />
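<p>The record structure and the replay step can be sketched in a few lines of Go. The byte layout below (checksum, type, key length, value length, key, value) and the names <span class="docutils literal">appendRecord</span> and <span class="docutils literal">replay</span> are assumptions made for illustration; the actual RocksDB log format is different and more involved.</p>

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/crc32"
)

// Record types (hypothetical encoding).
const (
	TypePut byte = iota
	TypeMerge
	TypeDelete
)

type Record struct {
	Type  byte
	Key   []byte
	Value []byte
}

func putU32(b []byte, v uint32) []byte {
	var tmp [4]byte
	binary.LittleEndian.PutUint32(tmp[:], v)
	return append(b, tmp[:]...)
}

// appendRecord appends one record to the end of the log.
// Layout: crc32(4) | type(1) | keyLen(4) | valLen(4) | key | value.
func appendRecord(log []byte, r Record) []byte {
	payload := []byte{r.Type}
	payload = putU32(payload, uint32(len(r.Key)))
	payload = putU32(payload, uint32(len(r.Value)))
	payload = append(payload, r.Key...)
	payload = append(payload, r.Value...)
	log = putU32(log, crc32.ChecksumIEEE(payload))
	return append(log, payload...)
}

// replay reads records in arrival order and stops at the first record
// that is truncated or fails its checksum.
func replay(log []byte) []Record {
	var out []Record
	for len(log) >= 13 {
		sum := binary.LittleEndian.Uint32(log[0:4])
		keyLen := int(binary.LittleEndian.Uint32(log[5:9]))
		valLen := int(binary.LittleEndian.Uint32(log[9:13]))
		end := 13 + keyLen + valLen
		if end > len(log) || crc32.ChecksumIEEE(log[4:end]) != sum {
			break // partially written or corrupted record
		}
		out = append(out, Record{Type: log[4], Key: log[13 : 13+keyLen], Value: log[13+keyLen : end]})
		log = log[end:]
	}
	return out
}

func main() {
	var log []byte
	log = appendRecord(log, Record{Type: TypePut, Key: []byte("cat"), Value: []byte("2")})
	log = appendRecord(log, Record{Type: TypeDelete, Key: []byte("chipmunk")})

	// Simulate a crash in the middle of writing a third record.
	torn := appendRecord(nil, Record{Type: TypePut, Key: []byte("dog"), Value: []byte("4")})
	log = append(log, torn[:len(torn)-2]...)

	for _, r := range replay(log) {
		fmt.Printf("type=%d key=%s value=%s\n", r.Type, r.Key, r.Value)
	}
}
```

<p>Replaying the log in this example recovers the two complete records and ignores the torn third one, which is what the checksum is for.</p>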
</section>
<section id="flush">
<h4>Flush<a class="headerlink" href="#flush" title="Permalink to this headline"> #</a></h4>
<p>RocksDB runs a dedicated background thread that persists immutable memtables to disk. As soon as the flush is complete, the immutable memtable and the corresponding WAL are discarded. RocksDB starts writing to a new WAL and a new memtable. Each flush produces a single <em>SST</em> file on L0. The produced files are immutable - they are never modified once written to disk.</p>
<p>The default memtable implementation in RocksDB is based on a <a class="reference external" href="https://en.wikipedia.org/wiki/Skip_list" target="_blank">skip list</a>. The data structure is a linked list with additional layers of links that allow fast search and insertion in sorted order. The ordering makes the flush efficient, allowing the memtable content to be written to disk sequentially by iterating the key-value pairs. Turning random inserts into sequential writes is one of the key ideas behind the LSM-tree design.</p>
<img alt="" src="https://artem.krylysov.com/images/2023-rocksdb/rocksdb-flush.png" style="width: 201px;" />
<aside class="admonition note">
<p class="admonition-title">Note</p>
<p>RocksDB is highly configurable. Like many other components, the memtable implementation can be swapped with an alternative. It's not uncommon to see self-balancing binary search trees used to implement memtables in other LSM-based databases.</p>
</aside>
</section>
<section id="sst">
<h4>SST<a class="headerlink" href="#sst" title="Permalink to this headline"> #</a></h4>
<p>SST files contain key-value pairs that have been flushed from the memtable to disk in a format optimized for queries. <em>SST</em> stands for Static Sorted Table (or Sorted String Table in some other databases). This is a block-based file format that organizes data into blocks (the default size target is 4KB). Individual blocks can be compressed with various compression algorithms supported by RocksDB, such as Zlib, BZ2, Snappy, LZ4, or ZSTD. Similar to records in the WAL, blocks contain checksums to detect data corruptions. RocksDB verifies these checksums every time it reads from the disk.</p>
<p>Blocks in an SST file are divided into sections. The first section, the <em>data</em> section, contains an ordered sequence of key-value pairs. This ordering allows delta-encoding of keys, meaning that instead of storing full keys, we can store only the difference between adjacent keys.</p>
<p>While the key-value pairs in an SST file are stored in sorted order, binary search cannot always be applied, particularly when the blocks are compressed, making searching the file inefficient. RocksDB optimizes lookups by adding an index, which is stored in a separate section right after the data section. The index maps the last key in each data block to its corresponding offset on disk. Again, the keys in the index are ordered, allowing us to find a key quickly by performing a binary search. For example, if we are searching for <em>lynx</em>, the index tells us the key might be in block 2 because <em>lynx</em> comes after <em>chipmunk</em>, but before <em>raccoon</em>.</p>
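<p>The index lookup is a plain binary search over the last key of each block. Here is a minimal sketch in Go; the <span class="docutils literal">IndexEntry</span> type, the <span class="docutils literal">findBlock</span> function and the offsets are made up for illustration and do not reflect the on-disk index format.</p>

```go
package main

import (
	"fmt"
	"sort"
)

// IndexEntry maps the last key of a data block to the block's offset.
type IndexEntry struct {
	LastKey string
	Offset  int64 // hypothetical byte offset of the block in the file
}

// findBlock returns the position in the index of the only block that may
// contain key, or -1 if key is larger than every key in the file.
func findBlock(index []IndexEntry, key string) int {
	// The first block whose last key is >= key is the only candidate.
	i := sort.Search(len(index), func(i int) bool { return index[i].LastKey >= key })
	if i == len(index) {
		return -1
	}
	return i
}

func main() {
	index := []IndexEntry{
		{LastKey: "chipmunk", Offset: 0},
		{LastKey: "raccoon", Offset: 4096},
	}
	// Positions are zero-based, so 1 is the second block.
	fmt.Println(findBlock(index, "lynx"))
	fmt.Println(findBlock(index, "zebra"))
}
```

<p>Note that the index only narrows the search to one block. The block still has to be read and searched to learn whether the key is actually there.</p>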
<img alt="" src="https://artem.krylysov.com/images/2023-rocksdb/rocksdb-sst.png" style="width: 401px;" />
<p>In reality, there is no <em>lynx</em> in the SST file above, but we had to read the block from disk and search it. RocksDB supports enabling a <a class="reference external" href="https://en.wikipedia.org/wiki/Bloom_filter" target="_blank">bloom filter</a> - a space-efficient probabilistic data structure used to test whether an element belongs to a set. It's stored in an optional bloom filter section and makes searching for keys that don't exist faster.</p>
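<p>For readers who haven't used one, here is a toy bloom filter in Go. It is a sketch of the general technique, not RocksDB's implementation: the bit array is a slice of booleans, and the bit positions are derived from the two halves of a single FNV-1a hash, which is one common way to simulate several hash functions.</p>

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Bloom is a toy bloom filter with m bits and k probes per key.
type Bloom struct {
	bits []bool
	k    int
}

func NewBloom(m, k int) *Bloom { return &Bloom{bits: make([]bool, m), k: k} }

// positions derives k bit positions from one 64-bit hash (double hashing).
func (b *Bloom) positions(key string) []int {
	h := fnv.New64a()
	h.Write([]byte(key))
	sum := h.Sum64()
	h1, h2 := uint32(sum), uint32(sum>>32)
	pos := make([]int, b.k)
	for i := range pos {
		pos[i] = int((h1 + uint32(i)*h2) % uint32(len(b.bits)))
	}
	return pos
}

// Add records key in the filter.
func (b *Bloom) Add(key string) {
	for _, p := range b.positions(key) {
		b.bits[p] = true
	}
}

// MayContain returns false only if key was definitely never added.
// A true result can be a false positive.
func (b *Bloom) MayContain(key string) bool {
	for _, p := range b.positions(key) {
		if !b.bits[p] {
			return false
		}
	}
	return true
}

func main() {
	filter := NewBloom(1024, 3)
	for _, key := range []string{"cat", "chipmunk", "dog", "raccoon"} {
		filter.Add(key)
	}
	fmt.Println(filter.MayContain("raccoon")) // always true: no false negatives
	fmt.Println(filter.MayContain("lynx"))    // most likely false, so the block read is skipped
}
```

<p>The useful property is the asymmetry: a negative answer is always correct, so a lookup for a missing key can usually skip reading the data block altogether.</p>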
<p>Additionally, there are several other less interesting sections, like the metadata section.</p>
</section>
<section id="compaction">
<h4>Compaction<a class="headerlink" href="#compaction" title="Permalink to this headline"> #</a></h4>
<p>What I described so far is already a functional key-value store. But there are a few challenges that would prevent using it in a production system: space and read amplification. <em>Space amplification</em> measures the ratio of storage space to the size of the logical data stored. For example, if a database needs 2MB of disk space to store key-value pairs that take 1MB, the space amplification is <em>2</em>. Similarly, <em>read amplification</em> measures the number of IO operations needed to perform a logical read operation. I'll let you figure out what <em>write amplification</em> is as a little exercise.</p>
<p>Now, let's add more keys to the database and remove a few:</p>
<pre class="code python literal-block"><code><span class="n">db</span><span class="o">.</span><span class="n">delete</span><span class="p">(</span><span class="s2">&quot;chipmunk&quot;</span><span class="p">)</span><span class="w">
</span><span class="n">db</span><span class="o">.</span><span class="n">put</span><span class="p">(</span><span class="s2">&quot;cat&quot;</span><span class="p">,</span> <span class="s2">&quot;5&quot;</span><span class="p">)</span><span class="w">
</span><span class="n">db</span><span class="o">.</span><span class="n">put</span><span class="p">(</span><span class="s2">&quot;raccoon&quot;</span><span class="p">,</span> <span class="s2">&quot;6&quot;</span><span class="p">)</span><span class="w">
</span><span class="n">db</span><span class="o">.</span><span class="n">put</span><span class="p">(</span><span class="s2">&quot;zebra&quot;</span><span class="p">,</span> <span class="s2">&quot;7&quot;</span><span class="p">)</span><span class="w">
</span><span class="c1"># Flush triggers</span><span class="w">
</span><span class="n">db</span><span class="o">.</span><span class="n">delete</span><span class="p">(</span><span class="s2">&quot;raccoon&quot;</span><span class="p">)</span><span class="w">
</span><span class="n">db</span><span class="o">.</span><span class="n">put</span><span class="p">(</span><span class="s2">&quot;cat&quot;</span><span class="p">,</span> <span class="s2">&quot;8&quot;</span><span class="p">)</span><span class="w">
</span><span class="n">db</span><span class="o">.</span><span class="n">put</span><span class="p">(</span><span class="s2">&quot;zebra&quot;</span><span class="p">,</span> <span class="s2">&quot;9&quot;</span><span class="p">)</span><span class="w">
</span><span class="n">db</span><span class="o">.</span><span class="n">put</span><span class="p">(</span><span class="s2">&quot;duck&quot;</span><span class="p">,</span> <span class="s2">&quot;10&quot;</span><span class="p">)</span></code></pre>
<img alt="" src="https://artem.krylysov.com/images/2023-rocksdb/rocksdb-compaction1.png" style="width: 426px;" />
<p>As we keep writing, the memtables get flushed and the number of SST files on L0 keeps growing. This causes two problems:</p>
<ul class="simple">
<li><p>The space taken by deleted or updated keys is never reclaimed. For example, the <em>cat</em> key has three copies, and <em>chipmunk</em> and <em>raccoon</em> still take up space on the disk even though they're no longer needed.</p></li>
<li><p>Reads get slower as their cost grows with the number of SST files on L0. Each key lookup requires inspecting every SST file to find the needed key.</p></li>
</ul>
<p>A mechanism called <em>compaction</em> helps to reduce space and read amplification in exchange for increased write amplification. Compaction selects SST files on one level and merges them with SST files on a level below, discarding deleted and overwritten keys. Compactions run in the background on a dedicated thread pool, which allows RocksDB to continue processing read and write requests while compactions are taking place.</p>
<img alt="" src="https://artem.krylysov.com/images/2023-rocksdb/rocksdb-compaction2.png" style="width: 426px;" />
<p><em>Leveled Compaction</em> is the default compaction strategy in RocksDB. With Leveled Compaction, key ranges of SST files on L0 overlap. Levels 1 and below are organized to contain a single sorted key range partitioned into multiple SST files, ensuring that there is no overlap in key ranges within a level. Compaction picks files on a level and merges them with the overlapping range of files on the level below. For example, during an L0 to L1 compaction, if the input files on L0 span the entire key range, the compaction has to pick all files from L0 and all files from L1.</p>
<img alt="" src="https://artem.krylysov.com/images/2023-rocksdb/rocksdb-compaction3.png" style="width: 291px;" />
<p>In the L1 to L2 compaction below, the input file on L1 overlaps with two files on L2, so the compaction is limited to only a subset of files.</p>
<img alt="" src="https://artem.krylysov.com/images/2023-rocksdb/rocksdb-compaction4.png" style="width: 361px;" />
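<p>Because files on L1 and below never overlap within a level, picking the compaction inputs on the next level comes down to a key range intersection test. A minimal sketch in Go, with a hypothetical <span class="docutils literal">SSTFile</span> type and made-up file names and key ranges:</p>

```go
package main

import "fmt"

// SSTFile describes the key range covered by one file (both ends inclusive).
type SSTFile struct {
	Name              string
	Smallest, Largest string
}

// overlapping returns the files on a level whose key range intersects
// the range [smallest, largest] of the compaction input.
func overlapping(level []SSTFile, smallest, largest string) []SSTFile {
	var out []SSTFile
	for _, f := range level {
		if f.Largest >= smallest && f.Smallest <= largest {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	l2 := []SSTFile{
		{Name: "a.sst", Smallest: "a", Largest: "f"},
		{Name: "b.sst", Smallest: "g", Largest: "m"},
		{Name: "c.sst", Smallest: "n", Largest: "s"},
		{Name: "d.sst", Smallest: "t", Largest: "z"},
	}
	// An input file on L1 covering keys from "e" to "k".
	for _, f := range overlapping(l2, "e", "k") {
		fmt.Println(f.Name)
	}
}
```

<p>Only the first two files intersect the input range here, so the other two are left untouched by the compaction.</p>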
<p>Compaction is triggered when the number of SST files on L0 reaches a certain threshold (4 by default). For L1 and below, compaction is triggered when the size of the entire level exceeds the configured <em>target size</em>. An L0 to L1 compaction can push L1 over its target size, which in turn triggers an L1 to L2 compaction. This way, an L0 to L1 compaction may cascade all the way to the bottommost level. After the compaction ends, RocksDB updates its metadata and removes compacted files from disk.</p>
<aside class="admonition note">
<p class="admonition-title">Note</p>
<p>RocksDB provides other compaction strategies offering different tradeoffs between space, read and write amplification.</p>
</aside>
<p>Remember that keys in SST files are ordered? The ordering guarantee allows merging multiple SST files incrementally with the help of the <a class="reference external" href="https://en.wikipedia.org/wiki/K-way_merge_algorithm" target="_blank">k-way merge algorithm</a>. <em>K-way merge</em> is a generalized version of the <em>two-way merge</em> that works similarly to the merge phase of the <a class="reference external" href="https://en.wikipedia.org/wiki/Merge_sort" target="_blank">merge sort</a>.</p>
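<p>Here is a minimal sketch of a compaction-style k-way merge in Go, built on <span class="docutils literal">container/heap</span>. The <span class="docutils literal">Entry</span> and <span class="docutils literal">cursor</span> types are hypothetical, whole inputs are held in memory instead of being streamed from files, and tombstones are dropped unconditionally, which is only safe when the output lands on the bottommost level.</p>

```go
package main

import (
	"container/heap"
	"fmt"
)

type Entry struct {
	Key, Value string
	Deleted    bool // tombstone
}

// cursor points at the unread part of one sorted input.
type cursor struct {
	entries []Entry
	age     int // 0 is the newest input
}

type mergeHeap []cursor

func (h mergeHeap) Len() int { return len(h) }
func (h mergeHeap) Less(i, j int) bool {
	a, b := h[i].entries[0], h[j].entries[0]
	if a.Key != b.Key {
		return a.Key < b.Key
	}
	return h[i].age < h[j].age // same key: the newer input comes first
}
func (h mergeHeap) Swap(i, j int) { h[i], h[j] = h[j], h[i] }
func (h *mergeHeap) Push(x any)   { *h = append(*h, x.(cursor)) }
func (h *mergeHeap) Pop() any {
	old := *h
	x := old[len(old)-1]
	*h = old[:len(old)-1]
	return x
}

// compact merges sorted inputs, passed newest first, into one sorted run.
// It keeps only the newest version of each key and drops tombstones.
func compact(inputs ...[]Entry) []Entry {
	h := &mergeHeap{}
	for age, in := range inputs {
		if len(in) > 0 {
			*h = append(*h, cursor{entries: in, age: age})
		}
	}
	heap.Init(h)

	var out []Entry
	lastKey, seen := "", false
	for h.Len() > 0 {
		c := &(*h)[0]
		e := c.entries[0]
		if !seen || e.Key != lastKey {
			// First time we see this key, so this is its newest version.
			lastKey, seen = e.Key, true
			if !e.Deleted {
				out = append(out, e)
			}
		}
		if len(c.entries) > 1 {
			c.entries = c.entries[1:]
			heap.Fix(h, 0) // the head changed, restore the heap ordering
		} else {
			heap.Pop(h) // this input is exhausted
		}
	}
	return out
}

func main() {
	// The three batches of writes from the compaction example, each sorted by key.
	newest := []Entry{{Key: "cat", Value: "8"}, {Key: "duck", Value: "10"}, {Key: "raccoon", Deleted: true}, {Key: "zebra", Value: "9"}}
	middle := []Entry{{Key: "cat", Value: "5"}, {Key: "chipmunk", Deleted: true}, {Key: "raccoon", Value: "6"}, {Key: "zebra", Value: "7"}}
	oldest := []Entry{{Key: "cat", Value: "2"}, {Key: "chipmunk", Value: "1"}, {Key: "dog", Value: "4"}, {Key: "raccoon", Value: "3"}}

	for _, e := range compact(newest, middle, oldest) {
		fmt.Println(e.Key, e.Value)
	}
}
```

<p>Since every input is consumed front to back, the merge needs to hold only the current entry of each input, which is why the same idea works for files much larger than memory.</p>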
</section>
</section>
<section id="read-path">
<h3>Read path<a class="headerlink" href="#read-path" title="Permalink to this headline"> #</a></h3>
<p>With immutable SST files persisted on disk, the read path is less sophisticated than the write path. A key lookup traverses the LSM-tree from the top to the bottom. It starts with the active memtable, descends to L0, and continues to lower levels until it finds the key or runs out of SST files to check.</p>
<p>Here are the lookup steps:</p>
<ol class="arabic simple">
<li><p>Search the active memtable.</p></li>
<li><p>Search immutable memtables.</p></li>
<li><p>Search all SST files on L0 starting from the most recently flushed.</p></li>
<li><p>For L1 and below, find a single SST file that may contain the key and search the file.</p></li>
</ol>
<p>Searching an SST file involves:</p>
<ol class="arabic simple">
<li><p>(optional) Probe the bloom filter.</p></li>
<li><p>Search the index to find the block the key may belong to.</p></li>
<li><p>Read the block and try to find the key there.</p></li>
</ol>
<p>That's it!</p>
<p>Consider this LSM-tree:</p>
<img alt="" src="https://artem.krylysov.com/images/2023-rocksdb/rocksdb-lookup.png" style="width: 431px;" />
<p>Depending on the key, a lookup may end early at any step. For example, looking up <em>cat</em> or <em>chipmunk</em> ends after searching the active memtable. Searching for <em>raccoon</em>, which exists only on Level 1, or <em>manul</em>, which doesn't exist in the LSM-tree at all, requires searching the entire tree.</p>
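<p>The lookup steps translate almost directly into code. The Go sketch below is a simplification under stated assumptions: the <span class="docutils literal">LSM</span>, <span class="docutils literal">SST</span> and <span class="docutils literal">Table</span> types are hypothetical, maps stand in for the memtable and for the bloom filter, index and block search inside an SST file, and the contents of the tree are made up rather than copied from the picture.</p>

```go
package main

import (
	"fmt"
	"sort"
)

type Entry struct {
	Value   string
	Deleted bool // tombstone
}

// Table stands in for the contents of a memtable or an SST file.
type Table map[string]Entry

type SST struct {
	Smallest, Largest string
	Data              Table
}

type LSM struct {
	Active    Table
	Immutable []Table // newest first
	L0        []SST   // newest first, key ranges may overlap
	Levels    [][]SST // L1 and below: sorted by key range, no overlap within a level
}

// resolve turns a found entry into a result; a tombstone means "not found".
func resolve(e Entry) (string, bool) {
	if e.Deleted {
		return "", false
	}
	return e.Value, true
}

func (t *LSM) Get(key string) (string, bool) {
	// 1. Search the active memtable.
	if e, ok := t.Active[key]; ok {
		return resolve(e)
	}
	// 2. Search immutable memtables.
	for _, m := range t.Immutable {
		if e, ok := m[key]; ok {
			return resolve(e)
		}
	}
	// 3. Search all SST files on L0, starting from the most recently flushed.
	for _, f := range t.L0 {
		if e, ok := f.Data[key]; ok {
			return resolve(e)
		}
	}
	// 4. On L1 and below, at most one file per level can contain the key.
	for _, level := range t.Levels {
		i := sort.Search(len(level), func(i int) bool { return level[i].Largest >= key })
		if i < len(level) && level[i].Smallest <= key {
			if e, ok := level[i].Data[key]; ok {
				return resolve(e)
			}
		}
	}
	return "", false
}

func main() {
	tree := &LSM{
		Active: Table{"cat": {Value: "8"}, "dog": {Deleted: true}},
		L0:     []SST{{Smallest: "cat", Largest: "zebra", Data: Table{"cat": {Value: "5"}, "zebra": {Value: "7"}}}},
		Levels: [][]SST{{
			{Smallest: "ant", Largest: "fox", Data: Table{"dog": {Value: "4"}}},
			{Smallest: "owl", Largest: "raccoon", Data: Table{"raccoon": {Value: "3"}}},
		}},
	}
	for _, key := range []string{"cat", "raccoon", "dog", "manul"} {
		v, ok := tree.Get(key)
		fmt.Println(key, v, ok)
	}
}
```

<p>The search stops at the first entry it finds, even when that entry is a tombstone: a newer delete must hide older values that still sit on lower levels.</p>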
</section>
<section id="merge">
<h3>Merge<a class="headerlink" href="#merge" title="Permalink to this headline"> #</a></h3>
<p>RocksDB provides another feature that touches both read and write paths: the <em>Merge</em> operation. Imagine you store a list of integers in a database. Occasionally you need to extend the list. To modify the list, you read the existing value from the database, update it in memory and then write back the updated value. This is called a &quot;Read-Modify-Write&quot; loop:</p>
<pre class="code go literal-block"><code><span class="nx">db</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">open_db</span><span class="p">(</span><span class="nx">path</span><span class="p">)</span><span class="w">

</span><span class="c1">// Read</span><span class="w">
</span><span class="nx">old_val</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">db</span><span class="p">.</span><span class="nx">get</span><span class="p">(</span><span class="nx">key</span><span class="p">)</span><span class="w"> </span><span class="c1">// RocksDB stores keys and values as byte arrays. We need to deserialize the value to turn it into a list.</span><span class="w">
</span><span class="nx">old_list</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">deserialize_list</span><span class="p">(</span><span class="nx">old_val</span><span class="p">)</span><span class="w"> </span><span class="c1">// old_list: [1, 2, 3]</span><span class="w">

</span><span class="c1">// Modify</span><span class="w">
</span><span class="nx">new_list</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">old_list</span><span class="p">.</span><span class="nx">extend</span><span class="p">([</span><span class="mi">4</span><span class="p">,</span><span class="w"> </span><span class="mi">5</span><span class="p">,</span><span class="w"> </span><span class="mi">6</span><span class="p">])</span><span class="w"> </span><span class="c1">// new_list: [1, 2, 3, 4, 5, 6]</span><span class="w">
</span><span class="nx">new_val</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">serialize_list</span><span class="p">(</span><span class="nx">new_list</span><span class="p">)</span><span class="w">

</span><span class="c1">// Write</span><span class="w">
</span><span class="nx">db</span><span class="p">.</span><span class="nx">put</span><span class="p">(</span><span class="nx">key</span><span class="p">,</span><span class="w"> </span><span class="nx">new_val</span><span class="p">)</span><span class="w">

</span><span class="nx">db</span><span class="p">.</span><span class="nx">get</span><span class="p">(</span><span class="nx">key</span><span class="p">)</span><span class="w"> </span><span class="c1">// deserialized value: [1, 2, 3, 4, 5, 6]</span></code></pre>
<p>The approach works, but has some flaws:</p>
<ul class="simple">
<li><p>It's not thread-safe - two different threads may try to update the same key overwriting each other's updates.</p></li>
<li><p>Write amplification - the cost of the update increases as the value gets larger. E.g., appending a single integer to a list of 100 requires reading 100 and writing back 101 integers.</p></li>
</ul>
<p>In addition to the <em>Put</em> and <em>Delete</em> write operations, RocksDB supports a third write operation, <em>Merge</em>, which aims to solve these problems. The Merge operation requires providing a <em>Merge Operator</em> - a user-defined function responsible for combining incremental updates into a single value:</p>
<pre class="code go literal-block"><code><span class="kd">func</span><span class="w"> </span><span class="nx">merge_operator</span><span class="p">(</span><span class="nx">existing_val</span><span class="p">,</span><span class="w"> </span><span class="nx">updates</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="nx">combined_list</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">deserialize_list</span><span class="p">(</span><span class="nx">existing_val</span><span class="p">)</span><span class="w">
        </span><span class="k">for</span><span class="w"> </span><span class="nx">op</span><span class="w"> </span><span class="nx">in</span><span class="w"> </span><span class="nx">updates</span><span class="w"> </span><span class="p">{</span><span class="w">
                </span><span class="nx">combined_list</span><span class="p">.</span><span class="nx">extend</span><span class="p">(</span><span class="nx">op</span><span class="p">)</span><span class="w">
        </span><span class="p">}</span><span class="w">
        </span><span class="k">return</span><span class="w"> </span><span class="nx">serialize_list</span><span class="p">(</span><span class="nx">combined_list</span><span class="p">)</span><span class="w">
</span><span class="p">}</span><span class="w">

</span><span class="nx">db</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">open_db</span><span class="p">(</span><span class="nx">path</span><span class="p">,</span><span class="w"> </span><span class="p">{</span><span class="nx">merge_operator</span><span class="p">:</span><span class="w"> </span><span class="nx">merge_operator</span><span class="p">})</span><span class="w">
</span><span class="c1">// key's value is [1, 2, 3]</span><span class="w">

</span><span class="nx">list_update</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">serialize_list</span><span class="p">([</span><span class="mi">4</span><span class="p">,</span><span class="w"> </span><span class="mi">5</span><span class="p">,</span><span class="w"> </span><span class="mi">6</span><span class="p">])</span><span class="w">
</span><span class="nx">db</span><span class="p">.</span><span class="nx">merge</span><span class="p">(</span><span class="nx">key</span><span class="p">,</span><span class="w"> </span><span class="nx">list_update</span><span class="p">)</span><span class="w">

</span><span class="nx">db</span><span class="p">.</span><span class="nx">get</span><span class="p">(</span><span class="nx">key</span><span class="p">)</span><span class="w"> </span><span class="c1">// deserialized value: [1, 2, 3, 4, 5, 6]</span></code></pre>
<p>The merge operator above combines incremental updates passed to the <em>Merge</em> calls into a single value. When <em>Merge</em> is called, RocksDB inserts only incremental updates into the memtable and the WAL. Later, during flush and compaction, RocksDB calls the merge operator function to combine the updates into a single large update or a single value whenever possible. On a <em>Get</em> call or an iteration, if there are any pending updates that have not been compacted yet, the same function is called to return a single combined value to the caller.</p>
<p>Merge is a good fit for write-heavy streaming applications that constantly need to make small updates to the existing values. So, where is the catch? Reads become more expensive - the work done on reads is not saved. Repetitive queries fetching the same keys have to do the same work over and over again until a flush and compaction are triggered. Like almost everything else in RocksDB, the behavior can be tuned by limiting the number of merge operands in the memtable or by reducing the number of SST files in L0.</p>
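<p>The read-side cost can be illustrated with a toy in-memory model. This is only a sketch: <em>toyStore</em>, its fields and its methods are hypothetical names invented for this example and are not part of the RocksDB API. It assumes list values and counts how many times the merge operator runs.</p>

```go
package main

import "fmt"

// toyStore is a hypothetical in-memory model, not the RocksDB API.
// Each key holds a fully merged base value plus a queue of pending operands.
type toyStore struct {
	base    map[string][]int
	pending map[string][][]int
	merges  int // how many times the merge operator ran
}

func newToyStore() *toyStore {
	return &toyStore{base: map[string][]int{}, pending: map[string][][]int{}}
}

// mergeOperator appends every pending update to a copy of the existing value.
func (s *toyStore) mergeOperator(existing []int, updates [][]int) []int {
	s.merges++
	combined := append([]int(nil), existing...)
	for _, u := range updates {
		combined = append(combined, u...)
	}
	return combined
}

// Merge only records the operand; no read-modify-write happens here.
func (s *toyStore) Merge(key string, update []int) {
	s.pending[key] = append(s.pending[key], update)
}

// Get combines pending operands on the fly and does not save the result.
func (s *toyStore) Get(key string) []int {
	if len(s.pending[key]) == 0 {
		return s.base[key]
	}
	return s.mergeOperator(s.base[key], s.pending[key])
}

// Compact folds pending operands into the base value,
// similar in spirit to what happens during flush and compaction.
func (s *toyStore) Compact(key string) {
	if len(s.pending[key]) == 0 {
		return
	}
	s.base[key] = s.mergeOperator(s.base[key], s.pending[key])
	delete(s.pending, key)
}

func main() {
	s := newToyStore()
	s.base["key"] = []int{1, 2, 3}
	s.Merge("key", []int{4, 5, 6})

	// Every read before compaction repeats the merge work.
	s.Get("key")
	s.Get("key")
	fmt.Println(s.Get("key"), "merge calls:", s.merges)

	// After compaction, reads return the stored value without merging.
	s.Compact("key")
	before := s.merges
	s.Get("key")
	fmt.Println("merge calls during read after compaction:", s.merges-before)
}
```

<p>In this model, repeated reads of the same key repeat the same merge work until <em>Compact</em> runs, which is the trade-off described above.</p>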
</section>
<section id="challenges">
<h3>Challenges<a class="headerlink" href="#challenges" title="Permalink to this headline"> #</a></h3>
<p>If performance is critical for your application, the most challenging aspect of using RocksDB is configuring it appropriately for a specific workload. RocksDB offers numerous configuration options, and tuning them often requires understanding the database internals and diving deep into the RocksDB source code:</p>
<blockquote>
<p>&quot;Unfortunately, configuring RocksDB optimally is not trivial. Even we as RocksDB developers don't fully understand the effect of each configuration change. If you want to fully optimize RocksDB for your workload, we recommend experiments and benchmarking, while keeping an eye on the three amplification factors.&quot;</p>
<p class="attribution">—<a class="reference external" href="https://github.com/facebook/rocksdb/wiki/RocksDB-Tuning-Guide" target="_blank">Official RocksDB Tuning Guide</a></p>
</blockquote>
</section>
<section id="final-thoughts">
<h3>Final thoughts<a class="headerlink" href="#final-thoughts" title="Permalink to this headline"> #</a></h3>
<p>Writing a production-grade key-value store from scratch is hard:</p>
<ul class="simple">
<li><p>Hardware and OS can betray you at any moment, dropping or corrupting data.</p></li>
<li><p>Performance optimizations require a large time investment.</p></li>
</ul>
<p>RocksDB solves this, allowing you to focus on the business logic instead. This makes RocksDB an excellent building block for databases.</p>
</section>
]]>
        </content>
    </entry>
    
    <entry>
        <title><![CDATA[Let's build a Full-Text Search engine]]></title>
        <link href="https://artem.krylysov.com/blog/2020/07/28/lets-build-a-full-text-search-engine/"/>
        <updated>2020-07-28T00:00:00Z</updated>
        <id>https://artem.krylysov.com/blog/2020/07/28/lets-build-a-full-text-search-engine/</id>
        <content type="html">
            <![CDATA[<p>Full-Text Search is one of those tools people use every day without realizing it. If you ever googled &quot;golang coverage report&quot; or tried to find &quot;indoor wireless camera&quot; on an e-commerce website, you used some kind of full-text search.</p>
<p>Full-Text Search (FTS) is a technique for searching text in a collection of documents. A document can refer to a web page, a newspaper article, an email message, or any structured text.</p>
<p>Today we are going to build our own FTS engine. By the end of this post, we'll be able to search across millions of documents in less than a millisecond. We'll start with simple search queries like &quot;give me all documents that contain the word <em>cat</em>&quot; and we'll extend the engine to support more sophisticated boolean queries.</p>
<aside class="admonition note">
<p class="admonition-title">Note</p>
<p>The most well-known FTS engine is <a class="reference external" href="https://lucene.apache.org/" target="_blank">Lucene</a> (as well as <a class="reference external" href="https://github.com/elastic/elasticsearch" target="_blank">Elasticsearch</a> and Solr built on top of it).</p>
</aside>
<section id="why-fts">
<h3>Why FTS<a class="headerlink" href="#why-fts" title="Permalink to this headline"> #</a></h3>
<p>Before we start writing code, you may ask &quot;can't we just use <em>grep</em> or have a loop that checks if every document contains the word I'm looking for?&quot;. Yes, we can. However, it's not always the best idea.</p>
</section>
<section id="corpus">
<h3>Corpus<a class="headerlink" href="#corpus" title="Permalink to this headline"> #</a></h3>
<p>We are going to search a part of the abstract of English Wikipedia. The latest dump is available at <a class="reference external" href="https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-abstract1.xml.gz" target="_blank">dumps.wikimedia.org</a>. As of today, the file size after decompression is 913 MB. The XML file contains over 600K documents.</p>
<p>Document example:</p>
<pre class="code xml literal-block"><code><span class="nt">&lt;title&gt;</span>Wikipedia:<span class="w"> </span>Kit-Cat<span class="w"> </span>Klock<span class="nt">&lt;/title&gt;</span><span class="w">
</span><span class="nt">&lt;url&gt;</span>https://en.wikipedia.org/wiki/Kit-Cat_Klock<span class="nt">&lt;/url&gt;</span><span class="w">
</span><span class="nt">&lt;abstract&gt;</span>The<span class="w"> </span>Kit-Cat<span class="w"> </span>Klock<span class="w"> </span>is<span class="w"> </span>an<span class="w"> </span>art<span class="w"> </span>deco<span class="w"> </span>novelty<span class="w"> </span>wall<span class="w"> </span>clock<span class="w"> </span>shaped<span class="w"> </span>like<span class="w"> </span>a<span class="w"> </span>grinning<span class="w"> </span>cat<span class="w"> </span>with<span class="w"> </span>cartoon<span class="w"> </span>eyes<span class="w"> </span>that<span class="w"> </span>swivel<span class="w"> </span>in<span class="w"> </span>time<span class="w"> </span>with<span class="w"> </span>its<span class="w"> </span>pendulum<span class="w"> </span>tail.<span class="nt">&lt;/abstract&gt;</span></code></pre>
</section>
<section id="loading-documents">
<h3>Loading documents<a class="headerlink" href="#loading-documents" title="Permalink to this headline"> #</a></h3>
<p>First, we need to load all the documents from the dump. The built-in <span class="docutils literal">encoding/xml</span> package comes in very handy:</p>
<pre class="code go literal-block"><code><span class="kn">import</span><span class="w"> </span><span class="p">(</span><span class="w">
    </span><span class="s">&quot;encoding/xml&quot;</span><span class="w">
    </span><span class="s">&quot;os&quot;</span><span class="w">
</span><span class="p">)</span><span class="w">

</span><span class="kd">type</span><span class="w"> </span><span class="nx">document</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nx">Title</span><span class="w"> </span><span class="kt">string</span><span class="w"> </span><span class="s">`xml:&quot;title&quot;`</span><span class="w">
    </span><span class="nx">URL</span><span class="w">   </span><span class="kt">string</span><span class="w"> </span><span class="s">`xml:&quot;url&quot;`</span><span class="w">
    </span><span class="nx">Text</span><span class="w">  </span><span class="kt">string</span><span class="w"> </span><span class="s">`xml:&quot;abstract&quot;`</span><span class="w">
    </span><span class="nx">ID</span><span class="w">    </span><span class="kt">int</span><span class="w">
</span><span class="p">}</span><span class="w">

</span><span class="kd">func</span><span class="w"> </span><span class="nx">loadDocuments</span><span class="p">(</span><span class="nx">path</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">([]</span><span class="nx">document</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nx">f</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">Open</span><span class="p">(</span><span class="nx">path</span><span class="p">)</span><span class="w">
    </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w">
    </span><span class="p">}</span><span class="w">
    </span><span class="k">defer</span><span class="w"> </span><span class="nx">f</span><span class="p">.</span><span class="nx">Close</span><span class="p">()</span><span class="w">

    </span><span class="nx">dec</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">xml</span><span class="p">.</span><span class="nx">NewDecoder</span><span class="p">(</span><span class="nx">f</span><span class="p">)</span><span class="w">
    </span><span class="nx">dump</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="nx">Documents</span><span class="w"> </span><span class="p">[]</span><span class="nx">document</span><span class="w"> </span><span class="s">`xml:&quot;doc&quot;`</span><span class="w">
    </span><span class="p">}{}</span><span class="w">
    </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">dec</span><span class="p">.</span><span class="nx">Decode</span><span class="p">(</span><span class="o">&amp;</span><span class="nx">dump</span><span class="p">);</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w">
    </span><span class="p">}</span><span class="w">

    </span><span class="nx">docs</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">dump</span><span class="p">.</span><span class="nx">Documents</span><span class="w">
    </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">docs</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="nx">docs</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">ID</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">i</span><span class="w">
    </span><span class="p">}</span><span class="w">
    </span><span class="k">return</span><span class="w"> </span><span class="nx">docs</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="w">
</span><span class="p">}</span></code></pre>
<p>Every loaded document gets assigned a unique identifier.
To keep things simple, the first loaded document gets assigned ID=0, the second ID=1 and so on.</p>
</section>
<section id="first-attempt">
<h3>First attempt<a class="headerlink" href="#first-attempt" title="Permalink to this headline"> #</a></h3>
<section id="searching-the-content">
<h4>Searching the content<a class="headerlink" href="#searching-the-content" title="Permalink to this headline"> #</a></h4>
<p>Now that we have all documents loaded into memory, we can try to find the ones about cats. At first, let's loop through all documents and check if they contain the substring <em>cat</em>:</p>
<pre class="code go literal-block"><code><span class="kd">func</span><span class="w"> </span><span class="nx">search</span><span class="p">(</span><span class="nx">docs</span><span class="w"> </span><span class="p">[]</span><span class="nx">document</span><span class="p">,</span><span class="w"> </span><span class="nx">term</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">[]</span><span class="nx">document</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="kd">var</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="p">[]</span><span class="nx">document</span><span class="w">
    </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">doc</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">docs</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="k">if</span><span class="w"> </span><span class="nx">strings</span><span class="p">.</span><span class="nx">Contains</span><span class="p">(</span><span class="nx">doc</span><span class="p">.</span><span class="nx">Text</span><span class="p">,</span><span class="w"> </span><span class="nx">term</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
            </span><span class="nx">r</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">r</span><span class="p">,</span><span class="w"> </span><span class="nx">doc</span><span class="p">)</span><span class="w">
        </span><span class="p">}</span><span class="w">
    </span><span class="p">}</span><span class="w">
    </span><span class="k">return</span><span class="w"> </span><span class="nx">r</span><span class="w">
</span><span class="p">}</span></code></pre>
<p>On my laptop, the search phase takes 103ms - not too bad.
If you spot check a few documents from the output, you may notice that the function matches <em>caterpillar</em> and <em>category</em>, but doesn't match <em>Cat</em> with the capital <em>C</em>. That's not quite what I was looking for.</p>
<p>We need to fix two things before moving forward:</p>
<ul class="simple">
<li><p>Make the search case-insensitive (so <em>Cat</em> matches as well).</p></li>
<li><p>Match on a word boundary rather than on a substring (so <em>caterpillar</em> and <em>communication</em> don't match).</p></li>
</ul>
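<p>Both problems are easy to reproduce on a handful of strings. The sample texts below are made up for illustration, and <em>containsTerm</em> is a hypothetical helper that performs the same <em>strings.Contains</em> check as the search function above.</p>

```go
package main

import (
	"fmt"
	"strings"
)

// containsTerm performs the same substring check as the first search function.
func containsTerm(text, term string) bool {
	return strings.Contains(text, term)
}

func main() {
	// Made-up sample texts chosen to show both problems.
	texts := []string{
		"a grinning cat",            // wanted, and matched
		"The Cat in the Hat",        // wanted, but missed because of the capital C
		"a very hungry caterpillar", // not wanted, but matched as a substring
		"satellite communication",   // not wanted, but matched as a substring
	}
	for _, t := range texts {
		fmt.Printf("%-28q %v\n", t, containsTerm(t, "cat"))
	}
}
```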
</section>
<section id="searching-with-regular-expressions">
<h4>Searching with regular expressions<a class="headerlink" href="#searching-with-regular-expressions" title="Permalink to this headline"> #</a></h4>
<p>One solution that quickly comes to mind and allows implementing both requirements is <em>regular expressions</em>.</p>
<p>Here it is - <span class="docutils literal"><span class="pre">(?i)\bcat\b</span></span>:</p>
<ul class="simple">
<li><p><span class="docutils literal"><span class="pre">(?i)</span></span> makes the regex case-insensitive</p></li>
<li><p><span class="docutils literal">\b</span> matches a word boundary (position where one side is a word character and another side is not a word character)</p></li>
</ul>
<pre class="code go literal-block"><code><span class="kd">func</span><span class="w"> </span><span class="nx">search</span><span class="p">(</span><span class="nx">docs</span><span class="w"> </span><span class="p">[]</span><span class="nx">document</span><span class="p">,</span><span class="w"> </span><span class="nx">term</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">[]</span><span class="nx">document</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nx">re</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">regexp</span><span class="p">.</span><span class="nx">MustCompile</span><span class="p">(</span><span class="s">`(?i)\b`</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">term</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="s">`\b`</span><span class="p">)</span><span class="w"> </span><span class="c1">// Don't do this in production, it's a security risk. term needs to be sanitized.</span><span class="w">
    </span><span class="kd">var</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="p">[]</span><span class="nx">document</span><span class="w">
    </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">doc</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">docs</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="k">if</span><span class="w"> </span><span class="nx">re</span><span class="p">.</span><span class="nx">MatchString</span><span class="p">(</span><span class="nx">doc</span><span class="p">.</span><span class="nx">Text</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
            </span><span class="nx">r</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">r</span><span class="p">,</span><span class="w"> </span><span class="nx">doc</span><span class="p">)</span><span class="w">
        </span><span class="p">}</span><span class="w">
    </span><span class="p">}</span><span class="w">
    </span><span class="k">return</span><span class="w"> </span><span class="nx">r</span><span class="w">
</span><span class="p">}</span></code></pre>
<p>Ugh, the search took more than 2 seconds. As you can see, things started getting slow even with 600K documents. While the approach is easy to implement, it doesn't scale well. As the dataset grows larger, we need to scan more and more documents. The time complexity of this algorithm is linear - every query has to scan every document. If we had 6M documents instead of 600K, the search would take about 20 seconds. We need to do better than that.</p>
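<p>As a side note on the comment in the regex version: one way to sanitize <em>term</em>, assuming it should always be matched literally, is <span class="docutils literal">regexp.QuoteMeta</span> from the standard library. The helper name <em>termRegexp</em> is hypothetical. It also uses <span class="docutils literal">regexp.Compile</span> rather than <span class="docutils literal">regexp.MustCompile</span>, which would panic on an invalid pattern such as a lone <span class="docutils literal">(</span>.</p>

```go
package main

import (
	"fmt"
	"regexp"
)

// termRegexp is a hypothetical helper: it escapes regex metacharacters in term
// so the term is matched literally, then wraps it in word boundaries.
func termRegexp(term string) (*regexp.Regexp, error) {
	return regexp.Compile(`(?i)\b` + regexp.QuoteMeta(term) + `\b`)
}

func main() {
	re, err := termRegexp("cat")
	if err != nil {
		panic(err)
	}
	fmt.Println(re.MatchString("The Cat in the Hat"))
	fmt.Println(re.MatchString("a very hungry caterpillar"))

	// Escaped, the dot in "c.t" only matches a literal dot, not any character.
	re, err = termRegexp("c.t")
	if err != nil {
		panic(err)
	}
	fmt.Println(re.MatchString("cut"), re.MatchString("c.t"))
}
```

<p>This does not make the scan any faster; it only keeps user input from being interpreted as a pattern.</p>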
</section>
</section>
<section id="inverted-index">
<h3>Inverted Index<a class="headerlink" href="#inverted-index" title="Permalink to this headline"> #</a></h3>
<p>To make search queries faster, we'll preprocess the text and build an index in advance.</p>
<p>The core of FTS is a data structure called <em>Inverted Index</em>.
The Inverted Index maps every word that appears in the documents to the documents that contain it.</p>
<p>Example:</p>
<pre class="code python literal-block"><code><span class="n">documents</span> <span class="o">=</span> <span class="p">{</span><span class="w">
</span>    <span class="mi">1</span><span class="p">:</span> <span class="s2">&quot;a donut on a glass plate&quot;</span><span class="p">,</span><span class="w">
</span>    <span class="mi">2</span><span class="p">:</span> <span class="s2">&quot;only the donut&quot;</span><span class="p">,</span><span class="w">
</span>    <span class="mi">3</span><span class="p">:</span> <span class="s2">&quot;listen to the drum machine&quot;</span><span class="p">,</span><span class="w">
</span><span class="p">}</span><span class="w">

</span><span class="n">index</span> <span class="o">=</span> <span class="p">{</span><span class="w">
</span>    <span class="s2">&quot;a&quot;</span><span class="p">:</span> <span class="p">[</span><span class="mi">1</span><span class="p">],</span><span class="w">
</span>    <span class="s2">&quot;donut&quot;</span><span class="p">:</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">],</span><span class="w">
</span>    <span class="s2">&quot;on&quot;</span><span class="p">:</span> <span class="p">[</span><span class="mi">1</span><span class="p">],</span><span class="w">
</span>    <span class="s2">&quot;glass&quot;</span><span class="p">:</span> <span class="p">[</span><span class="mi">1</span><span class="p">],</span><span class="w">
</span>    <span class="s2">&quot;plate&quot;</span><span class="p">:</span> <span class="p">[</span><span class="mi">1</span><span class="p">],</span><span class="w">
</span>    <span class="s2">&quot;only&quot;</span><span class="p">:</span> <span class="p">[</span><span class="mi">2</span><span class="p">],</span><span class="w">
</span>    <span class="s2">&quot;the&quot;</span><span class="p">:</span> <span class="p">[</span><span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">],</span><span class="w">
</span>    <span class="s2">&quot;listen&quot;</span><span class="p">:</span> <span class="p">[</span><span class="mi">3</span><span class="p">],</span><span class="w">
</span>    <span class="s2">&quot;to&quot;</span><span class="p">:</span> <span class="p">[</span><span class="mi">3</span><span class="p">],</span><span class="w">
</span>    <span class="s2">&quot;drum&quot;</span><span class="p">:</span> <span class="p">[</span><span class="mi">3</span><span class="p">],</span><span class="w">
</span>    <span class="s2">&quot;machine&quot;</span><span class="p">:</span> <span class="p">[</span><span class="mi">3</span><span class="p">],</span><span class="w">
</span><span class="p">}</span></code></pre>
<p>Below is a real-world example of the Inverted Index: an index in a book, where each term references the page numbers it appears on:</p>
<img alt="" src="https://artem.krylysov.com/images/2020-fts/book-index.png" style="width: 592px;" />
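<p>The three-document example above can be reproduced with a few lines of Go. This is a minimal sketch under simplifying assumptions: words are split on whitespace only, no normalization is applied, and the <em>doc</em> type and <em>buildIndex</em> function are names made up for this illustration.</p>

```go
package main

import (
	"fmt"
	"strings"
)

type doc struct {
	ID   int
	Text string
}

// buildIndex maps each whitespace-separated word to the IDs of the documents
// that contain it. docs must be ordered by ascending ID so that every ID list
// comes out sorted and duplicates can be detected by looking at the last entry.
func buildIndex(docs []doc) map[string][]int {
	index := make(map[string][]int)
	for _, d := range docs {
		for _, word := range strings.Fields(d.Text) {
			ids := index[word]
			if len(ids) > 0 && ids[len(ids)-1] == d.ID {
				continue // word already recorded for this document
			}
			index[word] = append(ids, d.ID)
		}
	}
	return index
}

func main() {
	docs := []doc{
		{1, "a donut on a glass plate"},
		{2, "only the donut"},
		{3, "listen to the drum machine"},
	}
	index := buildIndex(docs)
	fmt.Println(index["donut"], index["the"], index["a"]) // [1 2] [2 3] [1]
}
```

<p>Looking up a word is now a single map access instead of a scan over every document.</p>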
</section>
<section id="text-analysis">
<h3>Text analysis<a class="headerlink" href="#text-analysis" title="Permalink to this headline"> #</a></h3>
<p>Before we start building the index, we need to break the raw text down into a list of words (tokens) suitable for indexing and searching.</p>
<p>The text analyzer consists of a tokenizer and multiple filters.</p>
<img alt="" src="https://artem.krylysov.com/images/2020-fts/text-analysis.png" style="width: 530px;" />
</section>
<section id="tokenizer">
<h3>Tokenizer<a class="headerlink" href="#tokenizer" title="Permalink to this headline"> #</a></h3>
<p>The tokenizer is the first step of text analysis. Its job is to convert text into a list of tokens. Our implementation splits the text on a word boundary and removes punctuation marks:</p>
<pre class="code go literal-block"><code><span class="kd">func</span><span class="w"> </span><span class="nx">tokenize</span><span class="p">(</span><span class="nx">text</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="k">return</span><span class="w"> </span><span class="nx">strings</span><span class="p">.</span><span class="nx">FieldsFunc</span><span class="p">(</span><span class="nx">text</span><span class="p">,</span><span class="w"> </span><span class="kd">func</span><span class="p">(</span><span class="nx">r</span><span class="w"> </span><span class="kt">rune</span><span class="p">)</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="c1">// Split on any character that is not a letter or a number.</span><span class="w">
        </span><span class="k">return</span><span class="w"> </span><span class="p">!</span><span class="nx">unicode</span><span class="p">.</span><span class="nx">IsLetter</span><span class="p">(</span><span class="nx">r</span><span class="p">)</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="p">!</span><span class="nx">unicode</span><span class="p">.</span><span class="nx">IsNumber</span><span class="p">(</span><span class="nx">r</span><span class="p">)</span><span class="w">
    </span><span class="p">})</span><span class="w">
</span><span class="p">}</span></code></pre>
<pre class="code go literal-block"><code><span class="p">&gt;</span><span class="w"> </span><span class="nx">tokenize</span><span class="p">(</span><span class="s">&quot;A donut on a glass plate. Only the donuts.&quot;</span><span class="p">)</span><span class="w">

</span><span class="p">[</span><span class="s">&quot;A&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;donut&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;on&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;a&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;glass&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;plate&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Only&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;the&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;donuts&quot;</span><span class="p">]</span></code></pre>
</section>
<section id="filters">
<h3>Filters<a class="headerlink" href="#filters" title="Permalink to this headline"> #</a></h3>
<p>In most cases, just converting text into a list of tokens is not enough. To make the text easier to index and search, we'll need to do additional normalization.</p>
<section id="lowercase">
<h4>Lowercase<a class="headerlink" href="#lowercase" title="Permalink to this headline"> #</a></h4>
<p>In order to make the search case-insensitive, the lowercase filter converts tokens to lower case. <em>cAt</em>, <em>Cat</em> and <em>caT</em> are normalized to <em>cat</em>.
Later, when we query the index, we'll lowercase the search terms as well. This will make the search term <em>cAt</em> match the text <em>Cat</em>.</p>
<pre class="code go literal-block"><code><span class="kd">func</span><span class="w"> </span><span class="nx">lowercaseFilter</span><span class="p">(</span><span class="nx">tokens</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nx">r</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">make</span><span class="p">([]</span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">tokens</span><span class="p">))</span><span class="w">
    </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="p">,</span><span class="w"> </span><span class="nx">token</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">tokens</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="nx">r</span><span class="p">[</span><span class="nx">i</span><span class="p">]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">strings</span><span class="p">.</span><span class="nx">ToLower</span><span class="p">(</span><span class="nx">token</span><span class="p">)</span><span class="w">
    </span><span class="p">}</span><span class="w">
    </span><span class="k">return</span><span class="w"> </span><span class="nx">r</span><span class="w">
</span><span class="p">}</span></code></pre>
<pre class="code go literal-block"><code><span class="p">&gt;</span><span class="w"> </span><span class="nx">lowercaseFilter</span><span class="p">([]</span><span class="kt">string</span><span class="p">{</span><span class="s">&quot;A&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;donut&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;on&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;a&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;glass&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;plate&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Only&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;the&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;donuts&quot;</span><span class="p">})</span><span class="w">

</span><span class="p">[</span><span class="s">&quot;a&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;donut&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;on&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;a&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;glass&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;plate&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;only&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;the&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;donuts&quot;</span><span class="p">]</span></code></pre>
</section>
<section id="dropping-common-words">
<h4>Dropping common words<a class="headerlink" href="#dropping-common-words" title="Permalink to this headline"> #</a></h4>
<p>Almost any English text contains commonly used words like <em>a</em>, <em>I</em>, <em>the</em> or <em>be</em>. Such words are called <em>stop words</em>. We are going to remove them since almost any document would match the stop words.</p>
<p>There is no &quot;official&quot; list of stop words. Let's exclude the top 10 by the <a class="reference external" href="https://en.wikipedia.org/wiki/Most_common_words_in_English" target="_blank">OEC rank</a>. Feel free to add more:</p>
<pre class="code go literal-block"><code><span class="kd">var</span><span class="w"> </span><span class="nx">stopwords</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kd">struct</span><span class="p">{}{</span><span class="w"> </span><span class="c1">// I wish Go had built-in sets.</span><span class="w">
    </span><span class="s">&quot;a&quot;</span><span class="p">:</span><span class="w"> </span><span class="p">{},</span><span class="w"> </span><span class="s">&quot;and&quot;</span><span class="p">:</span><span class="w"> </span><span class="p">{},</span><span class="w"> </span><span class="s">&quot;be&quot;</span><span class="p">:</span><span class="w"> </span><span class="p">{},</span><span class="w"> </span><span class="s">&quot;have&quot;</span><span class="p">:</span><span class="w"> </span><span class="p">{},</span><span class="w"> </span><span class="s">&quot;i&quot;</span><span class="p">:</span><span class="w"> </span><span class="p">{},</span><span class="w">
    </span><span class="s">&quot;in&quot;</span><span class="p">:</span><span class="w"> </span><span class="p">{},</span><span class="w"> </span><span class="s">&quot;of&quot;</span><span class="p">:</span><span class="w"> </span><span class="p">{},</span><span class="w"> </span><span class="s">&quot;that&quot;</span><span class="p">:</span><span class="w"> </span><span class="p">{},</span><span class="w"> </span><span class="s">&quot;the&quot;</span><span class="p">:</span><span class="w"> </span><span class="p">{},</span><span class="w"> </span><span class="s">&quot;to&quot;</span><span class="p">:</span><span class="w"> </span><span class="p">{},</span><span class="w">
</span><span class="p">}</span><span class="w">

</span><span class="kd">func</span><span class="w"> </span><span class="nx">stopwordFilter</span><span class="p">(</span><span class="nx">tokens</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nx">r</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">make</span><span class="p">([]</span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">tokens</span><span class="p">))</span><span class="w">
    </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">token</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">tokens</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="k">if</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">stopwords</span><span class="p">[</span><span class="nx">token</span><span class="p">];</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span><span class="w">
            </span><span class="nx">r</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">r</span><span class="p">,</span><span class="w"> </span><span class="nx">token</span><span class="p">)</span><span class="w">
        </span><span class="p">}</span><span class="w">
    </span><span class="p">}</span><span class="w">
    </span><span class="k">return</span><span class="w"> </span><span class="nx">r</span><span class="w">
</span><span class="p">}</span></code></pre>
<pre class="code go literal-block"><code><span class="p">&gt;</span><span class="w"> </span><span class="nx">stopwordFilter</span><span class="p">([]</span><span class="kt">string</span><span class="p">{</span><span class="s">&quot;a&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;donut&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;on&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;a&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;glass&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;plate&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;only&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;the&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;donuts&quot;</span><span class="p">})</span><span class="w">

</span><span class="p">[</span><span class="s">&quot;donut&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;on&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;glass&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;plate&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;only&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;donuts&quot;</span><span class="p">]</span></code></pre>
</section>
<section id="stemming">
<h4>Stemming<a class="headerlink" href="#stemming" title="Permalink to this headline"> #</a></h4>
<p>Because of grammar rules, documents may include different forms of the same word.
Stemming reduces words to their base form. For example, <em>fishing</em>, <em>fished</em> and <em>fisher</em> may be reduced to the base form (stem) <em>fish</em>.</p>
<p>Implementing a stemmer is a non-trivial task and is not covered in this post. We'll use one of the <a class="reference external" href="https://github.com/kljensen/snowball" target="_blank">existing</a> modules:</p>
<pre class="code go literal-block"><code><span class="kn">import</span><span class="w"> </span><span class="nx">snowballeng</span><span class="w"> </span><span class="s">&quot;github.com/kljensen/snowball/english&quot;</span><span class="w">

</span><span class="kd">func</span><span class="w"> </span><span class="nx">stemmerFilter</span><span class="p">(</span><span class="nx">tokens</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nx">r</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">make</span><span class="p">([]</span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">tokens</span><span class="p">))</span><span class="w">
    </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="p">,</span><span class="w"> </span><span class="nx">token</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">tokens</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="nx">r</span><span class="p">[</span><span class="nx">i</span><span class="p">]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">snowballeng</span><span class="p">.</span><span class="nx">Stem</span><span class="p">(</span><span class="nx">token</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span><span class="p">)</span><span class="w">
    </span><span class="p">}</span><span class="w">
    </span><span class="k">return</span><span class="w"> </span><span class="nx">r</span><span class="w">
</span><span class="p">}</span></code></pre>
<pre class="code go literal-block"><code><span class="p">&gt;</span><span class="w"> </span><span class="nx">stemmerFilter</span><span class="p">([]</span><span class="kt">string</span><span class="p">{</span><span class="s">&quot;donut&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;on&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;glass&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;plate&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;only&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;donuts&quot;</span><span class="p">})</span><span class="w">

</span><span class="p">[</span><span class="s">&quot;donut&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;on&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;glass&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;plate&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;only&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;donut&quot;</span><span class="p">]</span></code></pre>
<aside class="admonition note">
<p class="admonition-title">Note</p>
<p>A stem is not always a valid word. For example, some stemmers may reduce <em>airline</em> to <em>airlin</em>.</p>
</aside>
</section>
</section>
<section id="putting-the-analyzer-together">
<h3>Putting the analyzer together<a class="headerlink" href="#putting-the-analyzer-together" title="Permalink to this headline"> #</a></h3>
<pre class="code go literal-block"><code><span class="kd">func</span><span class="w"> </span><span class="nx">analyze</span><span class="p">(</span><span class="nx">text</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nx">tokens</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">tokenize</span><span class="p">(</span><span class="nx">text</span><span class="p">)</span><span class="w">
    </span><span class="nx">tokens</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">lowercaseFilter</span><span class="p">(</span><span class="nx">tokens</span><span class="p">)</span><span class="w">
    </span><span class="nx">tokens</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">stopwordFilter</span><span class="p">(</span><span class="nx">tokens</span><span class="p">)</span><span class="w">
    </span><span class="nx">tokens</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">stemmerFilter</span><span class="p">(</span><span class="nx">tokens</span><span class="p">)</span><span class="w">
    </span><span class="k">return</span><span class="w"> </span><span class="nx">tokens</span><span class="w">
</span><span class="p">}</span></code></pre>
<p>The tokenizer and filters convert sentences into a list of tokens:</p>
<pre class="code go literal-block"><code><span class="p">&gt;</span><span class="w"> </span><span class="nx">analyze</span><span class="p">(</span><span class="s">&quot;A donut on a glass plate. Only the donuts.&quot;</span><span class="p">)</span><span class="w">

</span><span class="p">[</span><span class="s">&quot;donut&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;on&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;glass&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;plate&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;only&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;donut&quot;</span><span class="p">]</span></code></pre>
<p>The tokens are ready for indexing.</p>
</section>
<section id="building-the-index">
<h3>Building the index<a class="headerlink" href="#building-the-index" title="Permalink to this headline"> #</a></h3>
<p>Back to the inverted index. It maps every word in documents to document IDs.
The built-in <span class="docutils literal">map</span> is a good candidate for storing the mapping.
The key in the map is a token (string) and the value is a list of document IDs:</p>
<pre class="code go literal-block"><code><span class="kd">type</span><span class="w"> </span><span class="nx">index</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">][]</span><span class="kt">int</span></code></pre>
<p>Building the index consists of analyzing the documents and adding their IDs to the map:</p>
<pre class="code go literal-block"><code><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">idx</span><span class="w"> </span><span class="nx">index</span><span class="p">)</span><span class="w"> </span><span class="nx">add</span><span class="p">(</span><span class="nx">docs</span><span class="w"> </span><span class="p">[]</span><span class="nx">document</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">doc</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">docs</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">token</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">analyze</span><span class="p">(</span><span class="nx">doc</span><span class="p">.</span><span class="nx">Text</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
            </span><span class="nx">ids</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">idx</span><span class="p">[</span><span class="nx">token</span><span class="p">]</span><span class="w">
            </span><span class="k">if</span><span class="w"> </span><span class="nx">ids</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nx">ids</span><span class="p">[</span><span class="nb">len</span><span class="p">(</span><span class="nx">ids</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">doc</span><span class="p">.</span><span class="nx">ID</span><span class="w"> </span><span class="p">{</span><span class="w">
                </span><span class="c1">// Don't add the same ID twice.</span><span class="w">
                </span><span class="k">continue</span><span class="w">
            </span><span class="p">}</span><span class="w">
            </span><span class="nx">idx</span><span class="p">[</span><span class="nx">token</span><span class="p">]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">ids</span><span class="p">,</span><span class="w"> </span><span class="nx">doc</span><span class="p">.</span><span class="nx">ID</span><span class="p">)</span><span class="w">
        </span><span class="p">}</span><span class="w">
    </span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">

</span><span class="kd">func</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nx">idx</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">make</span><span class="p">(</span><span class="nx">index</span><span class="p">)</span><span class="w">
    </span><span class="nx">idx</span><span class="p">.</span><span class="nx">add</span><span class="p">([]</span><span class="nx">document</span><span class="p">{{</span><span class="nx">ID</span><span class="p">:</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="nx">Text</span><span class="p">:</span><span class="w"> </span><span class="s">&quot;A donut on a glass plate. Only the donuts.&quot;</span><span class="p">}})</span><span class="w">
    </span><span class="nx">idx</span><span class="p">.</span><span class="nx">add</span><span class="p">([]</span><span class="nx">document</span><span class="p">{{</span><span class="nx">ID</span><span class="p">:</span><span class="w"> </span><span class="mi">2</span><span class="p">,</span><span class="w"> </span><span class="nx">Text</span><span class="p">:</span><span class="w"> </span><span class="s">&quot;donut is a donut&quot;</span><span class="p">}})</span><span class="w">
    </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="nx">idx</span><span class="p">)</span><span class="w">
</span><span class="p">}</span></code></pre>
<p>It works! Each token in the map refers to IDs of the documents that contain the token:</p>
<pre class="code literal-block"><code>map[donut:[1 2] glass:[1] is:[2] on:[1] only:[1] plate:[1]]</code></pre>
</section>
<section id="querying">
<h3>Querying<a class="headerlink" href="#querying" title="Permalink to this headline"> #</a></h3>
<p>To query the index, we are going to apply the same tokenizer and filters we used for indexing:</p>
<pre class="code go literal-block"><code><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">idx</span><span class="w"> </span><span class="nx">index</span><span class="p">)</span><span class="w"> </span><span class="nx">search</span><span class="p">(</span><span class="nx">text</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">[][]</span><span class="kt">int</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="kd">var</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="p">[][]</span><span class="kt">int</span><span class="w">
    </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">token</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">analyze</span><span class="p">(</span><span class="nx">text</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="k">if</span><span class="w"> </span><span class="nx">ids</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">idx</span><span class="p">[</span><span class="nx">token</span><span class="p">];</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span><span class="w">
            </span><span class="nx">r</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">r</span><span class="p">,</span><span class="w"> </span><span class="nx">ids</span><span class="p">)</span><span class="w">
        </span><span class="p">}</span><span class="w">
    </span><span class="p">}</span><span class="w">
    </span><span class="k">return</span><span class="w"> </span><span class="nx">r</span><span class="w">
</span><span class="p">}</span></code></pre>
<pre class="code go literal-block"><code><span class="p">&gt;</span><span class="w"> </span><span class="nx">idx</span><span class="p">.</span><span class="nx">search</span><span class="p">(</span><span class="s">&quot;Small wild cat&quot;</span><span class="p">)</span><span class="w">

</span><span class="p">[[</span><span class="mi">24</span><span class="p">,</span><span class="w"> </span><span class="mi">173</span><span class="p">,</span><span class="w"> </span><span class="mi">303</span><span class="p">,</span><span class="w"> </span><span class="o">...</span><span class="p">],</span><span class="w"> </span><span class="p">[</span><span class="mi">98</span><span class="p">,</span><span class="w"> </span><span class="mi">173</span><span class="p">,</span><span class="w"> </span><span class="mi">765</span><span class="p">,</span><span class="w"> </span><span class="o">...</span><span class="p">],</span><span class="w"> </span><span class="p">[</span><span class="mi">24</span><span class="p">,</span><span class="w"> </span><span class="mi">51</span><span class="p">,</span><span class="w"> </span><span class="mi">173</span><span class="p">,</span><span class="w"> </span><span class="o">...</span><span class="p">]]</span></code></pre>
<p>And finally, we can find all documents that mention cats. Searching 600K documents took less than a millisecond (18µs)!</p>
<p>With the inverted index, the time complexity of a search query is linear in the number of search tokens. In the example query above, other than analyzing the input text, <span class="docutils literal">search</span> had to perform only three map lookups.</p>
</section>
<section id="boolean-queries">
<h3>Boolean queries<a class="headerlink" href="#boolean-queries" title="Permalink to this headline"> #</a></h3>
<p>The query from the previous section returned a separate list of documents for each token.
What we normally expect to find when we type <em>small wild cat</em> in a search box is a list of results that contain <em>small</em>, <em>wild</em> and <em>cat</em> at the same time. The next step is to compute the set intersection between the lists. This way we'll get a list of documents matching all tokens.</p>
<img alt="" src="https://artem.krylysov.com/images/2020-fts/venn.png" style="width: 313px;" />
<p>Luckily, IDs in our inverted index are inserted in ascending order. Since the IDs are sorted, it's possible to
compute the intersection between two lists in linear time. The <span class="docutils literal">intersection</span> function iterates over both lists simultaneously and collects the IDs that exist
in both:</p>
<pre class="code go literal-block"><code><span class="kd">func</span><span class="w"> </span><span class="nx">intersection</span><span class="p">(</span><span class="nx">a</span><span class="w"> </span><span class="p">[]</span><span class="kt">int</span><span class="p">,</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="p">[]</span><span class="kt">int</span><span class="p">)</span><span class="w"> </span><span class="p">[]</span><span class="kt">int</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="c1">// The result can't be longer than the shorter list.</span><span class="w">
    </span><span class="nx">minLen</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">a</span><span class="p">)</span><span class="w">
    </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">b</span><span class="p">)</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="nx">minLen</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="nx">minLen</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">b</span><span class="p">)</span><span class="w">
    </span><span class="p">}</span><span class="w">
    </span><span class="nx">r</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">make</span><span class="p">([]</span><span class="kt">int</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nx">minLen</span><span class="p">)</span><span class="w">
    </span><span class="kd">var</span><span class="w"> </span><span class="nx">i</span><span class="p">,</span><span class="w"> </span><span class="nx">j</span><span class="w"> </span><span class="kt">int</span><span class="w">
    </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">a</span><span class="p">)</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nx">j</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">b</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="k">if</span><span class="w"> </span><span class="nx">a</span><span class="p">[</span><span class="nx">i</span><span class="p">]</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="nx">b</span><span class="p">[</span><span class="nx">j</span><span class="p">]</span><span class="w"> </span><span class="p">{</span><span class="w">
            </span><span class="nx">i</span><span class="o">++</span><span class="w">
        </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">a</span><span class="p">[</span><span class="nx">i</span><span class="p">]</span><span class="w"> </span><span class="p">&gt;</span><span class="w"> </span><span class="nx">b</span><span class="p">[</span><span class="nx">j</span><span class="p">]</span><span class="w"> </span><span class="p">{</span><span class="w">
            </span><span class="nx">j</span><span class="o">++</span><span class="w">
        </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span><span class="w">
            </span><span class="nx">r</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">r</span><span class="p">,</span><span class="w"> </span><span class="nx">a</span><span class="p">[</span><span class="nx">i</span><span class="p">])</span><span class="w">
            </span><span class="nx">i</span><span class="o">++</span><span class="w">
            </span><span class="nx">j</span><span class="o">++</span><span class="w">
        </span><span class="p">}</span><span class="w">
    </span><span class="p">}</span><span class="w">
    </span><span class="k">return</span><span class="w"> </span><span class="nx">r</span><span class="w">
</span><span class="p">}</span></code></pre>
<p>The updated <span class="docutils literal">search</span> analyzes the given query text, looks up the tokens and computes the set intersection between the lists of IDs:</p>
<pre class="code go literal-block"><code><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">idx</span><span class="w"> </span><span class="nx">index</span><span class="p">)</span><span class="w"> </span><span class="nx">search</span><span class="p">(</span><span class="nx">text</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">[]</span><span class="kt">int</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="kd">var</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="p">[]</span><span class="kt">int</span><span class="w">
    </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">token</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">analyze</span><span class="p">(</span><span class="nx">text</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="k">if</span><span class="w"> </span><span class="nx">ids</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">idx</span><span class="p">[</span><span class="nx">token</span><span class="p">];</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span><span class="w">
            </span><span class="k">if</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span><span class="w">
                </span><span class="nx">r</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">ids</span><span class="w">
            </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span><span class="w">
                </span><span class="nx">r</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">intersection</span><span class="p">(</span><span class="nx">r</span><span class="p">,</span><span class="w"> </span><span class="nx">ids</span><span class="p">)</span><span class="w">
            </span><span class="p">}</span><span class="w">
        </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span><span class="w">
            </span><span class="c1">// Token doesn't exist.</span><span class="w">
            </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="w">
        </span><span class="p">}</span><span class="w">
    </span><span class="p">}</span><span class="w">
    </span><span class="k">return</span><span class="w"> </span><span class="nx">r</span><span class="w">
</span><span class="p">}</span></code></pre>
<p>The Wikipedia dump contains only two documents that match <em>small</em>, <em>wild</em> and <em>cat</em> at the same time:</p>
<pre class="code go literal-block"><code><span class="p">&gt;</span><span class="w"> </span><span class="nx">idx</span><span class="p">.</span><span class="nx">search</span><span class="p">(</span><span class="s">&quot;Small wild cat&quot;</span><span class="p">)</span><span class="w">

</span><span class="mi">130764</span><span class="w">  </span><span class="nx">The</span><span class="w"> </span><span class="nx">wildcat</span><span class="w"> </span><span class="nx">is</span><span class="w"> </span><span class="nx">a</span><span class="w"> </span><span class="nx">species</span><span class="w"> </span><span class="nx">complex</span><span class="w"> </span><span class="nx">comprising</span><span class="w"> </span><span class="nx">two</span><span class="w"> </span><span class="nx">small</span><span class="w"> </span><span class="nx">wild</span><span class="w"> </span><span class="nx">cat</span><span class="w"> </span><span class="nx">species</span><span class="p">,</span><span class="w"> </span><span class="nx">the</span><span class="w"> </span><span class="nx">European</span><span class="w"> </span><span class="nx">wildcat</span><span class="w"> </span><span class="p">(</span><span class="nx">Felis</span><span class="w"> </span><span class="nx">silvestris</span><span class="p">)</span><span class="w"> </span><span class="nx">and</span><span class="w"> </span><span class="nx">the</span><span class="w"> </span><span class="nx">African</span><span class="w"> </span><span class="nx">wildcat</span><span class="w"> </span><span class="p">(</span><span class="nx">F</span><span class="p">.</span><span class="w"> </span><span class="nx">lybica</span><span class="p">).</span><span class="w">
</span><span class="mi">131692</span><span class="w">  </span><span class="nx">Catopuma</span><span class="w"> </span><span class="nx">is</span><span class="w"> </span><span class="nx">a</span><span class="w"> </span><span class="nx">genus</span><span class="w"> </span><span class="nx">containing</span><span class="w"> </span><span class="nx">two</span><span class="w"> </span><span class="nx">Asian</span><span class="w"> </span><span class="nx">small</span><span class="w"> </span><span class="nx">wild</span><span class="w"> </span><span class="nx">cat</span><span class="w"> </span><span class="nx">species</span><span class="p">,</span><span class="w"> </span><span class="nx">the</span><span class="w"> </span><span class="nx">Asian</span><span class="w"> </span><span class="nx">golden</span><span class="w"> </span><span class="nx">cat</span><span class="w"> </span><span class="p">(</span><span class="nx">C</span><span class="p">.</span><span class="w"> </span><span class="nx">temminckii</span><span class="p">)</span><span class="w"> </span><span class="nx">and</span><span class="w"> </span><span class="nx">the</span><span class="w"> </span><span class="nx">bay</span><span class="w"> </span><span class="nx">cat</span><span class="p">.</span></code></pre>
<p>The search is working as expected!</p>
<p>By the way, this is the first time I've heard of <em>catopuma</em>. Here is one of them:</p>
<img alt="By Karen Stout - originally posted to Flickr as Asian Golden cat, CC BY-SA 2.0, https://commons.wikimedia.org/w/index.php?curid=11413240" src="https://artem.krylysov.com/images/2020-fts/asian-golden-cat-s.jpg" style="width: 300px;" />
</section>
<section id="conclusions">
<h3>Conclusions<a class="headerlink" href="#conclusions" title="Permalink to this headline"> #</a></h3>
<p>We just built a Full-Text Search engine. Despite its simplicity, it can be a solid foundation for more advanced projects.</p>
<p>I left out many things that could significantly improve performance and make the engine more user-friendly.
Here are some ideas for further improvements:</p>
<ul class="simple">
<li><p>Extend boolean queries to support <em>OR</em> and <em>NOT</em>.</p></li>
<li><p>Store the index on disk:</p>
<ul>
<li><p>Rebuilding the index on every application restart may take a while.</p></li>
<li><p>Large indexes may not fit in memory.</p></li>
</ul>
</li>
<li><p>Experiment with memory and CPU-efficient data formats for storing sets of document IDs. Take a look at <a class="reference external" href="https://roaringbitmap.org/" target="_blank">Roaring Bitmaps</a>.</p></li>
<li><p>Support indexing multiple document fields.</p></li>
<li><p>Sort results by relevance.</p></li>
</ul>
<p>The full source code is available on <a class="reference external" href="https://github.com/akrylysov/simplefts" target="_blank">GitHub</a>.</p>
</section>
]]>
        </content>
    </entry>
    
    <entry>
        <title><![CDATA[String interning in Go]]></title>
        <link href="https://artem.krylysov.com/blog/2018/12/12/string-interning-in-go/"/>
        <updated>2018-12-12T00:00:00Z</updated>
        <id>https://artem.krylysov.com/blog/2018/12/12/string-interning-in-go/</id>
        <content type="html">
            <![CDATA[<p>String interning is a technique of storing only one copy of each unique string in memory. It can significantly reduce memory usage for applications that store many duplicated strings.</p>
<p>The built-in <span class="docutils literal">string</span> type is represented internally as a structure containing two fields. <span class="docutils literal">Data</span> is a pointer to the string data and <span class="docutils literal">Len</span> is the length of the string in bytes:</p>
<pre class="code go literal-block"><code><span class="kd">type</span><span class="w"> </span><span class="nx">StringHeader</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="nx">Data</span><span class="w"> </span><span class="kt">uintptr</span><span class="w">
        </span><span class="nx">Len</span><span class="w">  </span><span class="kt">int</span><span class="w">
</span><span class="p">}</span></code></pre>
<p>In Go, strings are immutable, so multiple strings can share the same underlying data:</p>
<pre class="code go literal-block"><code><span class="kn">package</span><span class="w"> </span><span class="nx">main</span><span class="w">

</span><span class="kn">import</span><span class="w"> </span><span class="p">(</span><span class="w">
    </span><span class="s">&quot;fmt&quot;</span><span class="w">
    </span><span class="s">&quot;reflect&quot;</span><span class="w">
    </span><span class="s">&quot;unsafe&quot;</span><span class="w">
</span><span class="p">)</span><span class="w">

</span><span class="c1">// stringptr returns a pointer to the string data.</span><span class="w">
</span><span class="kd">func</span><span class="w"> </span><span class="nx">stringptr</span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="kt">uintptr</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="k">return</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">reflect</span><span class="p">.</span><span class="nx">StringHeader</span><span class="p">)(</span><span class="nx">unsafe</span><span class="p">.</span><span class="nx">Pointer</span><span class="p">(</span><span class="o">&amp;</span><span class="nx">s</span><span class="p">)).</span><span class="nx">Data</span><span class="w">
</span><span class="p">}</span><span class="w">

</span><span class="kd">func</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nx">s1</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="s">&quot;1234&quot;</span><span class="w">
    </span><span class="nx">s2</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">s1</span><span class="p">[:</span><span class="mi">2</span><span class="p">]</span><span class="w"> </span><span class="c1">// &quot;12&quot;</span><span class="w">
    </span><span class="c1">// s1 and s2 are different strings, but they point to the same string data</span><span class="w">
    </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="nx">stringptr</span><span class="p">(</span><span class="nx">s1</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">stringptr</span><span class="p">(</span><span class="nx">s2</span><span class="p">))</span><span class="w"> </span><span class="c1">// true</span><span class="w">
</span><span class="p">}</span></code></pre>
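<p>Newer Go releases mark <span class="docutils literal">reflect.StringHeader</span> as deprecated. The following is a minimal sketch of the same helper, assuming Go 1.20 or newer, where <span class="docutils literal">unsafe.StringData</span> is available. It is an alternative to the version above, not a replacement the rest of this post depends on:</p>

```go
package main

import (
	"fmt"
	"unsafe"
)

// stringptr returns the address of the first byte of the string data.
// Assumes Go 1.20+, where unsafe.StringData exists.
func stringptr(s string) uintptr {
	return uintptr(unsafe.Pointer(unsafe.StringData(s)))
}

func main() {
	s1 := "1234"
	s2 := s1[:2] // "12"
	// s1 and s2 are different strings, but they point to the same string data
	fmt.Println(stringptr(s1) == stringptr(s2)) // true
}
```

<p>The returned address is only meant for comparison and printing, as in the examples in this post.</p>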
<p>Most modern programming languages, including Go, intern compile-time string constants:</p>
<pre class="code go literal-block"><code><span class="nx">s1</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="s">&quot;12&quot;</span><span class="w">
</span><span class="nx">s2</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="s">&quot;1&quot;</span><span class="o">+</span><span class="s">&quot;2&quot;</span><span class="w">
</span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="nx">stringptr</span><span class="p">(</span><span class="nx">s1</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">stringptr</span><span class="p">(</span><span class="nx">s2</span><span class="p">))</span><span class="w"> </span><span class="c1">// true</span></code></pre>
<p>But strings generated at runtime are not interned:</p>
<pre class="code go literal-block"><code><span class="nx">s1</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="s">&quot;12&quot;</span><span class="w">
</span><span class="nx">s2</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">strconv</span><span class="p">.</span><span class="nx">Itoa</span><span class="p">(</span><span class="mi">12</span><span class="p">)</span><span class="w">
</span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="nx">stringptr</span><span class="p">(</span><span class="nx">s1</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">stringptr</span><span class="p">(</span><span class="nx">s2</span><span class="p">))</span><span class="w"> </span><span class="c1">// false</span></code></pre>
<section id="implementation">
<h3>Implementation<a class="headerlink" href="#implementation" title="Permalink to this headline"> #</a></h3>
<p>To implement string interning we need a data structure representing a pool of strings. The pool needs to support two operations: adding a string to the pool and retrieving a string from the pool. A good candidate for the requirements is a hash map:</p>
<pre class="code go literal-block"><code><span class="kn">package</span><span class="w"> </span><span class="nx">main</span><span class="w">

</span><span class="kn">import</span><span class="w"> </span><span class="p">(</span><span class="w">
    </span><span class="s">&quot;fmt&quot;</span><span class="w">
    </span><span class="s">&quot;strconv&quot;</span><span class="w">
</span><span class="p">)</span><span class="w">

</span><span class="kd">type</span><span class="w"> </span><span class="nx">stringInterner</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">string</span><span class="w">

</span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">si</span><span class="w"> </span><span class="nx">stringInterner</span><span class="p">)</span><span class="w"> </span><span class="nx">Intern</span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="kt">string</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="k">if</span><span class="w"> </span><span class="nx">interned</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">si</span><span class="p">[</span><span class="nx">s</span><span class="p">];</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="k">return</span><span class="w"> </span><span class="nx">interned</span><span class="w">
    </span><span class="p">}</span><span class="w">
    </span><span class="nx">si</span><span class="p">[</span><span class="nx">s</span><span class="p">]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">s</span><span class="w">
    </span><span class="k">return</span><span class="w"> </span><span class="nx">s</span><span class="w">
</span><span class="p">}</span><span class="w">

</span><span class="kd">func</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nx">si</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">stringInterner</span><span class="p">{}</span><span class="w">
    </span><span class="nx">s1</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">si</span><span class="p">.</span><span class="nx">Intern</span><span class="p">(</span><span class="s">&quot;12&quot;</span><span class="p">)</span><span class="w">
    </span><span class="nx">s2</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">si</span><span class="p">.</span><span class="nx">Intern</span><span class="p">(</span><span class="nx">strconv</span><span class="p">.</span><span class="nx">Itoa</span><span class="p">(</span><span class="mi">12</span><span class="p">))</span><span class="w">
    </span><span class="c1">// stringptr is the helper defined in the first example.</span><span class="w">
    </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="nx">stringptr</span><span class="p">(</span><span class="nx">s1</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">stringptr</span><span class="p">(</span><span class="nx">s2</span><span class="p">))</span><span class="w"> </span><span class="c1">// true</span><span class="w">
</span><span class="p">}</span></code></pre>
<p>Let's take a look at a few examples where interning could be useful.</p>
</section>
<section id="example-1-text-processing">
<h3>Example 1: Text processing<a class="headerlink" href="#example-1-text-processing" title="Permalink to this headline"> #</a></h3>
<p>Lexers, parsers and other text processing tools can greatly benefit from storing only distinct string values in memory.</p>
<p>The following program reads George Orwell's novel 1984 from a file and tokenizes it into a slice of words for further processing:</p>
<pre class="code go literal-block"><code><span class="kn">package</span><span class="w"> </span><span class="nx">main</span><span class="w">

</span><span class="kn">import</span><span class="w"> </span><span class="p">(</span><span class="w">
    </span><span class="s">&quot;bufio&quot;</span><span class="w">
    </span><span class="s">&quot;log&quot;</span><span class="w">
    </span><span class="s">&quot;os&quot;</span><span class="w">
</span><span class="p">)</span><span class="w">

</span><span class="kd">func</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nx">f</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">Open</span><span class="p">(</span><span class="s">&quot;1984.txt&quot;</span><span class="p">)</span><span class="w">
    </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatal</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span><span class="w">
    </span><span class="p">}</span><span class="w">
    </span><span class="k">defer</span><span class="w"> </span><span class="nx">f</span><span class="p">.</span><span class="nx">Close</span><span class="p">()</span><span class="w">

    </span><span class="nx">scanner</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">bufio</span><span class="p">.</span><span class="nx">NewScanner</span><span class="p">(</span><span class="nx">f</span><span class="p">)</span><span class="w">
    </span><span class="nx">scanner</span><span class="p">.</span><span class="nx">Split</span><span class="p">(</span><span class="nx">bufio</span><span class="p">.</span><span class="nx">ScanWords</span><span class="p">)</span><span class="w">

    </span><span class="kd">var</span><span class="w"> </span><span class="nx">words</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="w">
    </span><span class="k">for</span><span class="w"> </span><span class="nx">scanner</span><span class="p">.</span><span class="nx">Scan</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="nx">words</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">words</span><span class="p">,</span><span class="w"> </span><span class="nx">scanner</span><span class="p">.</span><span class="nx">Text</span><span class="p">())</span><span class="w">
    </span><span class="p">}</span><span class="w">
    </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">scanner</span><span class="p">.</span><span class="nx">Err</span><span class="p">();</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatal</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span><span class="w">
    </span><span class="p">}</span><span class="w">

    </span><span class="nx">log</span><span class="p">.</span><span class="nx">Print</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">words</span><span class="p">))</span><span class="w">
</span><span class="p">}</span></code></pre>
<p>The book contains 103549 words with a combined length of 483016 bytes. It's important to note that the number of unique words is much smaller: 15585.</p>
<p>If we take a look at a part of the <span class="docutils literal">words</span> slice, we can see the same word (<span class="docutils literal">IS</span> in our example) stored at different addresses in memory:</p>
<pre class="code go literal-block"><code><span class="nx">fmt</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="nx">words</span><span class="p">[</span><span class="mi">1111</span><span class="p">:</span><span class="mi">1120</span><span class="p">])</span><span class="w">
</span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">word</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">words</span><span class="p">[</span><span class="mi">1111</span><span class="p">:</span><span class="mi">1120</span><span class="p">]</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">&quot;%x &quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">stringptr</span><span class="p">(</span><span class="nx">word</span><span class="p">))</span><span class="w">
</span><span class="p">}</span><span class="w">

</span><span class="p">[</span><span class="nx">WAR</span><span class="w"> </span><span class="nx">IS</span><span class="w"> </span><span class="nx">PEACE</span><span class="w"> </span><span class="nx">FREEDOM</span><span class="w"> </span><span class="nx">IS</span><span class="w"> </span><span class="nx">SLAVERY</span><span class="w"> </span><span class="nx">IGNORANCE</span><span class="w"> </span><span class="nx">IS</span><span class="w"> </span><span class="nx">STRENGTH</span><span class="p">]</span><span class="w">
     </span><span class="p">^</span><span class="w">                </span><span class="p">^</span><span class="w">                    </span><span class="p">^</span><span class="w">

</span><span class="nx">c0000159fa</span><span class="w"> </span><span class="nx">c0000159fe</span><span class="w"> </span><span class="nx">c000015a00</span><span class="w"> </span><span class="nx">c000015a05</span><span class="w"> </span><span class="nx">c000015a0c</span><span class="w"> </span><span class="nx">c000015a10</span><span class="w"> </span><span class="nx">c000015a17</span><span class="w"> </span><span class="nx">c000015a20</span><span class="w"> </span><span class="nx">c000015a28</span><span class="w">
           </span><span class="p">^</span><span class="w">                                </span><span class="p">^</span><span class="w">                                </span><span class="p">^</span></code></pre>
<p>Let's modify the program to use string interning:</p>
<pre class="code go literal-block"><code><span class="kn">package</span><span class="w"> </span><span class="nx">main</span><span class="w">

</span><span class="kn">import</span><span class="w"> </span><span class="p">(</span><span class="w">
    </span><span class="s">&quot;bufio&quot;</span><span class="w">
    </span><span class="s">&quot;log&quot;</span><span class="w">
    </span><span class="s">&quot;os&quot;</span><span class="w">
</span><span class="p">)</span><span class="w">

</span><span class="kd">type</span><span class="w"> </span><span class="nx">stringInterner</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">string</span><span class="w">

</span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">si</span><span class="w"> </span><span class="nx">stringInterner</span><span class="p">)</span><span class="w"> </span><span class="nx">InternBytes</span><span class="p">(</span><span class="nx">b</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span><span class="p">)</span><span class="w"> </span><span class="kt">string</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="k">if</span><span class="w"> </span><span class="nx">interned</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">si</span><span class="p">[</span><span class="nb">string</span><span class="p">(</span><span class="nx">b</span><span class="p">)];</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="k">return</span><span class="w"> </span><span class="nx">interned</span><span class="w">
    </span><span class="p">}</span><span class="w">
    </span><span class="nx">s</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">b</span><span class="p">)</span><span class="w">
    </span><span class="nx">si</span><span class="p">[</span><span class="nx">s</span><span class="p">]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">s</span><span class="w">
    </span><span class="k">return</span><span class="w"> </span><span class="nx">s</span><span class="w">
</span><span class="p">}</span><span class="w">

</span><span class="kd">func</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nx">f</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">Open</span><span class="p">(</span><span class="s">&quot;1984.txt&quot;</span><span class="p">)</span><span class="w">
    </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatal</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span><span class="w">
    </span><span class="p">}</span><span class="w">
    </span><span class="k">defer</span><span class="w"> </span><span class="nx">f</span><span class="p">.</span><span class="nx">Close</span><span class="p">()</span><span class="w">

    </span><span class="nx">scanner</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">bufio</span><span class="p">.</span><span class="nx">NewScanner</span><span class="p">(</span><span class="nx">f</span><span class="p">)</span><span class="w">
    </span><span class="nx">scanner</span><span class="p">.</span><span class="nx">Split</span><span class="p">(</span><span class="nx">bufio</span><span class="p">.</span><span class="nx">ScanWords</span><span class="p">)</span><span class="w">

    </span><span class="nx">si</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">stringInterner</span><span class="p">{}</span><span class="w">
    </span><span class="kd">var</span><span class="w"> </span><span class="nx">words</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="w">
    </span><span class="k">for</span><span class="w"> </span><span class="nx">scanner</span><span class="p">.</span><span class="nx">Scan</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="nx">words</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">words</span><span class="p">,</span><span class="w"> </span><span class="nx">si</span><span class="p">.</span><span class="nx">InternBytes</span><span class="p">(</span><span class="nx">scanner</span><span class="p">.</span><span class="nx">Bytes</span><span class="p">()))</span><span class="w"> </span><span class="c1">// intern words</span><span class="w">
    </span><span class="p">}</span><span class="w">
    </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">scanner</span><span class="p">.</span><span class="nx">Err</span><span class="p">();</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatal</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span><span class="w">
    </span><span class="p">}</span><span class="w">

    </span><span class="nx">log</span><span class="p">.</span><span class="nx">Print</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">words</span><span class="p">))</span><span class="w">
</span><span class="p">}</span></code></pre>
<p>Now the words slice consists of strings that point to the string intern pool. All instances of <span class="docutils literal">IS</span> have the same address in memory:</p>
<pre class="code go literal-block"><code><span class="nx">fmt</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="nx">words</span><span class="p">[</span><span class="mi">1111</span><span class="p">:</span><span class="mi">1120</span><span class="p">])</span><span class="w">
</span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">word</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">words</span><span class="p">[</span><span class="mi">1111</span><span class="p">:</span><span class="mi">1120</span><span class="p">]</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">&quot;%x &quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">stringptr</span><span class="p">(</span><span class="nx">word</span><span class="p">))</span><span class="w">
</span><span class="p">}</span><span class="w">

</span><span class="p">[</span><span class="nx">WAR</span><span class="w"> </span><span class="nx">IS</span><span class="w"> </span><span class="nx">PEACE</span><span class="w"> </span><span class="nx">FREEDOM</span><span class="w"> </span><span class="nx">IS</span><span class="w"> </span><span class="nx">SLAVERY</span><span class="w"> </span><span class="nx">IGNORANCE</span><span class="w"> </span><span class="nx">IS</span><span class="w"> </span><span class="nx">STRENGTH</span><span class="p">]</span><span class="w">
     </span><span class="p">^</span><span class="w">                </span><span class="p">^</span><span class="w">                    </span><span class="p">^</span><span class="w">

</span><span class="nx">c000015220</span><span class="w"> </span><span class="nx">c0000146c8</span><span class="w"> </span><span class="nx">c000015223</span><span class="w"> </span><span class="nx">c000015228</span><span class="w"> </span><span class="nx">c0000146c8</span><span class="w"> </span><span class="nx">c000015230</span><span class="w"> </span><span class="nx">c000015237</span><span class="w"> </span><span class="nx">c0000146c8</span><span class="w"> </span><span class="nx">c000015240</span><span class="w">
           </span><span class="p">^</span><span class="w">                                </span><span class="p">^</span><span class="w">                                </span><span class="p">^</span></code></pre>
<p>The amount of memory required to store the words decreased from 483016 to 119628 bytes, which is more than a 4x reduction.</p>
<p>Moreover, the []byte map key <a class="reference external" href="https://github.com/golang/go/issues/3512" target="_blank">optimization</a> helps to avoid expensive heap allocations. The Go compiler recognizes <span class="docutils literal">si[string(b)]</span> and performs the lookup operation without allocating a new string and producing garbage.</p>
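<p>One way to check this on your own toolchain is to count allocations with <span class="docutils literal">testing.AllocsPerRun</span>. This is a minimal sketch; the <span class="docutils literal">lookupAllocs</span> helper and the sample keys are made up for illustration:</p>

```go
package main

import (
	"fmt"
	"testing"
)

// sink keeps the lookup result alive so the compiler cannot drop the lookup.
var sink string

// lookupAllocs returns the average number of heap allocations made by a
// single m[string(b)] lookup.
func lookupAllocs(m map[string]string, b []byte) float64 {
	return testing.AllocsPerRun(1000, func() {
		sink = m[string(b)] // conversion appears directly in the index expression
	})
}

func main() {
	key := "a word that is longer than thirty-two bytes"
	m := map[string]string{key: key}
	fmt.Println(lookupAllocs(m, []byte(key)))       // lookup of an existing key
	fmt.Println(lookupAllocs(m, []byte("missing"))) // lookup of a missing key
}
```

<p>With the standard Go compiler both lines should print 0. In <span class="docutils literal">InternBytes</span> this means that only the first occurrence of a word pays for an allocation, when the word is copied into the pool.</p>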
</section>
<section id="example-2-network-services">
<h3>Example 2: Network services<a class="headerlink" href="#example-2-network-services" title="Permalink to this headline"> #</a></h3>
<p>Another example where string interning can be useful is network services. Interning can be applied to some string responses from databases, gRPC or HTTP servers.</p>
<p>The following snippet caches user information after querying it from a database:</p>
<pre class="code go literal-block"><code><span class="kd">type</span><span class="w"> </span><span class="nx">user</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nx">ID</span><span class="w">      </span><span class="kt">int</span><span class="w">
    </span><span class="nx">Country</span><span class="w"> </span><span class="kt">string</span><span class="w">
</span><span class="p">}</span><span class="w">

</span><span class="kd">type</span><span class="w"> </span><span class="nx">userService</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nx">db</span><span class="w">    </span><span class="o">*</span><span class="nx">sql</span><span class="p">.</span><span class="nx">DB</span><span class="w">
    </span><span class="nx">cache</span><span class="w"> </span><span class="nx">sync</span><span class="p">.</span><span class="nx">Map</span><span class="w">
</span><span class="p">}</span><span class="w">

</span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">us</span><span class="w"> </span><span class="o">*</span><span class="nx">userService</span><span class="p">)</span><span class="w"> </span><span class="nx">get</span><span class="p">(</span><span class="nx">id</span><span class="w"> </span><span class="kt">int</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">user</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="k">if</span><span class="w"> </span><span class="nx">u</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">us</span><span class="p">.</span><span class="nx">cache</span><span class="p">.</span><span class="nx">Load</span><span class="p">(</span><span class="nx">id</span><span class="p">);</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="k">return</span><span class="w"> </span><span class="nx">u</span><span class="p">.(</span><span class="o">*</span><span class="nx">user</span><span class="p">),</span><span class="w"> </span><span class="kc">nil</span><span class="w">
    </span><span class="p">}</span><span class="w">
    </span><span class="nx">u</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">user</span><span class="p">{}</span><span class="w">
    </span><span class="nx">row</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">us</span><span class="p">.</span><span class="nx">db</span><span class="p">.</span><span class="nx">QueryRow</span><span class="p">(</span><span class="s">&quot;SELECT id, country FROM users WHERE id=?&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">id</span><span class="p">)</span><span class="w">
    </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">row</span><span class="p">.</span><span class="nx">Scan</span><span class="p">(</span><span class="o">&amp;</span><span class="nx">u</span><span class="p">.</span><span class="nx">ID</span><span class="p">,</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">u</span><span class="p">.</span><span class="nx">Country</span><span class="p">);</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w">
    </span><span class="p">}</span><span class="w">
    </span><span class="nx">us</span><span class="p">.</span><span class="nx">cache</span><span class="p">.</span><span class="nx">Store</span><span class="p">(</span><span class="nx">id</span><span class="p">,</span><span class="w"> </span><span class="nx">u</span><span class="p">)</span><span class="w">
    </span><span class="k">return</span><span class="w"> </span><span class="nx">u</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="w">
</span><span class="p">}</span></code></pre>
<p>There are about 200 countries as of December 2018, so storing only a single copy of each string can save us memory if the number of users we want to keep in cache is large.</p>
<p>The user service is called from an HTTP handler, which means the intern pool is required to be safe for concurrent access from multiple goroutines. Luckily Go 1.9 introduced a concurrent map - <a class="reference external" href="https://golang.org/pkg/sync/#Map" target="_blank">sync.Map</a>.</p>
<p>The user service using a thread-safe version of <span class="docutils literal">stringInterner</span>:</p>
<pre class="code go literal-block"><code><span class="kd">type</span><span class="w"> </span><span class="nx">stringInterner</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nx">sync</span><span class="p">.</span><span class="nx">Map</span><span class="w">
</span><span class="p">}</span><span class="w">

</span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">si</span><span class="w"> </span><span class="o">*</span><span class="nx">stringInterner</span><span class="p">)</span><span class="w"> </span><span class="nx">Intern</span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="kt">string</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nx">interned</span><span class="p">,</span><span class="w"> </span><span class="nx">_</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">si</span><span class="p">.</span><span class="nx">LoadOrStore</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="p">)</span><span class="w">
    </span><span class="k">return</span><span class="w"> </span><span class="nx">interned</span><span class="p">.(</span><span class="kt">string</span><span class="p">)</span><span class="w">
</span><span class="p">}</span><span class="w">

</span><span class="kd">type</span><span class="w"> </span><span class="nx">userService</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nx">db</span><span class="w">    </span><span class="o">*</span><span class="nx">sql</span><span class="p">.</span><span class="nx">DB</span><span class="w">
    </span><span class="nx">cache</span><span class="w"> </span><span class="nx">sync</span><span class="p">.</span><span class="nx">Map</span><span class="w">
    </span><span class="nx">si</span><span class="w">    </span><span class="nx">stringInterner</span><span class="w">
</span><span class="p">}</span><span class="w">

</span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">us</span><span class="w"> </span><span class="o">*</span><span class="nx">userService</span><span class="p">)</span><span class="w"> </span><span class="nx">get</span><span class="p">(</span><span class="nx">id</span><span class="w"> </span><span class="kt">int</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">user</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="k">if</span><span class="w"> </span><span class="nx">u</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">us</span><span class="p">.</span><span class="nx">cache</span><span class="p">.</span><span class="nx">Load</span><span class="p">(</span><span class="nx">id</span><span class="p">);</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="k">return</span><span class="w"> </span><span class="nx">u</span><span class="p">.(</span><span class="o">*</span><span class="nx">user</span><span class="p">),</span><span class="w"> </span><span class="kc">nil</span><span class="w">
    </span><span class="p">}</span><span class="w">
    </span><span class="nx">u</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">user</span><span class="p">{}</span><span class="w">
    </span><span class="nx">row</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">us</span><span class="p">.</span><span class="nx">db</span><span class="p">.</span><span class="nx">QueryRow</span><span class="p">(</span><span class="s">&quot;SELECT id, country FROM users WHERE id=?&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">id</span><span class="p">)</span><span class="w">
    </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">row</span><span class="p">.</span><span class="nx">Scan</span><span class="p">(</span><span class="o">&amp;</span><span class="nx">u</span><span class="p">.</span><span class="nx">ID</span><span class="p">,</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">u</span><span class="p">.</span><span class="nx">Country</span><span class="p">);</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w">
    </span><span class="p">}</span><span class="w">
    </span><span class="nx">u</span><span class="p">.</span><span class="nx">Country</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">us</span><span class="p">.</span><span class="nx">si</span><span class="p">.</span><span class="nx">Intern</span><span class="p">(</span><span class="nx">u</span><span class="p">.</span><span class="nx">Country</span><span class="p">)</span><span class="w"> </span><span class="c1">// intern country</span><span class="w">
    </span><span class="nx">us</span><span class="p">.</span><span class="nx">cache</span><span class="p">.</span><span class="nx">Store</span><span class="p">(</span><span class="nx">id</span><span class="p">,</span><span class="w"> </span><span class="nx">u</span><span class="p">)</span><span class="w">
    </span><span class="k">return</span><span class="w"> </span><span class="nx">u</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="w">
</span><span class="p">}</span></code></pre>
<aside class="admonition note">
<p class="admonition-title">Note</p>
<p>Keep in mind the intern pool only grows and never shrinks in the implementation above. One way to solve that is by having two maps in the pool (<span class="docutils literal">current</span> and <span class="docutils literal">previous</span>) and rotating them once in a while.</p>
</aside>
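<p>The two-map idea from the note could be sketched roughly as follows. This is only one possible shape: the <span class="docutils literal">rotatingInterner</span> type and its methods are hypothetical names, and a mutex-protected pair of regular maps is used instead of <span class="docutils literal">sync.Map</span> to keep the rotation simple.</p>

```go
package main

import (
	"fmt"
	"sync"
)

// rotatingInterner keeps two generations of interned strings.
type rotatingInterner struct {
	mu       sync.Mutex
	current  map[string]string
	previous map[string]string
}

func newRotatingInterner() *rotatingInterner {
	return &rotatingInterner{
		current:  make(map[string]string),
		previous: make(map[string]string),
	}
}

// Intern returns the pooled copy of s, promoting it from the previous
// generation when it is not in the current one.
func (ri *rotatingInterner) Intern(s string) string {
	ri.mu.Lock()
	defer ri.mu.Unlock()
	if interned, ok := ri.current[s]; ok {
		return interned
	}
	if interned, ok := ri.previous[s]; ok {
		ri.current[interned] = interned
		return interned
	}
	ri.current[s] = s
	return s
}

// Rotate discards the previous generation. Strings that were not interned
// again since the last rotation are no longer referenced by the pool.
func (ri *rotatingInterner) Rotate() {
	ri.mu.Lock()
	defer ri.mu.Unlock()
	ri.previous = ri.current
	ri.current = make(map[string]string)
}

func main() {
	ri := newRotatingInterner()
	ri.Intern("US")
	ri.Intern("CA")
	ri.Rotate()
	ri.Intern("US") // promoted back into the current generation
	ri.Rotate()     // "CA" was not used since the first rotation and is dropped
	fmt.Println(len(ri.previous), len(ri.current)) // 1 0
}
```

<p>The caller decides when to call <span class="docutils literal">Rotate</span>, for example from a goroutine driven by a <span class="docutils literal">time.Ticker</span>.</p>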
</section>
<section id="usage-in-go-standard-library">
<h3>Usage in Go standard library<a class="headerlink" href="#usage-in-go-standard-library" title="Permalink to this headline"> #</a></h3>
<p>The Go standard library uses interning in a few places; one of them is <a class="reference external" href="https://github.com/golang/go/blob/f5b695030b857b079a4cbcfb79564ff933c0c8f2/src/net/textproto/reader.go#L644" target="_blank">net/textproto</a>. <span class="docutils literal">textproto.Reader</span> interns common HTTP headers (<span class="docutils literal"><span class="pre">Content-Type</span></span>, <span class="docutils literal">Host</span>, <span class="docutils literal"><span class="pre">User-Agent</span></span>, etc.) to avoid memory allocations.</p>
</section>
<section id="string-comparison">
<h3>String comparison<a class="headerlink" href="#string-comparison" title="Permalink to this headline"> #</a></h3>
<p>A decrease in memory usage is not the only advantage of string interning. Two interned strings with equal contents share the same underlying data, so they can be compared for equality in constant time. The comparison routine only needs to check whether the two pointers are equal (<a class="reference external" href="https://github.com/golang/go/blob/ad4a58e31501bce5de2aad90a620eaecdc1eecb8/src/internal/bytealg/compare_amd64.s#L30" target="_blank">Go source code</a>) instead of going through the characters:</p>
<pre class="code literal-block"><code>TEXT cmpbody&lt;&gt;(SB),NOSPLIT,$0-0
    CMPQ    SI, DI
    JEQ allsame</code></pre>
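<p>A small sketch can show why the pointer check succeeds for interned strings. It reuses the <span class="docutils literal">sync.Map</span>-based <span class="docutils literal">stringInterner</span> from the previous section; the <span class="docutils literal">sameData</span> helper is a hypothetical name and relies on <span class="docutils literal">unsafe.StringData</span>, which requires Go 1.20 or newer.</p>

```go
package main

import (
	"fmt"
	"strings"
	"sync"
	"unsafe"
)

type stringInterner struct {
	sync.Map
}

func (si *stringInterner) Intern(s string) string {
	interned, _ := si.LoadOrStore(s, s)
	return interned.(string)
}

// sameData reports whether a and b point at the same underlying bytes.
func sameData(a, b string) bool {
	return unsafe.StringData(a) == unsafe.StringData(b)
}

func main() {
	// Two equal strings built separately live in different allocations.
	s1 := strings.Repeat("a", 100)
	s2 := strings.Repeat("a", 100)
	fmt.Println(s1 == s2, sameData(s1, s2)) // true false

	// After interning, both variables refer to the first stored copy.
	si := stringInterner{}
	i1 := si.Intern(s1)
	i2 := si.Intern(s2)
	fmt.Println(i1 == i2, sameData(i1, i2)) // true true
}
```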
<p>I created a benchmark to see the performance difference between interned and non-interned strings:</p>
<pre class="code go literal-block"><code><span class="kn">package</span><span class="w"> </span><span class="nx">main</span><span class="w">

</span><span class="kn">import</span><span class="w"> </span><span class="p">(</span><span class="w">
    </span><span class="s">&quot;strings&quot;</span><span class="w">
    </span><span class="s">&quot;testing&quot;</span><span class="w">
</span><span class="p">)</span><span class="w">

</span><span class="kd">func</span><span class="w"> </span><span class="nx">benchmarkStringCompare</span><span class="p">(</span><span class="nx">b</span><span class="w"> </span><span class="o">*</span><span class="nx">testing</span><span class="p">.</span><span class="nx">B</span><span class="p">,</span><span class="w"> </span><span class="nx">count</span><span class="w"> </span><span class="kt">int</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nx">s1</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">strings</span><span class="p">.</span><span class="nx">Repeat</span><span class="p">(</span><span class="s">&quot;a&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">count</span><span class="p">)</span><span class="w">
    </span><span class="nx">s2</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">strings</span><span class="p">.</span><span class="nx">Repeat</span><span class="p">(</span><span class="s">&quot;a&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">count</span><span class="p">)</span><span class="w">
    </span><span class="nx">b</span><span class="p">.</span><span class="nx">ResetTimer</span><span class="p">()</span><span class="w">
    </span><span class="k">for</span><span class="w"> </span><span class="nx">n</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="nx">n</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="nx">b</span><span class="p">.</span><span class="nx">N</span><span class="p">;</span><span class="w"> </span><span class="nx">n</span><span class="o">++</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="k">if</span><span class="w"> </span><span class="nx">s1</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">s2</span><span class="w"> </span><span class="p">{</span><span class="w">
            </span><span class="nx">b</span><span class="p">.</span><span class="nx">Fatal</span><span class="p">()</span><span class="w">
        </span><span class="p">}</span><span class="w">
    </span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">

</span><span class="kd">func</span><span class="w"> </span><span class="nx">benchmarkStringCompareIntern</span><span class="p">(</span><span class="nx">b</span><span class="w"> </span><span class="o">*</span><span class="nx">testing</span><span class="p">.</span><span class="nx">B</span><span class="p">,</span><span class="w"> </span><span class="nx">count</span><span class="w"> </span><span class="kt">int</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nx">si</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">stringInterner</span><span class="p">{}</span><span class="w">
    </span><span class="nx">s1</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">si</span><span class="p">.</span><span class="nx">Intern</span><span class="p">(</span><span class="nx">strings</span><span class="p">.</span><span class="nx">Repeat</span><span class="p">(</span><span class="s">&quot;a&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">count</span><span class="p">))</span><span class="w">
    </span><span class="nx">s2</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">si</span><span class="p">.</span><span class="nx">Intern</span><span class="p">(</span><span class="nx">strings</span><span class="p">.</span><span class="nx">Repeat</span><span class="p">(</span><span class="s">&quot;a&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">count</span><span class="p">))</span><span class="w">
    </span><span class="nx">b</span><span class="p">.</span><span class="nx">ResetTimer</span><span class="p">()</span><span class="w">
    </span><span class="k">for</span><span class="w"> </span><span class="nx">n</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="nx">n</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="nx">b</span><span class="p">.</span><span class="nx">N</span><span class="p">;</span><span class="w"> </span><span class="nx">n</span><span class="o">++</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="k">if</span><span class="w"> </span><span class="nx">s1</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">s2</span><span class="w"> </span><span class="p">{</span><span class="w">
            </span><span class="nx">b</span><span class="p">.</span><span class="nx">Fatal</span><span class="p">()</span><span class="w">
        </span><span class="p">}</span><span class="w">
    </span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">

</span><span class="kd">func</span><span class="w"> </span><span class="nx">BenchmarkStringCompare1</span><span class="p">(</span><span class="nx">b</span><span class="w"> </span><span class="o">*</span><span class="nx">testing</span><span class="p">.</span><span class="nx">B</span><span class="p">)</span><span class="w">   </span><span class="p">{</span><span class="w"> </span><span class="nx">benchmarkStringCompare</span><span class="p">(</span><span class="nx">b</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="p">}</span><span class="w">
</span><span class="kd">func</span><span class="w"> </span><span class="nx">BenchmarkStringCompare10</span><span class="p">(</span><span class="nx">b</span><span class="w"> </span><span class="o">*</span><span class="nx">testing</span><span class="p">.</span><span class="nx">B</span><span class="p">)</span><span class="w">  </span><span class="p">{</span><span class="w"> </span><span class="nx">benchmarkStringCompare</span><span class="p">(</span><span class="nx">b</span><span class="p">,</span><span class="w"> </span><span class="mi">10</span><span class="p">)</span><span class="w"> </span><span class="p">}</span><span class="w">
</span><span class="kd">func</span><span class="w"> </span><span class="nx">BenchmarkStringCompare100</span><span class="p">(</span><span class="nx">b</span><span class="w"> </span><span class="o">*</span><span class="nx">testing</span><span class="p">.</span><span class="nx">B</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">benchmarkStringCompare</span><span class="p">(</span><span class="nx">b</span><span class="p">,</span><span class="w"> </span><span class="mi">100</span><span class="p">)</span><span class="w"> </span><span class="p">}</span><span class="w">

</span><span class="kd">func</span><span class="w"> </span><span class="nx">BenchmarkStringCompareIntern1</span><span class="p">(</span><span class="nx">b</span><span class="w"> </span><span class="o">*</span><span class="nx">testing</span><span class="p">.</span><span class="nx">B</span><span class="p">)</span><span class="w">   </span><span class="p">{</span><span class="w"> </span><span class="nx">benchmarkStringCompareIntern</span><span class="p">(</span><span class="nx">b</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="p">}</span><span class="w">
</span><span class="kd">func</span><span class="w"> </span><span class="nx">BenchmarkStringCompareIntern10</span><span class="p">(</span><span class="nx">b</span><span class="w"> </span><span class="o">*</span><span class="nx">testing</span><span class="p">.</span><span class="nx">B</span><span class="p">)</span><span class="w">  </span><span class="p">{</span><span class="w"> </span><span class="nx">benchmarkStringCompareIntern</span><span class="p">(</span><span class="nx">b</span><span class="p">,</span><span class="w"> </span><span class="mi">10</span><span class="p">)</span><span class="w"> </span><span class="p">}</span><span class="w">
</span><span class="kd">func</span><span class="w"> </span><span class="nx">BenchmarkStringCompareIntern100</span><span class="p">(</span><span class="nx">b</span><span class="w"> </span><span class="o">*</span><span class="nx">testing</span><span class="p">.</span><span class="nx">B</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">benchmarkStringCompareIntern</span><span class="p">(</span><span class="nx">b</span><span class="p">,</span><span class="w"> </span><span class="mi">100</span><span class="p">)</span><span class="w"> </span><span class="p">}</span></code></pre>
<p>The time it takes to compare interned strings stays constant regardless of the number of characters:</p>
<pre class="code go literal-block"><code><span class="nx">BenchmarkStringCompare1</span><span class="o">-</span><span class="mi">4</span><span class="w">               </span><span class="mi">500000000</span><span class="w">            </span><span class="mf">2.93</span><span class="w"> </span><span class="nx">ns</span><span class="o">/</span><span class="nx">op</span><span class="w">
</span><span class="nx">BenchmarkStringCompare10</span><span class="o">-</span><span class="mi">4</span><span class="w">              </span><span class="mi">300000000</span><span class="w">            </span><span class="mf">6.21</span><span class="w"> </span><span class="nx">ns</span><span class="o">/</span><span class="nx">op</span><span class="w">
</span><span class="nx">BenchmarkStringCompare100</span><span class="o">-</span><span class="mi">4</span><span class="w">             </span><span class="mi">100000000</span><span class="w">            </span><span class="mf">13.2</span><span class="w"> </span><span class="nx">ns</span><span class="o">/</span><span class="nx">op</span><span class="w">
</span><span class="nx">BenchmarkStringCompareIntern1</span><span class="o">-</span><span class="mi">4</span><span class="w">         </span><span class="mi">1000000000</span><span class="w">           </span><span class="mf">2.60</span><span class="w"> </span><span class="nx">ns</span><span class="o">/</span><span class="nx">op</span><span class="w">
</span><span class="nx">BenchmarkStringCompareIntern10</span><span class="o">-</span><span class="mi">4</span><span class="w">        </span><span class="mi">1000000000</span><span class="w">           </span><span class="mf">2.60</span><span class="w"> </span><span class="nx">ns</span><span class="o">/</span><span class="nx">op</span><span class="w">
</span><span class="nx">BenchmarkStringCompareIntern100</span><span class="o">-</span><span class="mi">4</span><span class="w">       </span><span class="mi">1000000000</span><span class="w">           </span><span class="mf">2.60</span><span class="w"> </span><span class="nx">ns</span><span class="o">/</span><span class="nx">op</span></code></pre>
</section>
<section id="conclusion">
<h3>Conclusion<a class="headerlink" href="#conclusion" title="Permalink to this headline"> #</a></h3>
<p>String interning can save memory at the cost of CPU time - every lookup from the intern pool requires hashing the input string, which may make it a poor fit for CPU-bound applications.</p>
<p>Here is a real-life example - memory usage of the application went down from 70% to 45% after I deployed a new version with string interning enabled:</p>
<img alt="" src="https://artem.krylysov.com/images/2018-string-interning/container-mem.png" style="width: 604px;" />
</section>
]]>
        </content>
    </entry>
    
    <entry>
        <title><![CDATA[Pogreb - key-value store for read-heavy workloads]]></title>
        <link href="https://artem.krylysov.com/blog/2018/03/24/pogreb-key-value-store/"/>
        <updated>2018-03-24T00:00:00Z</updated>
        <id>https://artem.krylysov.com/blog/2018/03/24/pogreb-key-value-store/</id>
        <content type="html">
            <![CDATA[<aside class="admonition note">
<p class="admonition-title">Note</p>
<p>This post is outdated, please read the new design document on <a class="reference external" href="https://github.com/akrylysov/pogreb/blob/master/docs/design.md" target="_blank">GitHub</a>.</p>
</aside>
<p>A few months ago I released the first version of an embedded on-disk key-value store written in Go. The store is about 10 times faster than LevelDB for random lookups. I'll explain why it's faster, but first let's talk about the reason why I decided to create my own key-value store.</p>
<img alt="" class="align-center" src="https://akrylysov.github.io/pogreb/logo.svg" style="width: 300px;" />
<section id="why-another-key-value-store">
<h3>Why another key-value store?<a class="headerlink" href="#why-another-key-value-store" title="Permalink to this headline"> #</a></h3>
<p>I needed a store that could map keys to values with the following requirements:</p>
<ul class="simple">
<li><p>The number of keys was large and I couldn't keep the items in memory (that would require switching to a more expensive server instance type).</p></li>
<li><p>Low latency. I wanted to avoid network overhead if it was possible.</p></li>
<li><p>I needed to rebuild the mapping once a day and then access it in read-only mode.</p></li>
<li><p>The sequential lookup performance wasn't important; all I cared about was random lookup performance.</p></li>
</ul>
<p>There are plenty of open-source key-value stores - the most popular are <a class="reference external" href="https://github.com/google/leveldb" target="_blank">LevelDB</a>, <a class="reference external" href="https://github.com/facebook/rocksdb" target="_blank">RocksDB</a> and <a class="reference external" href="http://www.lmdb.tech/doc/" target="_blank">LMDB</a>. Also there are some key-value stores implemented in Go - e.g. <a class="reference external" href="https://github.com/boltdb/bolt/" target="_blank">Bolt</a>, <a class="reference external" href="https://github.com/syndtr/goleveldb/" target="_blank">goleveldb</a> (port of LevelDB) and <a class="reference external" href="https://github.com/dgraph-io/badger" target="_blank">BadgerDB</a>.</p>
<p>I tried to use LevelDB and RocksDB in production, but unfortunately, the observed results didn't meet the performance requirements.</p>
<p>The stores I mentioned above are not optimized for a specific use case - they are general-purpose solutions. For example, LevelDB supports range queries (iterating over keys in lexicographic order), caching, bloom filters and many other nice features, but I didn't need any of those features. Key-value stores are commonly built on tree data structures, usually B+ trees or LSM trees. Both data structures always require several I/O operations per lookup.</p>
<p>A hash table with a constant average lookup time seemed like a better choice. At least in theory a hash table beats a tree for my use case, so I decided to implement a simple on-disk hash table and benchmark it against LevelDB.</p>
</section>
<section id="file-layout">
<h3>File layout<a class="headerlink" href="#file-layout" title="Permalink to this headline"> #</a></h3>
<p>Pogreb uses two files to store keys and values.</p>
<section id="index-file">
<h4>Index file<a class="headerlink" href="#index-file" title="Permalink to this headline"> #</a></h4>
<p>The index file holds the header followed by an array of buckets.</p>
<pre class="code literal-block"><code>+----------+
| header   |
+----------+
| bucket 0 |
| ...      |
| bucket N |
+----------+</code></pre>
<section id="bucket">
<h5>Bucket<a class="headerlink" href="#bucket" title="Permalink to this headline"> #</a></h5>
<p>Each bucket is an array of slots plus an optional file pointer to an overflow bucket. The number of slots in a bucket is hardcoded to 28 - that is the maximum number of slots that fit in 512 bytes.</p>
<pre class="code literal-block"><code>+------------------------+
| slot 0                 |
| ...                    |
| slot N                 |
+------------------------+
| overflow bucket offset |
+------------------------+</code></pre>
</section>
<section id="slot">
<h5>Slot<a class="headerlink" href="#slot" title="Permalink to this headline"> #</a></h5>
<p>A slot contains the hash, the size of the key, the value size and a 64-bit offset of the key/value pair in the data file.</p>
<pre class="code literal-block"><code>+------------------+
| key hash         |
+------------------+
| key size         |
+------------------+
| value size       |
+------------------+
| key/value offset |
+------------------+</code></pre>
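<p>The slot layout also explains the limit of 28 slots per 512-byte bucket. The arithmetic below is a sketch: only the 64-bit key/value offset, the 512-byte bucket and the 28 slots come from the text, while the widths of the hash, key size, value size and overflow offset fields are assumptions chosen for illustration.</p>

```go
package main

import "fmt"

const (
	hashSize      = 4 // assumed: 32-bit key hash
	keySizeSize   = 2 // assumed: 16-bit key size
	valueSizeSize = 4 // assumed: 32-bit value size
	offsetSize    = 8 // from the text: 64-bit key/value offset
	slotSize      = hashSize + keySizeSize + valueSizeSize + offsetSize

	bucketSize         = 512 // from the text
	overflowOffsetSize = 8   // assumed: 64-bit overflow bucket offset
	slotsPerBucket     = (bucketSize - overflowOffsetSize) / slotSize
)

func main() {
	fmt.Println(slotSize, slotsPerBucket)                     // 18 28
	fmt.Println(slotsPerBucket*slotSize + overflowOffsetSize) // 512
}
```

<p>Under these assumed field widths, 28 slots plus the overflow offset fill the bucket exactly, and a 29th slot would not fit.</p>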
</section>
</section>
<section id="data-file">
<h4>Data file<a class="headerlink" href="#data-file" title="Permalink to this headline"> #</a></h4>
<p>The data file stores keys, values, and overflow buckets.</p>
</section>
</section>
<section id="algorithm">
<h3>Algorithm<a class="headerlink" href="#algorithm" title="Permalink to this headline"> #</a></h3>
<p>Pogreb uses the <a class="reference external" href="https://en.wikipedia.org/wiki/Linear_hashing" target="_blank">Linear hashing</a> algorithm which grows the index file one bucket at a time instead of rebuilding the whole hash table.</p>
<p>Initially, the hash table contains a single bucket (<em>N=1</em>).</p>
<p>Level <em>L</em> (initially <em>L=0</em>) represents the number of buckets on a logarithmic scale: a hash table at level <em>L</em> has at least 2<sup>L</sup> and fewer than 2<sup>L+1</sup> buckets. For example, a hash table with <em>L=0</em> contains a single bucket; <em>L=3</em> contains between 8 and 15 buckets.</p>
<p><em>S</em> is the index of the &quot;split&quot; bucket (initially <em>S=0</em>).</p>
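<p>The way <em>L</em> and <em>S</em> evolve as the table grows can be sketched as below. This is the usual linear hashing bookkeeping, assuming the bucket count is 2<sup>L</sup> + <em>S</em>, which is what the lookup formulas in the next section imply. The <span class="docutils literal">index</span> type is a hypothetical stand-in; the sketch leaves out when a split is triggered and how slots are redistributed.</p>

```go
package main

import "fmt"

// index tracks only the linear hashing counters, not the buckets themselves.
type index struct {
	level uint   // L
	split uint32 // S
}

// numBuckets returns 2^L + S.
func (idx *index) numBuckets() uint32 {
	return 1<<idx.level + idx.split
}

// grow accounts for splitting bucket S, which appends one bucket to the index.
func (idx *index) grow() {
	idx.split++
	if idx.split == 1<<idx.level {
		// Every bucket of the current level has been split: start the next level.
		idx.level++
		idx.split = 0
	}
}

func main() {
	idx := &index{}
	for i := 0; i < 5; i++ {
		fmt.Printf("L=%d S=%d N=%d\n", idx.level, idx.split, idx.numBuckets())
		idx.grow()
	}
	// Prints:
	// L=0 S=0 N=1
	// L=1 S=0 N=2
	// L=1 S=1 N=3
	// L=2 S=0 N=4
	// L=2 S=1 N=5
}
```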
<p>Collisions are resolved using the bucket chaining method. Overflow buckets are stored in the data file and form a linked list.</p>
<section id="lookup">
<h4>Lookup<a class="headerlink" href="#lookup" title="Permalink to this headline"> #</a></h4>
<p>Position of the bucket in the index file is calculated by applying a hash function to the key:</p>
<pre class="code literal-block"><code>           Index file
          +----------+
          | bucket 0 |                 Bucket
          | ...      |    +------------------------------+
h(key) -&gt; | bucket X | -&gt; | slot 0 ... slot Y ... slot N |
          | ...      |    +------------------------------+
          | bucket N |                    |
          +----------+                    |
                                          |
                                          v
                                      Data file
                                    +-----------+
                                    | key-value |
                                    +-----------+</code></pre>
<p>To get the position of the bucket:</p>
<ul class="simple">
<li><p>Hash the key (Pogreb uses the 32-bit version of MurmurHash3).</p></li>
<li><p>Use the <em>L</em> lowest bits of the hash to get the position of the bucket - <span class="docutils literal">hash % math.Pow(2, L)</span>.</p></li>
<li><p>If the calculated position comes before the split bucket <em>S</em>, the position is <span class="docutils literal">hash % math.Pow(2, L+1)</span>.</p></li>
</ul>
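<p>The two modulo steps above can be sketched in Go with bit shifts instead of <span class="docutils literal">math.Pow</span>. The function name and the parameter types are assumptions made for this sketch and are not taken from the Pogreb source:</p>
<pre class="code go literal-block"><code>package main

import &quot;fmt&quot;

// bucketIndex returns the bucket position for a hash in a linear hash
// table at level l with split pointer s.
func bucketIndex(hash uint32, l uint, s uint32) uint32 {
    // hash % 2^L keeps the L lowest bits of the hash.
    idx := hash % (1 &lt;&lt; l)
    if idx &lt; s {
        // Buckets before the split pointer were already split in this
        // round, so one more bit of the hash is used.
        idx = hash % (1 &lt;&lt; (l + 1))
    }
    return idx
}

func main() {
    // L=2, S=1: bucket 0 has already been split into buckets 0 and 4.
    for _, h := range []uint32{4, 5, 8} {
        fmt.Println(h, bucketIndex(h, 2, 1))
    }
}</code></pre>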
<p>The lookup function reads a bucket at the given position from the index file and performs a linear search to find a slot with the required hash. If the bucket doesn't contain a slot with the required hash, but the pointer to the overflow bucket is non-zero, the overflow bucket is checked. The process continues until the required slot is found or until there are no more overflow buckets for the given key. Once a slot with the required key is found, Pogreb reads the key/value pair from the data file.</p>
<p>The average lookup requires two I/O operations - one is to find a slot in the index file and another one is to read the key and value from the data file.</p>
</section>
<section id="insertion">
<h4>Insertion<a class="headerlink" href="#insertion" title="Permalink to this headline"> #</a></h4>
<p>Insertion is performed by adding the key/value pair to the data file and updating a bucket in the index file. If the bucket has all of its slots occupied, a new overflow bucket is created in the data file.</p>
</section>
<section id="split">
<h4>Split<a class="headerlink" href="#split" title="Permalink to this headline"> #</a></h4>
<p>When the number of items in the hash table exceeds the load factor threshold (70%), the split operation is performed on the split bucket <em>S</em>:</p>
<ul class="simple">
<li><p>A new bucket is allocated at the end of the index file.</p></li>
<li><p>The split bucket index <em>S</em> is incremented.</p></li>
<li><p>If <em>S</em> points to 2<sup>L</sup>, <em>S</em> is reset to 0 and <em>L</em> is incremented.</p></li>
<li><p>The items from the old split bucket are separated between the newly allocated bucket and the old split bucket by recalculating the positions of the keys in the hash table.</p></li>
<li><p>The number of buckets <em>N</em> is incremented.</p></li>
</ul>
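<p>A minimal in-memory sketch of the split steps, assuming that buckets hold only hashes and that there are no overflow buckets. The <span class="docutils literal">table</span> type and its method names are hypothetical and are not taken from the Pogreb source:</p>
<pre class="code go literal-block"><code>package main

import &quot;fmt&quot;

// table is a simplified linear hash table: each bucket is a slice of hashes.
type table struct {
    level   uint       // L
    split   uint32     // S
    buckets [][]uint32 // len(buckets) is N
}

// index returns the bucket position for a hash.
func (t *table) index(hash uint32) uint32 {
    idx := hash % (1 &lt;&lt; t.level)
    if idx &lt; t.split {
        idx = hash % (1 &lt;&lt; (t.level + 1))
    }
    return idx
}

// splitBucket splits the bucket the split pointer currently points to.
func (t *table) splitBucket() {
    old := t.split
    items := t.buckets[old]
    t.buckets[old] = nil
    // Allocate a new bucket at the end; this also increments N.
    t.buckets = append(t.buckets, nil)
    // Advance S; when it reaches 2^L, start a new round.
    t.split++
    if t.split == 1&lt;&lt;t.level {
        t.split = 0
        t.level++
    }
    // Recalculate positions: each item stays in the old bucket or
    // moves to the newly allocated one.
    for _, h := range items {
        i := t.index(h)
        t.buckets[i] = append(t.buckets[i], h)
    }
}

func main() {
    t := &amp;table{buckets: [][]uint32{{0, 1, 2, 3, 4, 5, 6, 7}}}
    for i := 0; i &lt; 3; i++ {
        t.splitBucket()
        fmt.Println(t.level, t.split, t.buckets)
    }
}</code></pre>
<p>In this sketch the caller decides when to split; a real table would trigger the split from the insertion path once the load factor threshold is exceeded.</p>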
</section>
<section id="removal">
<h4>Removal<a class="headerlink" href="#removal" title="Permalink to this headline"> #</a></h4>
<p>The removal operation looks up the bucket by key, removes the slot from the bucket, overwrites the bucket and then adds the offset of the key/value pair to the free list.</p>
</section>
<section id="free-list">
<h4>Free list<a class="headerlink" href="#free-list" title="Permalink to this headline"> #</a></h4>
<p>Pogreb maintains a list of blocks freed from the data file. The insertion operation tries to find a free block in the free list first before extending the data file. The free list implementation uses the &quot;best-fit&quot; algorithm for allocations. The free list can perform basic defragmentation by merging the adjacent free blocks.</p>
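<p>A best-fit free list with merging of adjacent blocks could look roughly like the sketch below. It assumes that the list is a slice kept sorted by offset and that a block is split when it is larger than the request; the types and method names are hypothetical and don't mirror Pogreb's implementation:</p>
<pre class="code go literal-block"><code>package main

import (
    &quot;fmt&quot;
    &quot;sort&quot;
)

// block describes a free region of the data file.
type block struct {
    offset int64
    size   int64
}

// freeList keeps free blocks sorted by offset.
type freeList struct {
    blocks []block
}

// free adds a block to the list and merges it with adjacent free blocks.
func (fl *freeList) free(off, size int64) {
    i := sort.Search(len(fl.blocks), func(i int) bool { return fl.blocks[i].offset &gt; off })
    fl.blocks = append(fl.blocks, block{})
    copy(fl.blocks[i+1:], fl.blocks[i:])
    fl.blocks[i] = block{off, size}
    // Merge with the next block if it starts where this one ends.
    if i+1 &lt; len(fl.blocks) &amp;&amp; off+size == fl.blocks[i+1].offset {
        fl.blocks[i].size += fl.blocks[i+1].size
        fl.blocks = append(fl.blocks[:i+1], fl.blocks[i+2:]...)
    }
    // Merge with the previous block if it ends where this one starts.
    if i &gt; 0 &amp;&amp; fl.blocks[i-1].offset+fl.blocks[i-1].size == off {
        fl.blocks[i-1].size += fl.blocks[i].size
        fl.blocks = append(fl.blocks[:i], fl.blocks[i+1:]...)
    }
}

// allocate returns the offset of the smallest free block that fits size.
// It returns false when no block fits and the file has to be extended.
func (fl *freeList) allocate(size int64) (int64, bool) {
    best := -1
    for i, b := range fl.blocks {
        if b.size &gt;= size &amp;&amp; (best == -1 || b.size &lt; fl.blocks[best].size) {
            best = i
        }
    }
    if best == -1 {
        return 0, false
    }
    b := fl.blocks[best]
    if b.size == size {
        fl.blocks = append(fl.blocks[:best], fl.blocks[best+1:]...)
    } else {
        // Keep the unused tail of the block in the list.
        fl.blocks[best] = block{b.offset + size, b.size - size}
    }
    return b.offset, true
}

func main() {
    fl := &amp;freeList{}
    fl.free(0, 512)
    fl.free(1024, 512)
    fl.free(512, 512) // merges all three blocks into one
    fl.free(4096, 512)
    fmt.Println(fl.blocks)
    fmt.Println(fl.allocate(512)) // picks the smallest block that fits
    fmt.Println(fl.blocks)
}</code></pre>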
</section>
<section id="choosing-the-right-load-factor">
<h4>Choosing the right load factor<a class="headerlink" href="#choosing-the-right-load-factor" title="Permalink to this headline"> #</a></h4>
<p>The load factor is the ratio of the number of items currently in the hash table to the total number of slots available. The load factor determines the point at which the hash table should grow the index file. If the load factor is too large, it can lead to a large number of overflow buckets. At the same time, a small load factor wastes a lot of space in the index file, which can also hurt the hash table performance because it leads to poor caching.</p>
<p>For example, Python's <span class="docutils literal">dict</span> load factor is ~66%, Java's <span class="docutils literal">HashMap</span> uses 75% by default, and in Go's <span class="docutils literal">map</span> it's 65%.</p>
<p>Here are the benchmark results for different load factor values:</p>
<table>
<colgroup>
<col style="width: 33.3%" />
<col style="width: 33.3%" />
<col style="width: 33.3%" />
</colgroup>
<thead>
<tr><th class="head"><p>load factor</p></th>
<th class="head"><p>reads per second</p></th>
<th class="head"><p>index size (MB)</p></th>
</tr>
</thead>
<tbody>
<tr><td><p>0.4</p></td>
<td><p>909488</p></td>
<td><p>45</p></td>
</tr>
<tr><td><p>0.5</p></td>
<td><p>908071</p></td>
<td><p>35</p></td>
</tr>
<tr><td><p>0.6</p></td>
<td><p>909977</p></td>
<td><p>30</p></td>
</tr>
<tr><td><p>0.65</p></td>
<td><p>921098</p></td>
<td><p>27.5</p></td>
</tr>
<tr><td><p>0.7</p></td>
<td><p><strong>921432</strong></p></td>
<td><p>25</p></td>
</tr>
<tr><td><p>0.75</p></td>
<td><p>910038</p></td>
<td><p>23.5</p></td>
</tr>
<tr><td><p>0.8</p></td>
<td><p>862662</p></td>
<td><p>22</p></td>
</tr>
<tr><td><p>0.9</p></td>
<td><p>731267</p></td>
<td><p>19.5</p></td>
</tr>
</tbody>
</table>
<p><span class="docutils literal">0.7</span> is a clear winner here.</p>
</section>
</section>
<section id="memory-mapped-file">
<h3>Memory-mapped file<a class="headerlink" href="#memory-mapped-file" title="Permalink to this headline"> #</a></h3>
<p>Both index and data files are stored entirely on disk. The files are mapped into the process's virtual memory space using the <span class="docutils literal">mmap</span> syscall on Unix-like operating systems or using the <span class="docutils literal">CreateFileMapping</span> WinAPI function on Windows.</p>
<p>Switching from regular file I/O to mmap for lookup operations gave a nearly 100% performance boost.</p>
</section>
<section id="durability">
<h3>Durability<a class="headerlink" href="#durability" title="Permalink to this headline"> #</a></h3>
<p>Pogreb makes an important assumption - it assumes that all 512-byte sector disk writes are atomic. Each bucket is precisely 512 bytes and all allocations in the data file are 512-byte aligned. The insertion operation doesn't overwrite the old data in the data file - it writes a new key/value pair first, then updates the bucket in-place in the index file. If both operations are successful, the space allocated by the old key/value is added to the free list.</p>
<p>Pogreb doesn't flush changes to disk (fsync operation) automatically. Data that is not flushed may be lost in case of power loss. However, an application crash shouldn't cause any issues. There is an option to make Pogreb fsync changes to disk after each write - set <span class="docutils literal">BackgroundSyncInterval</span> to <span class="docutils literal"><span class="pre">-1</span></span>. Alternatively, <span class="docutils literal">DB.Sync</span> can be used at any time to force the OS to write the data to disk.</p>
</section>
<section id="concurrency">
<h3>Concurrency<a class="headerlink" href="#concurrency" title="Permalink to this headline"> #</a></h3>
<p>Pogreb supports multiple concurrent readers and a single writer - write operations are guarded with a mutex.</p>
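<p>One common way to get this behavior in Go is a <span class="docutils literal">sync.RWMutex</span>: readers share the lock, while a writer holds it exclusively. The sketch below shows the pattern on a hypothetical in-memory <span class="docutils literal">store</span> type; it illustrates the idea and is not a copy of Pogreb's locking code:</p>
<pre class="code go literal-block"><code>package main

import (
    &quot;fmt&quot;
    &quot;sync&quot;
)

// store is a hypothetical in-memory key/value store.
type store struct {
    mu   sync.RWMutex
    data map[string]string
}

// Get takes a shared lock, so multiple readers can run at the same time.
func (s *store) Get(key string) (string, bool) {
    s.mu.RLock()
    defer s.mu.RUnlock()
    val, ok := s.data[key]
    return val, ok
}

// Put takes an exclusive lock, so only one writer runs at a time.
func (s *store) Put(key, val string) {
    s.mu.Lock()
    defer s.mu.Unlock()
    s.data[key] = val
}

func main() {
    s := &amp;store{data: map[string]string{}}
    var wg sync.WaitGroup
    for i := 0; i &lt; 4; i++ {
        wg.Add(1)
        go func(i int) {
            defer wg.Done()
            s.Put(fmt.Sprintf(&quot;key%d&quot;, i), &quot;value&quot;)
            s.Get(&quot;key0&quot;)
        }(i)
    }
    wg.Wait()
    fmt.Println(len(s.data))
}</code></pre>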
</section>
<section id="api">
<h3>API<a class="headerlink" href="#api" title="Permalink to this headline"> #</a></h3>
<p>The API is similar to the LevelDB API - it provides methods like <span class="docutils literal">Put</span>, <span class="docutils literal">Get</span>, <span class="docutils literal">Has</span>, <span class="docutils literal">Delete</span> and supports iteration over the key/value pairs (the <span class="docutils literal">Items</span> method):</p>
<pre class="code go literal-block"><code><span class="kn">package</span><span class="w"> </span><span class="nx">main</span><span class="w">

</span><span class="kn">import</span><span class="w"> </span><span class="p">(</span><span class="w">
    </span><span class="s">&quot;log&quot;</span><span class="w">

    </span><span class="s">&quot;github.com/akrylysov/pogreb&quot;</span><span class="w">
</span><span class="p">)</span><span class="w">

</span><span class="kd">func</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="c1">// Opening a database.</span><span class="w">
    </span><span class="nx">db</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">pogreb</span><span class="p">.</span><span class="nx">Open</span><span class="p">(</span><span class="s">&quot;example&quot;</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span><span class="w">
    </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatal</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span><span class="w">
        </span><span class="k">return</span><span class="w">
    </span><span class="p">}</span><span class="w">
    </span><span class="k">defer</span><span class="w"> </span><span class="nx">db</span><span class="p">.</span><span class="nx">Close</span><span class="p">()</span><span class="w">

    </span><span class="c1">// Writing to a database.</span><span class="w">
    </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">db</span><span class="p">.</span><span class="nx">Put</span><span class="p">([]</span><span class="nb">byte</span><span class="p">(</span><span class="s">&quot;foo&quot;</span><span class="p">),</span><span class="w"> </span><span class="p">[]</span><span class="nb">byte</span><span class="p">(</span><span class="s">&quot;bar&quot;</span><span class="p">))</span><span class="w">
    </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatal</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span><span class="w">
        </span><span class="k">return</span><span class="w">
    </span><span class="p">}</span><span class="w">

    </span><span class="c1">// Reading from a database.</span><span class="w">
    </span><span class="nx">val</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">db</span><span class="p">.</span><span class="nx">Get</span><span class="p">([]</span><span class="nb">byte</span><span class="p">(</span><span class="s">&quot;foo&quot;</span><span class="p">))</span><span class="w">
    </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatal</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span><span class="w">
        </span><span class="k">return</span><span class="w">
    </span><span class="p">}</span><span class="w">
    </span><span class="nx">log</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">&quot;%s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">val</span><span class="p">)</span><span class="w">

    </span><span class="c1">// Iterating over key/value pairs.</span><span class="w">
    </span><span class="nx">it</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">db</span><span class="p">.</span><span class="nx">Items</span><span class="p">()</span><span class="w">
    </span><span class="k">for</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="nx">key</span><span class="p">,</span><span class="w"> </span><span class="nx">val</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">it</span><span class="p">.</span><span class="nx">Next</span><span class="p">()</span><span class="w">
        </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">pogreb</span><span class="p">.</span><span class="nx">ErrIterationDone</span><span class="w"> </span><span class="p">{</span><span class="w">
            </span><span class="k">break</span><span class="w">
        </span><span class="p">}</span><span class="w">
        </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span><span class="w">
            </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatal</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span><span class="w">
        </span><span class="p">}</span><span class="w">
        </span><span class="nx">log</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">&quot;%s %s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">key</span><span class="p">,</span><span class="w"> </span><span class="nx">val</span><span class="p">)</span><span class="w">
    </span><span class="p">}</span><span class="w">
</span><span class="p">}</span></code></pre>
<p>Unlike LevelDB's iterator, Pogreb's iterator returns key/value pairs in an unspecified order - you get constant time lookups in exchange for losing range and prefix scans.</p>
<p>All API methods are safe for concurrent use by multiple goroutines.</p>
</section>
<section id="performance">
<h3>Performance<a class="headerlink" href="#performance" title="Permalink to this headline"> #</a></h3>
<p>I created a <a class="reference external" href="https://github.com/akrylysov/pogreb-bench" target="_blank">tool</a> to benchmark Pogreb, Bolt, goleveldb and BadgerDB. The tool generates a given number of random keys (16-64 byte length) and writes random values (128-512 byte length) for the generated keys. After successfully writing the items, pogreb-bench reopens the database, shuffles the keys and reads them back.</p>
<p>The benchmarks were performed on a DigitalOcean droplet with 8 CPUs / 16 GB RAM / 160 GB SSD + Ubuntu 16.04.3:</p>
<img alt="" class="align-center" src="https://akrylysov.github.io/pogreb/read-bench.png" style="width: 609px; height: 454px;" />
</section>
<section id="conclusion">
<h3>Conclusion<a class="headerlink" href="#conclusion" title="Permalink to this headline"> #</a></h3>
<p>Choosing the right data structure and optimizing the store for a specific use case (random lookups) made Pogreb significantly faster than similar solutions.</p>
<p>Pogreb on GitHub - <a class="reference external" href="https://github.com/akrylysov/pogreb" target="_blank">https://github.com/akrylysov/pogreb</a>.</p>
</section>
]]>
        </content>
    </entry>
    
    <entry>
        <title><![CDATA[Porting Go web applications to AWS Lambda]]></title>
        <link href="https://artem.krylysov.com/blog/2018/01/18/porting-go-web-applications-to-aws-lambda/"/>
        <updated>2018-01-18T00:00:00Z</updated>
        <id>https://artem.krylysov.com/blog/2018/01/18/porting-go-web-applications-to-aws-lambda/</id>
        <content type="html">
            <![CDATA[<p>Running Go on AWS Lambda is not something totally new - developers figured out how to launch Go binaries from Python a while ago, but it wasn't convenient and had some performance implications.</p>
<p>A few days ago Amazon <a class="reference external" href="https://aws.amazon.com/blogs/compute/announcing-go-support-for-aws-lambda/" target="_blank">announced</a> official Go support for AWS Lambda.</p>
<p>The API Gateway integration is straightforward: all you need to do is import the <span class="docutils literal"><span class="pre">github.com/aws/aws-lambda-go</span></span> package, implement a Lambda handler and call <span class="docutils literal">lambda.Start</span> to register the handler:</p>
<pre class="code go literal-block"><code><span class="kn">package</span><span class="w"> </span><span class="nx">main</span><span class="w">

</span><span class="kn">import</span><span class="w"> </span><span class="p">(</span><span class="w">
    </span><span class="s">&quot;github.com/aws/aws-lambda-go/events&quot;</span><span class="w">
    </span><span class="s">&quot;github.com/aws/aws-lambda-go/lambda&quot;</span><span class="w">
</span><span class="p">)</span><span class="w">

</span><span class="kd">func</span><span class="w"> </span><span class="nx">lambdaHandler</span><span class="p">(</span><span class="nx">event</span><span class="w"> </span><span class="nx">events</span><span class="p">.</span><span class="nx">APIGatewayProxyRequest</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="nx">events</span><span class="p">.</span><span class="nx">APIGatewayProxyResponse</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="k">return</span><span class="w"> </span><span class="nx">events</span><span class="p">.</span><span class="nx">APIGatewayProxyResponse</span><span class="p">{</span><span class="nx">Body</span><span class="p">:</span><span class="w"> </span><span class="s">&quot;hi&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">StatusCode</span><span class="p">:</span><span class="w"> </span><span class="mi">200</span><span class="p">},</span><span class="w"> </span><span class="kc">nil</span><span class="w">
</span><span class="p">}</span><span class="w">

</span><span class="kd">func</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nx">lambda</span><span class="p">.</span><span class="nx">Start</span><span class="p">(</span><span class="nx">lambdaHandler</span><span class="p">)</span><span class="w">
</span><span class="p">}</span></code></pre>
<p>I had a small service I wanted to port to Lambda:</p>
<pre class="code go literal-block"><code><span class="c1">// Very very simplified version of the service.</span><span class="w">
</span><span class="kn">package</span><span class="w"> </span><span class="nx">main</span><span class="w">

</span><span class="kn">import</span><span class="w"> </span><span class="p">(</span><span class="w">
    </span><span class="s">&quot;fmt&quot;</span><span class="w">
    </span><span class="s">&quot;net/http&quot;</span><span class="w">
    </span><span class="s">&quot;strconv&quot;</span><span class="w">
</span><span class="p">)</span><span class="w">

</span><span class="kd">func</span><span class="w"> </span><span class="nx">indexHandler</span><span class="p">(</span><span class="nx">w</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">ResponseWriter</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">http</span><span class="p">.</span><span class="nx">Request</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nx">w</span><span class="p">.</span><span class="nx">Write</span><span class="p">([]</span><span class="nb">byte</span><span class="p">(</span><span class="s">&quot;index&quot;</span><span class="p">))</span><span class="w">
</span><span class="p">}</span><span class="w">

</span><span class="kd">func</span><span class="w"> </span><span class="nx">addHandler</span><span class="p">(</span><span class="nx">w</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">ResponseWriter</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">http</span><span class="p">.</span><span class="nx">Request</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nx">f</span><span class="p">,</span><span class="w"> </span><span class="nx">_</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">strconv</span><span class="p">.</span><span class="nx">Atoi</span><span class="p">(</span><span class="nx">r</span><span class="p">.</span><span class="nx">FormValue</span><span class="p">(</span><span class="s">&quot;first&quot;</span><span class="p">))</span><span class="w">
    </span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="nx">_</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">strconv</span><span class="p">.</span><span class="nx">Atoi</span><span class="p">(</span><span class="nx">r</span><span class="p">.</span><span class="nx">FormValue</span><span class="p">(</span><span class="s">&quot;second&quot;</span><span class="p">))</span><span class="w">
    </span><span class="nx">w</span><span class="p">.</span><span class="nx">Header</span><span class="p">().</span><span class="nx">Set</span><span class="p">(</span><span class="s">&quot;X-Hi&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;foo&quot;</span><span class="p">)</span><span class="w">
    </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Fprintf</span><span class="p">(</span><span class="nx">w</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;%d&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">f</span><span class="o">+</span><span class="nx">s</span><span class="p">)</span><span class="w">
</span><span class="p">}</span><span class="w">

</span><span class="kd">func</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nx">http</span><span class="p">.</span><span class="nx">HandleFunc</span><span class="p">(</span><span class="s">&quot;/&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">indexHandler</span><span class="p">)</span><span class="w">
    </span><span class="nx">http</span><span class="p">.</span><span class="nx">HandleFunc</span><span class="p">(</span><span class="s">&quot;/add&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">addHandler</span><span class="p">)</span><span class="w">
    </span><span class="nx">http</span><span class="p">.</span><span class="nx">ListenAndServe</span><span class="p">(</span><span class="s">&quot;:8080&quot;</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span><span class="w">
</span><span class="p">}</span></code></pre>
<p>I didn't want to rewrite any of my HTTP handlers. When I looked at the code, the first idea was that it would be nice to make the Lambda runtime support the existing <span class="docutils literal">net/http</span> handlers.</p>
<p>The <span class="docutils literal">net/http</span> server consists of two main parts:</p>
<ol class="arabic simple">
<li><p>The HTTP request multiplexer <a class="reference external" href="https://golang.org/pkg/net/http/#ServeMux" target="_blank">ServeMux</a> (<a class="reference external" href="https://golang.org/pkg/net/http/#Handler" target="_blank">Handler</a> interface) which routes all incoming requests.</p></li>
<li><p>The HTTP <a class="reference external" href="https://golang.org/pkg/net/http/#Server" target="_blank">Server</a> itself that handles the TCP connections.</p></li>
</ol>
<p>You can replace any of these parts, e.g. you can plug in one of the custom HTTP routers (<a class="reference external" href="https://github.com/go-chi/chi" target="_blank">chi</a>, <a class="reference external" href="https://github.com/gorilla/mux" target="_blank">gorilla mux</a>, <a class="reference external" href="https://github.com/julienschmidt/httprouter" target="_blank">httprouter</a>) that provide more features and/or better performance compared to <span class="docutils literal">ServeMux</span>. Similarly, you can create your own <span class="docutils literal">Server</span> implementation.</p>
<p>This flexibility makes many things easier, e.g. unit testing:</p>
<pre class="code go literal-block"><code><span class="kn">package</span><span class="w"> </span><span class="nx">main</span><span class="w">

</span><span class="kn">import</span><span class="w"> </span><span class="p">(</span><span class="w">
    </span><span class="s">&quot;net/http&quot;</span><span class="w">
    </span><span class="s">&quot;net/http/httptest&quot;</span><span class="w">
    </span><span class="s">&quot;testing&quot;</span><span class="w">
</span><span class="p">)</span><span class="w">

</span><span class="kd">func</span><span class="w"> </span><span class="nx">TestIndexHandler</span><span class="p">(</span><span class="nx">t</span><span class="w"> </span><span class="o">*</span><span class="nx">testing</span><span class="p">.</span><span class="nx">T</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nx">http</span><span class="p">.</span><span class="nx">HandleFunc</span><span class="p">(</span><span class="s">&quot;/&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">indexHandler</span><span class="p">)</span><span class="w">
    </span><span class="nx">r</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">httptest</span><span class="p">.</span><span class="nx">NewRequest</span><span class="p">(</span><span class="s">&quot;GET&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;/&quot;</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span><span class="w">
    </span><span class="nx">w</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">httptest</span><span class="p">.</span><span class="nx">NewRecorder</span><span class="p">()</span><span class="w">
    </span><span class="nx">http</span><span class="p">.</span><span class="nx">DefaultServeMux</span><span class="p">.</span><span class="nx">ServeHTTP</span><span class="p">(</span><span class="nx">w</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="p">)</span><span class="w"> </span><span class="c1">// Note: you can directly call indexHandler here without involving ServeMux.</span><span class="w">
    </span><span class="k">if</span><span class="w"> </span><span class="nx">w</span><span class="p">.</span><span class="nx">Code</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">StatusOK</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">w</span><span class="p">.</span><span class="nx">Body</span><span class="p">.</span><span class="nx">String</span><span class="p">()</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="s">&quot;index&quot;</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="nx">t</span><span class="p">.</span><span class="nx">Fail</span><span class="p">()</span><span class="w">
    </span><span class="p">}</span><span class="w">
</span><span class="p">}</span></code></pre>
<p>As you may have noticed, the Go standard library provides everything to construct <span class="docutils literal">http.ResponseWriter</span> and <span class="docutils literal">http.Request</span> objects. In order to make the Lambda SDK support <span class="docutils literal">net/http</span> handlers we need to convert <span class="docutils literal">APIGatewayProxyRequest</span> into <span class="docutils literal">http.Request</span>, create a mock <span class="docutils literal">http.ResponseWriter</span>, call the router's <span class="docutils literal">ServeHTTP</span> and return a new <span class="docutils literal">APIGatewayProxyResponse</span>. A basic implementation:</p>
<pre class="code go literal-block"><code><span class="kd">func</span><span class="w"> </span><span class="nx">lambdaHandler</span><span class="p">(</span><span class="nx">event</span><span class="w"> </span><span class="nx">events</span><span class="p">.</span><span class="nx">APIGatewayProxyRequest</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="nx">events</span><span class="p">.</span><span class="nx">APIGatewayProxyResponse</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nx">r</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">httptest</span><span class="p">.</span><span class="nx">NewRequest</span><span class="p">(</span><span class="nx">event</span><span class="p">.</span><span class="nx">HTTPMethod</span><span class="p">,</span><span class="w"> </span><span class="nx">event</span><span class="p">.</span><span class="nx">Path</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span><span class="w">
    </span><span class="nx">w</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">httptest</span><span class="p">.</span><span class="nx">NewRecorder</span><span class="p">()</span><span class="w">
    </span><span class="nx">http</span><span class="p">.</span><span class="nx">DefaultServeMux</span><span class="p">.</span><span class="nx">ServeHTTP</span><span class="p">(</span><span class="nx">w</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="p">)</span><span class="w">
    </span><span class="nx">respEvent</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">events</span><span class="p">.</span><span class="nx">APIGatewayProxyResponse</span><span class="p">{</span><span class="w">
        </span><span class="nx">Body</span><span class="p">:</span><span class="w"> </span><span class="nx">w</span><span class="p">.</span><span class="nx">Body</span><span class="p">.</span><span class="nx">String</span><span class="p">(),</span><span class="w">
        </span><span class="nx">StatusCode</span><span class="p">:</span><span class="w"> </span><span class="nx">w</span><span class="p">.</span><span class="nx">Code</span><span class="p">,</span><span class="w">
    </span><span class="p">}</span><span class="w">
    </span><span class="k">return</span><span class="w"> </span><span class="nx">respEvent</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="w">
</span><span class="p">}</span></code></pre>
<p>The code above will work for simple handlers, but it needs a few more things to be ready for production: support for headers, query string parameters and binary responses. I created a package, <a class="reference external" href="https://github.com/akrylysov/algnhsa" target="_blank">algnhsa</a>, that can be used as a drop-in replacement for the <span class="docutils literal">net/http</span> server:</p>
<pre class="code go literal-block"><code><span class="kn">package</span><span class="w"> </span><span class="nx">main</span><span class="w">

</span><span class="kn">import</span><span class="w"> </span><span class="p">(</span><span class="w">
    </span><span class="s">&quot;fmt&quot;</span><span class="w">
    </span><span class="s">&quot;net/http&quot;</span><span class="w">
    </span><span class="s">&quot;strconv&quot;</span><span class="w">

    </span><span class="s">&quot;github.com/akrylysov/algnhsa&quot;</span><span class="w">
</span><span class="p">)</span><span class="w">

</span><span class="kd">func</span><span class="w"> </span><span class="nx">indexHandler</span><span class="p">(</span><span class="nx">w</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">ResponseWriter</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">http</span><span class="p">.</span><span class="nx">Request</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nx">w</span><span class="p">.</span><span class="nx">Write</span><span class="p">([]</span><span class="nb">byte</span><span class="p">(</span><span class="s">&quot;index&quot;</span><span class="p">))</span><span class="w">
</span><span class="p">}</span><span class="w">

</span><span class="kd">func</span><span class="w"> </span><span class="nx">addHandler</span><span class="p">(</span><span class="nx">w</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">ResponseWriter</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">http</span><span class="p">.</span><span class="nx">Request</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nx">f</span><span class="p">,</span><span class="w"> </span><span class="nx">_</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">strconv</span><span class="p">.</span><span class="nx">Atoi</span><span class="p">(</span><span class="nx">r</span><span class="p">.</span><span class="nx">FormValue</span><span class="p">(</span><span class="s">&quot;first&quot;</span><span class="p">))</span><span class="w">
    </span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="nx">_</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">strconv</span><span class="p">.</span><span class="nx">Atoi</span><span class="p">(</span><span class="nx">r</span><span class="p">.</span><span class="nx">FormValue</span><span class="p">(</span><span class="s">&quot;second&quot;</span><span class="p">))</span><span class="w">
    </span><span class="nx">w</span><span class="p">.</span><span class="nx">Header</span><span class="p">().</span><span class="nx">Set</span><span class="p">(</span><span class="s">&quot;X-Hi&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;foo&quot;</span><span class="p">)</span><span class="w">
    </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Fprintf</span><span class="p">(</span><span class="nx">w</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;%d&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">f</span><span class="o">+</span><span class="nx">s</span><span class="p">)</span><span class="w">
</span><span class="p">}</span><span class="w">

</span><span class="kd">func</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nx">http</span><span class="p">.</span><span class="nx">HandleFunc</span><span class="p">(</span><span class="s">&quot;/&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">indexHandler</span><span class="p">)</span><span class="w">
    </span><span class="nx">http</span><span class="p">.</span><span class="nx">HandleFunc</span><span class="p">(</span><span class="s">&quot;/add&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">addHandler</span><span class="p">)</span><span class="w">
    </span><span class="nx">algnhsa</span><span class="p">.</span><span class="nx">ListenAndServe</span><span class="p">(</span><span class="nx">http</span><span class="p">.</span><span class="nx">DefaultServeMux</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span><span class="w">
</span><span class="p">}</span></code></pre>
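<p>To give a rough idea of what supporting headers and query string parameters involves, here is a minimal sketch that uses only the standard library. The <span class="docutils literal">proxyRequest</span> type and the <span class="docutils literal">newHTTPRequest</span> helper are hypothetical stand-ins that mirror a few fields of <span class="docutils literal">events.APIGatewayProxyRequest</span>; this is not how algnhsa is implemented.</p>

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/url"
	"strings"
)

// proxyRequest is a hypothetical stand-in for the few fields of
// events.APIGatewayProxyRequest used here; it is not the real type.
type proxyRequest struct {
	HTTPMethod            string
	Path                  string
	Headers               map[string]string
	QueryStringParameters map[string]string
	Body                  string
}

// newHTTPRequest converts the event into an *http.Request, carrying over
// the query string, the headers and the body.
func newHTTPRequest(e proxyRequest) *http.Request {
	query := url.Values{}
	for k, v := range e.QueryStringParameters {
		query.Set(k, v)
	}
	target := e.Path
	if len(query) > 0 {
		target += "?" + query.Encode()
	}
	r := httptest.NewRequest(e.HTTPMethod, target, strings.NewReader(e.Body))
	for k, v := range e.Headers {
		r.Header.Set(k, v)
	}
	return r
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/add", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "%s+%s via %s", r.FormValue("first"), r.FormValue("second"), r.Header.Get("User-Agent"))
	})

	w := httptest.NewRecorder()
	mux.ServeHTTP(w, newHTTPRequest(proxyRequest{
		HTTPMethod:            "GET",
		Path:                  "/add",
		Headers:               map[string]string{"User-Agent": "curl"},
		QueryStringParameters: map[string]string{"first": "1", "second": "2"},
	}))
	fmt.Println(w.Code, w.Body.String()) // prints: 200 1+2 via curl
}
```

<p>Response headers would need the same treatment in the opposite direction: copying them from the recorder into the response event.</p>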
<p>On the API Gateway side, define a proxy <span class="docutils literal">ANY</span> method to handle requests to <span class="docutils literal">/</span> and a catch-all <span class="docutils literal">{proxy+}</span> resource to handle requests to every other path:</p>
<img alt="" src="https://artem.krylysov.com/images/2018-golambda/apigateway-catchall.png" />
<p>To make the API Gateway treat certain content types as binary, you need to add the desired types to your API's &quot;Binary Media Types&quot; (Settings section) and also pass them to the <span class="docutils literal">algnhsa.ListenAndServe</span> function:</p>
<pre class="code go literal-block"><code><span class="nx">algnhsa</span><span class="p">.</span><span class="nx">ListenAndServe</span><span class="p">(</span><span class="nx">http</span><span class="p">.</span><span class="nx">DefaultServeMux</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">&quot;image/jpeg&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;image/png&quot;</span><span class="p">})</span></code></pre>
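<p>The adapter needs the list because API Gateway expects binary bodies to be base64-encoded and flagged with <span class="docutils literal">isBase64Encoded</span> in the proxy response. A minimal sketch of that decision follows; the <span class="docutils literal">encodeBody</span> helper is hypothetical, matches content types exactly for simplicity, and is not taken from algnhsa.</p>

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// encodeBody returns the response body as a string plus a flag telling
// whether it was base64-encoded. binaryTypes is the same list of content
// types that was added to the API's "Binary Media Types" setting.
// Assumption: content types are compared exactly, without parameters.
func encodeBody(contentType string, body []byte, binaryTypes []string) (string, bool) {
	for _, t := range binaryTypes {
		if t == contentType {
			return base64.StdEncoding.EncodeToString(body), true
		}
	}
	return string(body), false
}

func main() {
	binaryTypes := []string{"image/jpeg", "image/png"}

	// A binary content type: the body is base64-encoded.
	body, isBase64 := encodeBody("image/png", []byte{0x89, 'P', 'N', 'G'}, binaryTypes)
	fmt.Println(body, isBase64)

	// Any other content type: the body is passed through as text.
	body, isBase64 = encodeBody("text/plain", []byte("index"), binaryTypes)
	fmt.Println(body, isBase64)
}
```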
<p>You can find the algnhsa package (Lambda Go net/http server adapter) on GitHub - <a class="reference external" href="https://github.com/akrylysov/algnhsa" target="_blank">https://github.com/akrylysov/algnhsa</a>.</p>
]]>
        </content>
    </entry>
    
    <entry>
        <title><![CDATA[Handling C++ exceptions in Go]]></title>
        <link href="https://artem.krylysov.com/blog/2017/04/13/handling-cpp-exceptions-in-go/"/>
        <updated>2017-04-13T00:00:00Z</updated>
        <id>https://artem.krylysov.com/blog/2017/04/13/handling-cpp-exceptions-in-go/</id>
        <content type="html">
            <![CDATA[<p>Cgo is a mechanism that allows Go packages to call C code. The Go compiler enables cgo for every <span class="docutils literal">.go</span> source file that imports the special pseudo-package <span class="docutils literal">&quot;C&quot;</span>. The text in the comment immediately before the <span class="docutils literal">import &quot;C&quot;</span> line is treated as C code. You can include headers and define functions, types and variables - everything normal C code can do:</p>
<pre class="code go literal-block"><code><span class="kn">package</span><span class="w"> </span><span class="nx">main</span><span class="w">

</span><span class="cm">/*
#include &lt;stdio.h&gt;

void foo(int x) {
    printf(&quot;x: %d\n&quot;, x);
}
 */</span><span class="w">
</span><span class="kn">import</span><span class="w"> </span><span class="s">&quot;C&quot;</span><span class="w">

</span><span class="kd">func</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nx">C</span><span class="p">.</span><span class="nx">foo</span><span class="p">(</span><span class="nx">C</span><span class="p">.</span><span class="nb">int</span><span class="p">(</span><span class="mi">123</span><span class="p">))</span><span class="w"> </span><span class="c1">// x: 123</span><span class="w">
</span><span class="p">}</span></code></pre>
<p>The identifiers declared in the embedded C code can be accessed using the <span class="docutils literal">&quot;C&quot;</span> package - e.g. write <span class="docutils literal">C.foo(C.int(123))</span> to call the function <span class="docutils literal">foo</span>. Pretty straightforward, right? This opens the door to a huge amount of code that was written in C, or written in another language that provides C bindings.</p>
<p>Almost all libraries play by the rules, but some libraries written in C++ may throw exceptions. C doesn't support exceptions, so there is no way to catch them in C or Go/cgo. A C function <em>should</em> never throw an exception, but nothing stops a developer from writing code that does. A good example of such a library is the machine learning project <a class="reference external" href="https://github.com/JohnLangford/vowpal_wabbit" target="_blank">Vowpal Wabbit</a> - it's written in C++ and provides a C interface, which may throw an exception from time to time:</p>
<pre class="code go literal-block"><code><span class="kn">package</span><span class="w"> </span><span class="nx">main</span><span class="w">

</span><span class="c1">// #cgo LDFLAGS: -lvw_c_wrapper</span><span class="w">
</span><span class="c1">// #include &lt;stdlib.h&gt;</span><span class="w">
</span><span class="c1">// #include &lt;vowpalwabbit/vwdll.h&gt;</span><span class="w">
</span><span class="kn">import</span><span class="w"> </span><span class="s">&quot;C&quot;</span><span class="w">
</span><span class="kn">import</span><span class="w"> </span><span class="s">&quot;unsafe&quot;</span><span class="w">

</span><span class="kd">func</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nx">cArgs</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">C</span><span class="p">.</span><span class="nx">CString</span><span class="p">(</span><span class="s">&quot;invalid args&quot;</span><span class="p">)</span><span class="w">
    </span><span class="k">defer</span><span class="w"> </span><span class="nx">C</span><span class="p">.</span><span class="nx">free</span><span class="p">(</span><span class="nx">unsafe</span><span class="p">.</span><span class="nx">Pointer</span><span class="p">(</span><span class="nx">cArgs</span><span class="p">))</span><span class="w">
    </span><span class="nx">C</span><span class="p">.</span><span class="nx">VW_InitializeA</span><span class="p">(</span><span class="nx">cArgs</span><span class="p">)</span><span class="w"> </span><span class="c1">// exception</span><span class="w">
</span><span class="p">}</span></code></pre>
<p>An unhandled C++ exception thrown by the <span class="docutils literal">VW_InitializeA</span> call simply crashes the program:</p>
<pre class="literal-block">SIGABRT: abort
PC=0x7efc793a7428 m=0 sigcode=18446744073709551610
signal arrived during cgo execution

goroutine 1 [syscall, locked to thread]:
runtime.cgocall(0x4500b0, 0xc420049f58, 0xc420049f58)
    /usr/lib/go-1.8/src/runtime/cgocall.go:131 +0xe2 fp=0xc420049f28 sp=0xc420049ee8
main._Cfunc_VW_InitializeA(0x193c620, 0x0)</pre>
<p>While cgo lets us call only C code, we can still link any C++ file into our program. This gives us the ability to create a C wrapper for <span class="docutils literal">VW_InitializeA</span> that handles the exceptions. We can use a structure to return both the original return value and a pointer to the exception message (good old C doesn't support tuples or multiple return values):</p>
<pre class="code cpp literal-block"><code><span class="c1">// vw_wrapper.h
</span><span class="cp">#ifdef __cplusplus
</span><span class="k">extern</span><span class="w"> </span><span class="s">&quot;C&quot;</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="cp">#endif
</span><span class="w">
</span><span class="cp">#include</span><span class="w"> </span><span class="cpf">&lt;stdlib.h&gt;</span><span class="cp">
#include</span><span class="w"> </span><span class="cpf">&lt;vowpalwabbit/vwdll.h&gt;</span><span class="cp">
</span><span class="w">
</span><span class="k">typedef</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="nc">VWW_HANDLE_ERR</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="n">VW_HANDLE</span><span class="w"> </span><span class="n">handle</span><span class="p">;</span><span class="w">
    </span><span class="k">const</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">pstrErr</span><span class="p">;</span><span class="w">
</span><span class="p">}</span><span class="w"> </span><span class="n">VWW_HANDLE_ERR</span><span class="p">;</span><span class="w">

</span><span class="n">VW_DLL_MEMBER</span><span class="w"> </span><span class="n">VWW_HANDLE_ERR</span><span class="w"> </span><span class="n">VW_CALLING_CONV</span><span class="w"> </span><span class="n">VWW_InitializeA</span><span class="p">(</span><span class="k">const</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">pstrArgs</span><span class="p">);</span><span class="w">

</span><span class="cp">#ifdef __cplusplus
</span><span class="p">}</span><span class="w">
</span><span class="cp">#endif</span></code></pre>
<p>The wrapper code is only a few lines. Save it in the project directory as a <span class="docutils literal">.cpp</span> file:</p>
<pre class="code cpp literal-block"><code><span class="cp">#include</span><span class="w"> </span><span class="cpf">&lt;string.h&gt;</span><span class="cp">
#include</span><span class="w"> </span><span class="cpf">&lt;exception&gt;</span><span class="cp">
#include</span><span class="w"> </span><span class="cpf">&quot;vw_wrapper.h&quot;</span><span class="cp">
</span><span class="w">
</span><span class="n">VW_DLL_MEMBER</span><span class="w"> </span><span class="n">VWW_HANDLE_ERR</span><span class="w"> </span><span class="n">VW_CALLING_CONV</span><span class="w"> </span><span class="n">VWW_InitializeA</span><span class="p">(</span><span class="k">const</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">pstrArgs</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="n">VWW_HANDLE_ERR</span><span class="w"> </span><span class="n">result</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="mi">0</span><span class="p">};</span><span class="w">
    </span><span class="k">try</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="n">result</span><span class="p">.</span><span class="n">handle</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">VW_InitializeA</span><span class="p">(</span><span class="n">pstrArgs</span><span class="p">);</span><span class="w">
    </span><span class="p">}</span><span class="w"> </span><span class="k">catch</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">exception</span><span class="w"> </span><span class="o">&amp;</span><span class="n">e</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="n">result</span><span class="p">.</span><span class="n">pstrErr</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">strdup</span><span class="p">(</span><span class="n">e</span><span class="p">.</span><span class="n">what</span><span class="p">());</span><span class="w">
    </span><span class="p">}</span><span class="w">
    </span><span class="k">return</span><span class="w"> </span><span class="n">result</span><span class="p">;</span><span class="w">
</span><span class="p">}</span></code></pre>
<p><span class="docutils literal">go build</span> will compile the C++ source and link it with the program. Don't forget to explicitly free the <span class="docutils literal">pstrErr</span> pointer if its value is not <span class="docutils literal">nil</span> - the Go garbage collector doesn't manage memory allocated by C code:</p>
<pre class="code go literal-block"><code><span class="kn">package</span><span class="w"> </span><span class="nx">main</span><span class="w">

</span><span class="c1">// #cgo LDFLAGS: -lvw_c_wrapper</span><span class="w">
</span><span class="c1">// #include &lt;stdlib.h&gt;</span><span class="w">
</span><span class="c1">// #include &quot;vw_wrapper.h&quot;</span><span class="w">
</span><span class="kn">import</span><span class="w"> </span><span class="s">&quot;C&quot;</span><span class="w">

</span><span class="kn">import</span><span class="w"> </span><span class="p">(</span><span class="w">
    </span><span class="s">&quot;log&quot;</span><span class="w">
    </span><span class="s">&quot;unsafe&quot;</span><span class="w">
</span><span class="p">)</span><span class="w">

</span><span class="kd">func</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nx">cArgs</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">C</span><span class="p">.</span><span class="nx">CString</span><span class="p">(</span><span class="s">&quot;invalid args&quot;</span><span class="p">)</span><span class="w">
    </span><span class="k">defer</span><span class="w"> </span><span class="nx">C</span><span class="p">.</span><span class="nx">free</span><span class="p">(</span><span class="nx">unsafe</span><span class="p">.</span><span class="nx">Pointer</span><span class="p">(</span><span class="nx">cArgs</span><span class="p">))</span><span class="w">
    </span><span class="nx">ret</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">C</span><span class="p">.</span><span class="nx">VWW_InitializeA</span><span class="p">(</span><span class="nx">cArgs</span><span class="p">)</span><span class="w">
    </span><span class="k">if</span><span class="w"> </span><span class="nx">ret</span><span class="p">.</span><span class="nx">pstrErr</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="k">defer</span><span class="w"> </span><span class="nx">C</span><span class="p">.</span><span class="nx">free</span><span class="p">(</span><span class="nx">unsafe</span><span class="p">.</span><span class="nx">Pointer</span><span class="p">(</span><span class="nx">ret</span><span class="p">.</span><span class="nx">pstrErr</span><span class="p">))</span><span class="w">
        </span><span class="nx">log</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="s">&quot;VW error: &quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">C</span><span class="p">.</span><span class="nx">GoString</span><span class="p">(</span><span class="nx">ret</span><span class="p">.</span><span class="nx">pstrErr</span><span class="p">))</span><span class="w">
    </span><span class="p">}</span><span class="w">
</span><span class="p">}</span></code></pre>
<p>Now, whenever the original <span class="docutils literal">VW_InitializeA</span> throws an exception, the C++ wrapper will catch it and return the error message as a part of the <span class="docutils literal">VWW_HANDLE_ERR</span> structure.</p>
]]>
        </content>
    </entry>
    
    <entry>
        <title><![CDATA[Profiling and optimizing Go web applications]]></title>
        <link href="https://artem.krylysov.com/blog/2017/03/13/profiling-and-optimizing-go-web-applications/"/>
        <updated>2017-03-13T00:00:00Z</updated>
        <id>https://artem.krylysov.com/blog/2017/03/13/profiling-and-optimizing-go-web-applications/</id>
        <content type="html">
            <![CDATA[<aside class="admonition note">
<p class="admonition-title">Note</p>
<p>This post was updated on 2021-04-25.</p>
</aside>
<p>Go has a powerful built-in profiler that supports CPU, memory, goroutine and block (contention) profiling.</p>
<section id="enabling-the-profiler">
<h3>Enabling the profiler<a class="headerlink" href="#enabling-the-profiler" title="Permalink to this headline"> #</a></h3>
<p>Go provides a low-level profiling API <a class="reference external" href="https://golang.org/pkg/runtime/pprof/" target="_blank">runtime/pprof</a>, but if you are developing a long-running service, it's more convenient to work with a high-level <a class="reference external" href="https://golang.org/pkg/net/http/pprof/" target="_blank">net/http/pprof</a> package.</p>
<p>To enable the profiler, all you need to do is import <span class="docutils literal">net/http/pprof</span>, which automatically registers the required HTTP handlers:</p>
<pre class="code go literal-block"><code><span class="kn">package</span><span class="w"> </span><span class="nx">main</span><span class="w">

</span><span class="kn">import</span><span class="w"> </span><span class="p">(</span><span class="w">
    </span><span class="s">&quot;net/http&quot;</span><span class="w">
    </span><span class="nx">_</span><span class="w"> </span><span class="s">&quot;net/http/pprof&quot;</span><span class="w">
</span><span class="p">)</span><span class="w">

</span><span class="kd">func</span><span class="w"> </span><span class="nx">hiHandler</span><span class="p">(</span><span class="nx">w</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">ResponseWriter</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">http</span><span class="p">.</span><span class="nx">Request</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nx">w</span><span class="p">.</span><span class="nx">Write</span><span class="p">([]</span><span class="nb">byte</span><span class="p">(</span><span class="s">&quot;hi&quot;</span><span class="p">))</span><span class="w">
</span><span class="p">}</span><span class="w">

</span><span class="kd">func</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nx">http</span><span class="p">.</span><span class="nx">HandleFunc</span><span class="p">(</span><span class="s">&quot;/&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">hiHandler</span><span class="p">)</span><span class="w">
    </span><span class="nx">http</span><span class="p">.</span><span class="nx">ListenAndServe</span><span class="p">(</span><span class="s">&quot;:8080&quot;</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span><span class="w">
</span><span class="p">}</span></code></pre>
<p>If your web application is using a custom URL router, you'll need to register a few pprof HTTP endpoints manually:</p>
<pre class="code go literal-block"><code><span class="kn">package</span><span class="w"> </span><span class="nx">main</span><span class="w">

</span><span class="kn">import</span><span class="w"> </span><span class="p">(</span><span class="w">
    </span><span class="s">&quot;net/http&quot;</span><span class="w">
    </span><span class="s">&quot;net/http/pprof&quot;</span><span class="w">
</span><span class="p">)</span><span class="w">

</span><span class="kd">func</span><span class="w"> </span><span class="nx">hiHandler</span><span class="p">(</span><span class="nx">w</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">ResponseWriter</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">http</span><span class="p">.</span><span class="nx">Request</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nx">w</span><span class="p">.</span><span class="nx">Write</span><span class="p">([]</span><span class="nb">byte</span><span class="p">(</span><span class="s">&quot;hi&quot;</span><span class="p">))</span><span class="w">
</span><span class="p">}</span><span class="w">

</span><span class="kd">func</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nx">r</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">NewServeMux</span><span class="p">()</span><span class="w">
    </span><span class="nx">r</span><span class="p">.</span><span class="nx">HandleFunc</span><span class="p">(</span><span class="s">&quot;/&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">hiHandler</span><span class="p">)</span><span class="w">

    </span><span class="c1">// Register pprof handlers</span><span class="w">
    </span><span class="nx">r</span><span class="p">.</span><span class="nx">HandleFunc</span><span class="p">(</span><span class="s">&quot;/debug/pprof/&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">pprof</span><span class="p">.</span><span class="nx">Index</span><span class="p">)</span><span class="w">
    </span><span class="nx">r</span><span class="p">.</span><span class="nx">HandleFunc</span><span class="p">(</span><span class="s">&quot;/debug/pprof/cmdline&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">pprof</span><span class="p">.</span><span class="nx">Cmdline</span><span class="p">)</span><span class="w">
    </span><span class="nx">r</span><span class="p">.</span><span class="nx">HandleFunc</span><span class="p">(</span><span class="s">&quot;/debug/pprof/profile&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">pprof</span><span class="p">.</span><span class="nx">Profile</span><span class="p">)</span><span class="w">
    </span><span class="nx">r</span><span class="p">.</span><span class="nx">HandleFunc</span><span class="p">(</span><span class="s">&quot;/debug/pprof/symbol&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">pprof</span><span class="p">.</span><span class="nx">Symbol</span><span class="p">)</span><span class="w">
    </span><span class="nx">r</span><span class="p">.</span><span class="nx">HandleFunc</span><span class="p">(</span><span class="s">&quot;/debug/pprof/trace&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">pprof</span><span class="p">.</span><span class="nx">Trace</span><span class="p">)</span><span class="w">

    </span><span class="nx">http</span><span class="p">.</span><span class="nx">ListenAndServe</span><span class="p">(</span><span class="s">&quot;:8080&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="p">)</span><span class="w">
</span><span class="p">}</span></code></pre>
<p>That's it, launch the application, and then use the pprof tool:</p>
<pre class="code bash literal-block"><code>go<span class="w"> </span>tool<span class="w"> </span>pprof<span class="w"> </span><span class="o">[</span>binary<span class="o">]</span><span class="w"> </span>http://127.0.0.1:8080/debug/pprof/profile</code></pre>
<p>One of the biggest advantages of pprof is its low overhead: it can be used in a production environment on live traffic without any noticeable performance penalty.</p>
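<p>When the pprof routes are registered by hand on a custom router, it can be worth checking that they respond before relying on them. The sketch below does that in memory with <span class="docutils literal">net/http/httptest</span>; <span class="docutils literal">newMux</span> and <span class="docutils literal">statusOf</span> are hypothetical helpers, and the routes are the same ones registered in the custom router example.</p>

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/http/pprof"
)

// newMux builds a router with the pprof routes registered manually.
func newMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/debug/pprof/", pprof.Index)
	mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
	mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
	mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
	mux.HandleFunc("/debug/pprof/trace", pprof.Trace)
	return mux
}

// statusOf sends an in-memory GET request to the mux and returns the
// response status code.
func statusOf(mux *http.ServeMux, path string) int {
	w := httptest.NewRecorder()
	mux.ServeHTTP(w, httptest.NewRequest("GET", path, nil))
	return w.Code
}

func main() {
	mux := newMux()
	for _, path := range []string{
		"/debug/pprof/",
		"/debug/pprof/cmdline",
		// Named profiles such as goroutine are served by pprof.Index.
		"/debug/pprof/goroutine?debug=1",
	} {
		fmt.Println(path, statusOf(mux, path))
	}
}
```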
<p>But before digging deeper into pprof, we need a real example which can show how to identify and fix performance issues in Go.</p>
</section>
<section id="example-left-pad-microservice">
<h3>Example: Left-pad microservice<a class="headerlink" href="#example-left-pad-microservice" title="Permalink to this headline"> #</a></h3>
<p>Assume you need to develop a brand-new microservice that adds a left padding to a given input string:</p>
<pre class="code bash literal-block"><code>$<span class="w"> </span>curl<span class="w"> </span><span class="s2">&quot;http://127.0.0.1:8080/v1/leftpad/?str=test&amp;len=10&amp;chr=*&quot;</span><span class="w">
</span><span class="o">{</span><span class="s2">&quot;str&quot;</span>:<span class="s2">&quot;******test&quot;</span><span class="o">}</span></code></pre>
<p>The service needs to collect some basic metrics - the number of incoming requests and how long every request takes. All collected metrics are supposed to be sent to a metric aggregator (e.g. <a class="reference external" href="https://github.com/etsy/statsd" target="_blank">StatsD</a>). In addition, the service needs to log the request details - URL, IP address and user-agent.</p>
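<p>To make the padding requirement concrete, here is a minimal sketch of the core logic and a handler around it. It is an illustration only, not the implementation from the repository; <span class="docutils literal">leftPad</span>, <span class="docutils literal">leftPadHandler</span> and the default padding character are assumptions, and the metrics and logging parts are left out.</p>

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strconv"
	"strings"
)

// leftPad prepends pad to s until the result is length runes long.
// Strings that are already long enough are returned unchanged.
func leftPad(s string, length int, pad rune) string {
	missing := length - len([]rune(s))
	if missing <= 0 {
		return s
	}
	return strings.Repeat(string(pad), missing) + s
}

// leftPadHandler reads str, len and chr from the query string and replies
// with a JSON object that has a single "str" field.
func leftPadHandler(w http.ResponseWriter, r *http.Request) {
	length, err := strconv.Atoi(r.FormValue("len"))
	if err != nil {
		http.Error(w, "len must be an integer", http.StatusBadRequest)
		return
	}
	pad := ' ' // assumption: default to a space when chr is missing
	if chr := []rune(r.FormValue("chr")); len(chr) > 0 {
		pad = chr[0]
	}
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(map[string]string{
		"str": leftPad(r.FormValue("str"), length, pad),
	})
}

func main() {
	// Exercise the handler in memory with the request from the curl example.
	w := httptest.NewRecorder()
	r := httptest.NewRequest("GET", "/v1/leftpad/?str=test&len=10&chr=*", nil)
	leftPadHandler(w, r)
	fmt.Print(w.Body.String()) // {"str":"******test"}
}
```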
<p>You can find the initial implementation on GitHub tagged as <a class="reference external" href="https://github.com/akrylysov/goprofex/tree/v1" target="_blank">v1</a>.</p>
<p>Compile and run the application:</p>
<pre class="code bash literal-block"><code>go<span class="w"> </span>build<span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span>./goprofex</code></pre>
</section>
<section id="measuring-the-performance">
<h3>Measuring the performance<a class="headerlink" href="#measuring-the-performance" title="Permalink to this headline"> #</a></h3>
<p>We are going to measure how many requests per second the microservice is able to handle. This can be done using the <a class="reference external" href="https://httpd.apache.org/docs/2.4/programs/ab.html" target="_blank">Apache Benchmark tool</a>:</p>
<pre class="code bash literal-block"><code>ab<span class="w"> </span>-k<span class="w"> </span>-c<span class="w"> </span><span class="m">8</span><span class="w"> </span>-n<span class="w"> </span><span class="m">100000</span><span class="w"> </span><span class="s2">&quot;http://127.0.0.1:8080/v1/leftpad/?str=test&amp;len=50&amp;chr=*&quot;</span><span class="w">
</span><span class="c1"># -k   Enables HTTP keep-alive
# -c   Number of concurrent requests
# -n   Number of total requests to make</span></code></pre>
<p>Not bad, but could be faster:</p>
<pre class="code literal-block"><code>Requests per second:    22810.15 [#/sec] (mean)
Time per request:       0.042 [ms] (mean, across all concurrent requests)</code></pre>
<aside class="admonition note">
<p class="admonition-title">Note</p>
<p>The benchmark was run on a MacBook Pro Late 2013 (2.6 GHz Intel Core i5, 8 GB 1600 MHz DDR3, macOS 10.12.3) using Go 1.8.</p>
</aside>
</section>
<section id="cpu-profile">
<h3>CPU profile<a class="headerlink" href="#cpu-profile" title="Permalink to this headline"> #</a></h3>
<p>Run the Apache benchmark tool again, but with a higher number of requests (1 million should be enough), and at the same time run pprof:</p>
<pre class="code bash literal-block"><code>go<span class="w"> </span>tool<span class="w"> </span>pprof<span class="w"> </span>http://127.0.0.1:8080/debug/pprof/profile</code></pre>
<p>The CPU profiler runs for 30 seconds by default. It uses sampling to determine which functions spend most of the CPU time. The Go runtime stops the execution every 10 milliseconds and records the current call stack of all running goroutines.</p>
<p>When pprof enters the interactive mode, type <span class="docutils literal">top</span>. The command shows a list of the functions that appeared most often in the collected samples. In our case these are all runtime and standard library functions, which is not very useful:</p>
<pre class="code literal-block"><code>(pprof) top
63.77s of 69.02s total (92.39%)
Dropped 331 nodes (cum &lt;= 0.35s)
Showing top 10 nodes out of 78 (cum &gt;= 0.64s)
      flat  flat%   sum%        cum   cum%
    50.79s 73.59% 73.59%     50.92s 73.78%  syscall.Syscall
     4.66s  6.75% 80.34%      4.66s  6.75%  runtime.kevent
     2.65s  3.84% 84.18%      2.65s  3.84%  runtime.usleep
     1.88s  2.72% 86.90%      1.88s  2.72%  runtime.freedefer
     1.31s  1.90% 88.80%      1.31s  1.90%  runtime.mach_semaphore_signal
     1.10s  1.59% 90.39%      1.10s  1.59%  runtime.mach_semaphore_wait
     0.51s  0.74% 91.13%      0.61s  0.88%  log.(*Logger).formatHeader
     0.49s  0.71% 91.84%      1.06s  1.54%  runtime.mallocgc
     0.21s   0.3% 92.15%      0.56s  0.81%  runtime.concatstrings
     0.17s  0.25% 92.39%      0.64s  0.93%  fmt.(*pp).doPrintf</code></pre>
<p>There is a much better way to look at the high-level performance overview, the <span class="docutils literal"><span class="pre">-http</span></span> flag:</p>
<pre class="code bash literal-block"><code>go<span class="w"> </span>tool<span class="w"> </span>pprof<span class="w"> </span>-http<span class="o">=</span>:8081<span class="w"> </span>http://127.0.0.1:8080/debug/pprof/profile</code></pre>
<p>When you run <span class="docutils literal">pprof</span> with the <span class="docutils literal"><span class="pre">-http</span></span> flag, the tool opens the profile in the web browser. The graph lets you see all the hot spots:</p>
<img alt="" src="https://artem.krylysov.com/images/2017-goprofex/web-cpu.png" style="width: 510px;" />
<p>From the graph above you can see that the application spends a big chunk of CPU time on logging and metric reporting, and some time on garbage collection.</p>
<p>Use <span class="docutils literal">list</span> to inspect a function in detail, e.g. <span class="docutils literal">list leftpad</span>:</p>
<pre class="code literal-block"><code>(pprof) list leftpad
ROUTINE ======================== main.leftpad in /Users/artem/go/src/github.com/akrylysov/goprofex/leftpad.go
      20ms      490ms (flat, cum)  0.71% of Total
         .          .      3:func leftpad(s string, length int, char rune) string {
         .          .      4:   for len(s) &lt; length {
      20ms      490ms      5:       s = string(char) + s
         .          .      6:   }
         .          .      7:   return s
         .          .      8:}</code></pre>
<p>For those who are not afraid to look at the disassembled code, pprof includes the <span class="docutils literal">disasm</span> command, which shows the actual processor instructions:</p>
<pre class="code literal-block"><code>(pprof) disasm leftpad
ROUTINE ======================== main.leftpad
      20ms      490ms (flat, cum)  0.71% of Total
         .          .    1312ab0: GS MOVQ GS:0x8a0, CX
         .          .    1312ab9: CMPQ 0x10(CX), SP
         .          .    1312abd: JBE 0x1312b5e
         .          .    1312ac3: SUBQ $0x48, SP
         .          .    1312ac7: MOVQ BP, 0x40(SP)
         .          .    1312acc: LEAQ 0x40(SP), BP
         .          .    1312ad1: MOVQ 0x50(SP), AX
         .          .    1312ad6: MOVQ 0x58(SP), CX
...</code></pre>
</section>
<section id="heap-profile">
<h3>Heap profile<a class="headerlink" href="#heap-profile" title="Permalink to this headline"> #</a></h3>
<p>Run the heap profiler:</p>
<pre class="code bash literal-block"><code>go<span class="w"> </span>tool<span class="w"> </span>pprof<span class="w"> </span>http://127.0.0.1:8080/debug/pprof/heap</code></pre>
<p>By default it shows the amount of memory currently in use:</p>
<pre class="code literal-block"><code>(pprof) top
512.17kB of 512.17kB total (  100%)
Dropped 85 nodes (cum &lt;= 2.56kB)
Showing top 10 nodes out of 13 (cum &gt;= 512.17kB)
      flat  flat%   sum%        cum   cum%
  512.17kB   100%   100%   512.17kB   100%  runtime.mapassign
         0     0%   100%   512.17kB   100%  main.leftpadHandler
         0     0%   100%   512.17kB   100%  main.timedHandler.func1
         0     0%   100%   512.17kB   100%  net/http.(*Request).FormValue
         0     0%   100%   512.17kB   100%  net/http.(*Request).ParseForm
         0     0%   100%   512.17kB   100%  net/http.(*Request).ParseMultipartForm
         0     0%   100%   512.17kB   100%  net/http.(*ServeMux).ServeHTTP
         0     0%   100%   512.17kB   100%  net/http.(*conn).serve
         0     0%   100%   512.17kB   100%  net/http.HandlerFunc.ServeHTTP
         0     0%   100%   512.17kB   100%  net/http.serverHandler.ServeHTTP</code></pre>
<p>But we are more interested in the number of allocated objects. Call pprof with the <span class="docutils literal"><span class="pre">-alloc_objects</span></span> option:</p>
<pre class="code bash literal-block"><code>go<span class="w"> </span>tool<span class="w"> </span>pprof<span class="w"> </span>-alloc_objects<span class="w"> </span>http://127.0.0.1:8080/debug/pprof/heap</code></pre>
<p>Almost 70% of all objects were allocated by just two functions, <span class="docutils literal">leftpad</span> and <span class="docutils literal">StatsD.Send</span>. We'll need to look at them more closely:</p>
<pre class="code literal-block"><code>(pprof) top
559346486 of 633887751 total (88.24%)
Dropped 32 nodes (cum &lt;= 3169438)
Showing top 10 nodes out of 46 (cum &gt;= 14866706)
      flat  flat%   sum%        cum   cum%
 218124937 34.41% 34.41%  218124937 34.41%  main.leftpad
 116692715 18.41% 52.82%  218702222 34.50%  main.(*StatsD).Send
  52326692  8.25% 61.07%   57278218  9.04%  fmt.Sprintf
  39437390  6.22% 67.30%   39437390  6.22%  strconv.FormatFloat
  30689052  4.84% 72.14%   30689052  4.84%  strings.NewReplacer
  29869965  4.71% 76.85%   29968270  4.73%  net/textproto.(*Reader).ReadMIMEHeader
  20441700  3.22% 80.07%   20441700  3.22%  net/url.parseQuery
  19071266  3.01% 83.08%  374683692 59.11%  main.leftpadHandler
  17826063  2.81% 85.90%  558753994 88.15%  main.timedHandler.func1
  14866706  2.35% 88.24%   14866706  2.35%  net/http.Header.clone</code></pre>
<p>Other useful options to debug memory issues are <span class="docutils literal"><span class="pre">-inuse_objects</span></span>, which displays the number of objects in use, and <span class="docutils literal"><span class="pre">-alloc_space</span></span>, which shows how much memory has been allocated since the program started.</p>
<p>Automatic memory management is convenient, but nothing in the world is free. Dynamic allocations are not only significantly slower than stack allocations but also affect the performance indirectly. Every piece of memory you allocate on the heap adds more work for the GC and makes it use more CPU resources. The only way to make the application spend less time on garbage collection is to reduce allocations.</p>
<section id="escape-analysis">
<h4>Escape analysis<a class="headerlink" href="#escape-analysis" title="Permalink to this headline"> #</a></h4>
<p>When you use the <span class="docutils literal">&amp;</span> operator to get a pointer to a variable or allocate a new value using <span class="docutils literal">make</span> or <span class="docutils literal">new</span>, it doesn't necessarily mean that the value is allocated on the heap.</p>
<pre class="code go literal-block"><code><span class="kd">func</span><span class="w"> </span><span class="nx">foo</span><span class="p">(</span><span class="nx">a</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
      </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">a</span><span class="p">))</span><span class="w">
</span><span class="p">}</span><span class="w">

</span><span class="kd">func</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w">
      </span><span class="nx">foo</span><span class="p">(</span><span class="nb">make</span><span class="p">([]</span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="mi">8</span><span class="p">))</span><span class="w">
</span><span class="p">}</span></code></pre>
<p>In the example above <span class="docutils literal"><span class="pre">make([]string,</span> 8)</span> is allocated on the stack. Go uses escape analysis to determine if it's safe to allocate memory on the stack instead of the heap. You can add the <span class="docutils literal"><span class="pre">-gcflags=-m</span></span> option to see the results of escape analysis:</p>
<pre class="code go literal-block"><code><span class="mi">5</span><span class="w">  </span><span class="kd">type</span><span class="w"> </span><span class="nx">X</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span><span class="nx">v</span><span class="w"> </span><span class="kt">int</span><span class="p">}</span><span class="w">
</span><span class="mi">6</span><span class="w">
</span><span class="mi">7</span><span class="w">  </span><span class="kd">func</span><span class="w"> </span><span class="nx">foo</span><span class="p">(</span><span class="nx">x</span><span class="w"> </span><span class="o">*</span><span class="nx">X</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="mi">8</span><span class="w">       </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="nx">x</span><span class="p">.</span><span class="nx">v</span><span class="p">)</span><span class="w">
</span><span class="mi">9</span><span class="w">  </span><span class="p">}</span><span class="w">
</span><span class="mi">10</span><span class="w">
</span><span class="mi">11</span><span class="w"> </span><span class="kd">func</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="mi">12</span><span class="w">      </span><span class="nx">x</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">X</span><span class="p">{</span><span class="mi">1</span><span class="p">}</span><span class="w">
</span><span class="mi">13</span><span class="w">      </span><span class="nx">foo</span><span class="p">(</span><span class="nx">x</span><span class="p">)</span><span class="w">
</span><span class="mi">14</span><span class="w"> </span><span class="p">}</span></code></pre>
<pre class="code bash literal-block"><code>go<span class="w"> </span>build<span class="w"> </span>-gcflags<span class="o">=</span>-m<span class="w">
</span>./main.go:7:<span class="w"> </span>foo<span class="w"> </span>x<span class="w"> </span>does<span class="w"> </span>not<span class="w"> </span>escape<span class="w">
</span>./main.go:12:<span class="w"> </span>main<span class="w"> </span><span class="p">&amp;</span>X<span class="w"> </span>literal<span class="w"> </span>does<span class="w"> </span>not<span class="w"> </span>escape</code></pre>
<p>The Go compiler is smart enough to turn some dynamic allocations into stack allocations. Things get worse, for example, when you start dealing with interfaces:</p>
<pre class="code go literal-block"><code><span class="c1">// Example 1</span><span class="w">
</span><span class="kd">type</span><span class="w"> </span><span class="nx">Fooer</span><span class="w"> </span><span class="kd">interface</span><span class="w"> </span><span class="p">{</span><span class="w">
      </span><span class="nx">foo</span><span class="p">(</span><span class="nx">a</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">)</span><span class="w">
</span><span class="p">}</span><span class="w">

</span><span class="kd">type</span><span class="w"> </span><span class="nx">FooerX</span><span class="w"> </span><span class="kd">struct</span><span class="p">{}</span><span class="w">

</span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">FooerX</span><span class="p">)</span><span class="w"> </span><span class="nx">foo</span><span class="p">(</span><span class="nx">a</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
      </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">a</span><span class="p">))</span><span class="w">
</span><span class="p">}</span><span class="w">

</span><span class="kd">func</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w">
      </span><span class="nx">a</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">make</span><span class="p">([]</span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="mi">8</span><span class="p">)</span><span class="w"> </span><span class="c1">// make([]string, 8) escapes to heap</span><span class="w">
      </span><span class="kd">var</span><span class="w"> </span><span class="nx">fooer</span><span class="w"> </span><span class="nx">Fooer</span><span class="w">
      </span><span class="nx">fooer</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">FooerX</span><span class="p">{}</span><span class="w">
      </span><span class="nx">fooer</span><span class="p">.</span><span class="nx">foo</span><span class="p">(</span><span class="nx">a</span><span class="p">)</span><span class="w">
</span><span class="p">}</span><span class="w">

</span><span class="c1">// Example 2</span><span class="w">
</span><span class="kd">func</span><span class="w"> </span><span class="nx">foo</span><span class="p">(</span><span class="nx">a</span><span class="w"> </span><span class="kd">interface</span><span class="p">{})</span><span class="w"> </span><span class="kt">string</span><span class="w"> </span><span class="p">{</span><span class="w">
      </span><span class="k">return</span><span class="w"> </span><span class="nx">a</span><span class="p">.(</span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Stringer</span><span class="p">).</span><span class="nx">String</span><span class="p">()</span><span class="w">
</span><span class="p">}</span><span class="w">

</span><span class="kd">func</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w">
      </span><span class="nx">foo</span><span class="p">(</span><span class="nb">make</span><span class="p">([]</span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="mi">8</span><span class="p">))</span><span class="w"> </span><span class="c1">// make([]string, 8) escapes to heap</span><span class="w">
</span><span class="p">}</span></code></pre>
<p>The <a class="reference external" href="https://docs.google.com/document/d/1CxgUBPlx9iJzkz9JWkb6tIpTe5q32QDmz8l0BouG0Cw/view" target="_blank">Go Escape Analysis Flaws</a> paper by Dmitry Vyukov describes more cases that escape analysis is unable to handle.</p>
<p>Generally speaking, you should prefer values over pointers for small structures that you don't need to change.</p>
<aside class="admonition note">
<p class="admonition-title">Note</p>
<p>For big structures, it might be cheaper to pass a pointer than to copy the whole structure and pass it by value.</p>
</aside>
</section>
</section>
<section id="goroutine-profile">
<h3>Goroutine profile<a class="headerlink" href="#goroutine-profile" title="Permalink to this headline"> #</a></h3>
<p>The goroutine profile dumps the goroutine call stacks and the number of running goroutines:</p>
<pre class="code bash literal-block"><code>go<span class="w"> </span>tool<span class="w"> </span>pprof<span class="w"> </span>http://127.0.0.1:8080/debug/pprof/goroutine</code></pre>
<img alt="" src="https://artem.krylysov.com/images/2017-goprofex/web-goroutine.png" style="width: 520px;" />
<p>There are only 18 active goroutines, which is very low. It's not uncommon to have thousands of running goroutines without significant performance degradation.</p>
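<p>The goroutine count can also be read from inside the process. The following is a minimal sketch, assuming nothing about the goprofex code: <span class="docutils literal">startWaiters</span> and <span class="docutils literal">goroutineCount</span> are hypothetical helpers, while <span class="docutils literal">pprof.Lookup</span> and <span class="docutils literal">runtime.NumGoroutine</span> come from the standard library.</p>

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/pprof"
)

// startWaiters launches n goroutines that block until done is closed.
func startWaiters(n int, done <-chan struct{}) {
	for i := 0; i < n; i++ {
		go func() { <-done }()
	}
}

// goroutineCount reports how many goroutine stacks the goroutine
// profile currently holds.
func goroutineCount() int {
	return pprof.Lookup("goroutine").Count()
}

func main() {
	done := make(chan struct{})
	defer close(done)

	before := goroutineCount()
	startWaiters(1000, done)
	after := goroutineCount()

	fmt.Println("goroutines before:", before)
	fmt.Println("goroutines after: ", after)
	fmt.Println("runtime.NumGoroutine:", runtime.NumGoroutine())
}
```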
</section>
<section id="block-profile">
<h3>Block profile<a class="headerlink" href="#block-profile" title="Permalink to this headline"> #</a></h3>
<p>The block profile shows function calls that led to blocking on synchronization primitives like mutexes and channels.</p>
<p>Before running the block contention profile, you have to set a profiling rate using <a class="reference external" href="https://golang.org/pkg/runtime/#SetBlockProfileRate" target="_blank">runtime.SetBlockProfileRate</a>. You can add the call to your <span class="docutils literal">main</span> or <span class="docutils literal">init</span> function.</p>
<pre class="code bash literal-block"><code>go<span class="w"> </span>tool<span class="w"> </span>pprof<span class="w"> </span>http://127.0.0.1:8080/debug/pprof/block</code></pre>
<img alt="" src="https://artem.krylysov.com/images/2017-goprofex/web-block.png" style="width: 224px;" />
<p><span class="docutils literal">timedHandler</span> and <span class="docutils literal">leftpadHandler</span> spend a lot of time waiting on a mutex inside <span class="docutils literal">log.Printf</span>. It happens because the <span class="docutils literal">log</span> package implementation uses a mutex to synchronize access to a file shared across multiple goroutines.</p>
</section>
<section id="benchmarking">
<h3>Benchmarking<a class="headerlink" href="#benchmarking" title="Permalink to this headline"> #</a></h3>
<p>As we noticed before, the biggest offenders in terms of performance are the <span class="docutils literal">log</span> package and the <span class="docutils literal">leftpad</span> and <span class="docutils literal">StatsD.Send</span> functions. We have found the bottlenecks, but before starting to optimize the code, we need a reproducible way to measure the performance of the code we are interested in. The Go <a class="reference external" href="https://golang.org/pkg/testing/" target="_blank">testing</a> package includes such a mechanism. You need to create a function in the form of <span class="docutils literal">func <span class="pre">BenchmarkXxx(*testing.B)</span></span> in a test file:</p>
<pre class="code go literal-block"><code><span class="kd">func</span><span class="w"> </span><span class="nx">BenchmarkStatsD</span><span class="p">(</span><span class="nx">b</span><span class="w"> </span><span class="o">*</span><span class="nx">testing</span><span class="p">.</span><span class="nx">B</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nx">statsd</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">StatsD</span><span class="p">{</span><span class="w">
        </span><span class="nx">Namespace</span><span class="p">:</span><span class="w">  </span><span class="s">&quot;namespace&quot;</span><span class="p">,</span><span class="w">
        </span><span class="nx">SampleRate</span><span class="p">:</span><span class="w"> </span><span class="mf">0.5</span><span class="p">,</span><span class="w">
    </span><span class="p">}</span><span class="w">
    </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="nx">b</span><span class="p">.</span><span class="nx">N</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="o">++</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="nx">statsd</span><span class="p">.</span><span class="nx">Incr</span><span class="p">(</span><span class="s">&quot;test&quot;</span><span class="p">)</span><span class="w">
    </span><span class="p">}</span><span class="w">
</span><span class="p">}</span></code></pre>
<p>It's also possible to benchmark the whole HTTP handler using the <a class="reference external" href="https://golang.org/pkg/net/http/httptest/" target="_blank">net/http/httptest</a> package:</p>
<pre class="code go literal-block"><code><span class="kd">func</span><span class="w"> </span><span class="nx">BenchmarkLeftpadHandler</span><span class="p">(</span><span class="nx">b</span><span class="w"> </span><span class="o">*</span><span class="nx">testing</span><span class="p">.</span><span class="nx">B</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nx">r</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">httptest</span><span class="p">.</span><span class="nx">NewRequest</span><span class="p">(</span><span class="s">&quot;GET&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;/v1/leftpad/?str=test&amp;len=50&amp;chr=*&quot;</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span><span class="w">
    </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="nx">b</span><span class="p">.</span><span class="nx">N</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="o">++</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="nx">w</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">httptest</span><span class="p">.</span><span class="nx">NewRecorder</span><span class="p">()</span><span class="w">
        </span><span class="nx">leftpadHandler</span><span class="p">(</span><span class="nx">w</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="p">)</span><span class="w">
    </span><span class="p">}</span><span class="w">
</span><span class="p">}</span></code></pre>
<p>Run the benchmarks:</p>
<pre class="code bash literal-block"><code>go<span class="w"> </span><span class="nb">test</span><span class="w"> </span>-bench<span class="o">=</span>.<span class="w"> </span>-benchmem</code></pre>
<p>It shows the amount of time each iteration takes, along with the amount of memory and the number of allocations per iteration:</p>
<pre class="code literal-block"><code>BenchmarkTimedHandler-4           200000          6511 ns/op        1621 B/op         41 allocs/op
BenchmarkLeftpadHandler-4         200000         10546 ns/op        3297 B/op         75 allocs/op
BenchmarkLeftpad10-4             5000000           339 ns/op          64 B/op          6 allocs/op
BenchmarkLeftpad50-4              500000          3079 ns/op        1568 B/op         46 allocs/op
BenchmarkStatsD-4                1000000          1516 ns/op         560 B/op         15 allocs/op</code></pre>
</section>
<section id="improving-the-performance">
<h3>Improving the performance<a class="headerlink" href="#improving-the-performance" title="Permalink to this headline"> #</a></h3>
<section id="logging">
<h4>Logging<a class="headerlink" href="#logging" title="Permalink to this headline"> #</a></h4>
<p>A good but not always obvious way to make the application faster is to make it do less work. Other than for debugging purposes, the line <span class="docutils literal"><span class="pre">log.Printf(&quot;%s</span> request took %v&quot;, name, elapsed)</span> doesn't need to be in the web service. All unnecessary logging should be removed from the code or disabled before deploying it to production. This problem can be solved using a leveled logger - there are <a class="reference external" href="https://github.com/avelino/awesome-go#logging" target="_blank">plenty</a> of logging libraries.</p>
<p>Another important thing about logging (and about all I/O operations in general) is to use buffered input/output when possible, which can help reduce the number of system calls. Usually, there is no need to write to a file on every logger call - use the <a class="reference external" href="https://golang.org/pkg/bufio/" target="_blank">bufio</a> package to implement buffered I/O. We can simply wrap the <span class="docutils literal">io.Writer</span> object that we pass to a logger with <span class="docutils literal">bufio.NewWriter</span> or <span class="docutils literal">bufio.NewWriterSize</span>:</p>
<pre class="code go literal-block"><code><span class="nx">log</span><span class="p">.</span><span class="nx">SetOutput</span><span class="p">(</span><span class="nx">bufio</span><span class="p">.</span><span class="nx">NewWriterSize</span><span class="p">(</span><span class="nx">f</span><span class="p">,</span><span class="w"> </span><span class="mi">1024</span><span class="o">*</span><span class="mi">16</span><span class="p">))</span></code></pre>
</section>
<section id="leftpad">
<h4>leftpad<a class="headerlink" href="#leftpad" title="Permalink to this headline"> #</a></h4>
<p>Take a look at the <span class="docutils literal">leftpad</span> function again:</p>
<pre class="code go literal-block"><code><span class="kd">func</span><span class="w"> </span><span class="nx">leftpad</span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">length</span><span class="w"> </span><span class="kt">int</span><span class="p">,</span><span class="w"> </span><span class="nx">char</span><span class="w"> </span><span class="kt">rune</span><span class="p">)</span><span class="w"> </span><span class="kt">string</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="k">for</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">s</span><span class="p">)</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="nx">length</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="nx">s</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">char</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">s</span><span class="w">
    </span><span class="p">}</span><span class="w">
    </span><span class="k">return</span><span class="w"> </span><span class="nx">s</span><span class="w">
</span><span class="p">}</span></code></pre>
<p>Concatenating strings in a loop is not the smartest thing to do - every loop iteration allocates a new string. A better way to build a string is to use <a class="reference external" href="https://golang.org/pkg/bytes/#Buffer" target="_blank">bytes.Buffer</a>:</p>
<pre class="code go literal-block"><code><span class="kd">func</span><span class="w"> </span><span class="nx">leftpad</span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">length</span><span class="w"> </span><span class="kt">int</span><span class="p">,</span><span class="w"> </span><span class="nx">char</span><span class="w"> </span><span class="kt">rune</span><span class="p">)</span><span class="w"> </span><span class="kt">string</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nx">buf</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">bytes</span><span class="p">.</span><span class="nx">Buffer</span><span class="p">{}</span><span class="w">
    </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="nx">length</span><span class="o">-</span><span class="nb">len</span><span class="p">(</span><span class="nx">s</span><span class="p">);</span><span class="w"> </span><span class="nx">i</span><span class="o">++</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="nx">buf</span><span class="p">.</span><span class="nx">WriteRune</span><span class="p">(</span><span class="nx">char</span><span class="p">)</span><span class="w">
    </span><span class="p">}</span><span class="w">
    </span><span class="nx">buf</span><span class="p">.</span><span class="nx">WriteString</span><span class="p">(</span><span class="nx">s</span><span class="p">)</span><span class="w">
    </span><span class="k">return</span><span class="w"> </span><span class="nx">buf</span><span class="p">.</span><span class="nx">String</span><span class="p">()</span><span class="w">
</span><span class="p">}</span></code></pre>
<p>Alternatively, we can use <a class="reference external" href="https://golang.org/pkg/strings/#Repeat" target="_blank">strings.Repeat</a>, which makes the code slightly shorter:</p>
<pre class="code go literal-block"><code><span class="kd">func</span><span class="w"> </span><span class="nx">leftpad</span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">length</span><span class="w"> </span><span class="kt">int</span><span class="p">,</span><span class="w"> </span><span class="nx">char</span><span class="w"> </span><span class="kt">rune</span><span class="p">)</span><span class="w"> </span><span class="kt">string</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">s</span><span class="p">)</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="nx">length</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="k">return</span><span class="w"> </span><span class="nx">strings</span><span class="p">.</span><span class="nx">Repeat</span><span class="p">(</span><span class="nb">string</span><span class="p">(</span><span class="nx">char</span><span class="p">),</span><span class="w"> </span><span class="nx">length</span><span class="o">-</span><span class="nb">len</span><span class="p">(</span><span class="nx">s</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">s</span><span class="w">
    </span><span class="p">}</span><span class="w">
    </span><span class="k">return</span><span class="w"> </span><span class="nx">s</span><span class="w">
</span><span class="p">}</span></code></pre>
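<p>One caveat with the version above: <span class="docutils literal">len(s)</span> counts bytes rather than characters, so a string containing multi-byte UTF-8 characters receives less padding than its visible width suggests. The following is a minimal sketch of a rune-aware variant, assuming that padding by code points is the desired behavior; the name <span class="docutils literal">leftpadRunes</span> is hypothetical and is not part of the original service:</p>

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// leftpadRunes is a hypothetical rune-aware variant of leftpad: it measures
// the input in Unicode code points instead of bytes.
func leftpadRunes(s string, length int, char rune) string {
	n := utf8.RuneCountInString(s)
	if n < length {
		return strings.Repeat(string(char), length-n) + s
	}
	return s
}

func main() {
	// "abc" is three runes, so two pad characters are added.
	fmt.Println(leftpadRunes("abc", 5, '*'))
	// "héllo" is five runes but six bytes; padding is based on the five runes.
	fmt.Println(leftpadRunes("héllo", 7, '*'))
}
```

<p>Counting runes costs an extra pass over the string, so in a hot path it is only worth doing if non-ASCII input is actually expected.</p>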
</section>
<section id="statsd-client">
<h4>StatsD client<a class="headerlink" href="#statsd-client" title="Permalink to this headline"> #</a></h4>
<p>The next piece of code we need to change is the <span class="docutils literal">StatsD.Send</span> function:</p>
<pre class="code go literal-block"><code><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="o">*</span><span class="nx">StatsD</span><span class="p">)</span><span class="w"> </span><span class="nx">Send</span><span class="p">(</span><span class="nx">stat</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">kind</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">delta</span><span class="w"> </span><span class="kt">float64</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nx">buf</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="s">&quot;%s.&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">Namespace</span><span class="p">)</span><span class="w">
    </span><span class="nx">trimmedStat</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">strings</span><span class="p">.</span><span class="nx">NewReplacer</span><span class="p">(</span><span class="s">&quot;:&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;_&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;|&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;_&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;&#64;&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;_&quot;</span><span class="p">).</span><span class="nx">Replace</span><span class="p">(</span><span class="nx">stat</span><span class="p">)</span><span class="w">
    </span><span class="nx">buf</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="s">&quot;%s:%s|%s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">trimmedStat</span><span class="p">,</span><span class="w"> </span><span class="nx">delta</span><span class="p">,</span><span class="w"> </span><span class="nx">kind</span><span class="p">)</span><span class="w">
    </span><span class="k">if</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">SampleRate</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">SampleRate</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="nx">buf</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="s">&quot;|&#64;%s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">strconv</span><span class="p">.</span><span class="nx">FormatFloat</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">SampleRate</span><span class="p">,</span><span class="w"> </span><span class="sc">'f'</span><span class="p">,</span><span class="w"> </span><span class="o">-</span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="mi">64</span><span class="p">))</span><span class="w">
    </span><span class="p">}</span><span class="w">
    </span><span class="nx">ioutil</span><span class="p">.</span><span class="nx">Discard</span><span class="p">.</span><span class="nx">Write</span><span class="p">([]</span><span class="nb">byte</span><span class="p">(</span><span class="nx">buf</span><span class="p">))</span><span class="w"> </span><span class="c1">// TODO: Write to a socket</span><span class="w">
</span><span class="p">}</span></code></pre>
<p>Here are some possible improvements:</p>
<ul class="simple">
<li><p><span class="docutils literal">Sprintf</span> is convenient for string formatting, and it's perfectly fine unless you start calling it thousands of times per second. It spends CPU time to parse the input format string, and it allocates a new string on every call. We can replace it with <span class="docutils literal">bytes.Buffer</span> + <span class="docutils literal">Buffer.WriteString/Buffer.WriteByte</span>.</p></li>
<li><p>The function doesn't need to create a new <span class="docutils literal">Replacer</span> instance on every call; it can be declared as a global variable or as part of the <span class="docutils literal">StatsD</span> structure.</p></li>
<li><p>Replace <span class="docutils literal">strconv.FormatFloat</span> with <span class="docutils literal">strconv.AppendFloat</span> and pass it a buffer allocated on the stack to prevent additional heap allocations.</p></li>
</ul>
<pre class="code go literal-block"><code><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="o">*</span><span class="nx">StatsD</span><span class="p">)</span><span class="w"> </span><span class="nx">Send</span><span class="p">(</span><span class="nx">stat</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">kind</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">delta</span><span class="w"> </span><span class="kt">float64</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nx">buf</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">bytes</span><span class="p">.</span><span class="nx">Buffer</span><span class="p">{}</span><span class="w">
    </span><span class="nx">buf</span><span class="p">.</span><span class="nx">WriteString</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">Namespace</span><span class="p">)</span><span class="w">
    </span><span class="nx">buf</span><span class="p">.</span><span class="nx">WriteByte</span><span class="p">(</span><span class="sc">'.'</span><span class="p">)</span><span class="w">
    </span><span class="nx">buf</span><span class="p">.</span><span class="nx">WriteString</span><span class="p">(</span><span class="nx">reservedReplacer</span><span class="p">.</span><span class="nx">Replace</span><span class="p">(</span><span class="nx">stat</span><span class="p">))</span><span class="w">
    </span><span class="nx">buf</span><span class="p">.</span><span class="nx">WriteByte</span><span class="p">(</span><span class="sc">':'</span><span class="p">)</span><span class="w">
    </span><span class="nx">buf</span><span class="p">.</span><span class="nx">Write</span><span class="p">(</span><span class="nx">strconv</span><span class="p">.</span><span class="nx">AppendFloat</span><span class="p">(</span><span class="nb">make</span><span class="p">([]</span><span class="kt">byte</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="mi">24</span><span class="p">),</span><span class="w"> </span><span class="nx">delta</span><span class="p">,</span><span class="w"> </span><span class="sc">'f'</span><span class="p">,</span><span class="w"> </span><span class="o">-</span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="mi">64</span><span class="p">))</span><span class="w">
    </span><span class="nx">buf</span><span class="p">.</span><span class="nx">WriteByte</span><span class="p">(</span><span class="sc">'|'</span><span class="p">)</span><span class="w">
    </span><span class="nx">buf</span><span class="p">.</span><span class="nx">WriteString</span><span class="p">(</span><span class="nx">kind</span><span class="p">)</span><span class="w">
    </span><span class="k">if</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">SampleRate</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">SampleRate</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="nx">buf</span><span class="p">.</span><span class="nx">WriteString</span><span class="p">(</span><span class="s">&quot;|&#64;&quot;</span><span class="p">)</span><span class="w">
        </span><span class="nx">buf</span><span class="p">.</span><span class="nx">Write</span><span class="p">(</span><span class="nx">strconv</span><span class="p">.</span><span class="nx">AppendFloat</span><span class="p">(</span><span class="nb">make</span><span class="p">([]</span><span class="kt">byte</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="mi">24</span><span class="p">),</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">SampleRate</span><span class="p">,</span><span class="w"> </span><span class="sc">'f'</span><span class="p">,</span><span class="w"> </span><span class="o">-</span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="mi">64</span><span class="p">))</span><span class="w">
    </span><span class="p">}</span><span class="w">
    </span><span class="nx">buf</span><span class="p">.</span><span class="nx">WriteTo</span><span class="p">(</span><span class="nx">ioutil</span><span class="p">.</span><span class="nx">Discard</span><span class="p">)</span><span class="w"> </span><span class="c1">// TODO: Write to a socket</span><span class="w">
</span><span class="p">}</span></code></pre>
<p>That reduces the number of allocations from 15 to 1 and makes <span class="docutils literal">Send</span> run about 4x faster:</p>
<pre class="code literal-block"><code>BenchmarkStatsD-4                5000000           381 ns/op         112 B/op          1 allocs/op</code></pre>
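<p>The optimized <span class="docutils literal">Send</span> refers to <span class="docutils literal">reservedReplacer</span> without showing its declaration. A minimal sketch of how it could be declared at package level, assuming the same three reserved characters as in the original function:</p>

```go
package main

import (
	"fmt"
	"strings"
)

// reservedReplacer is built once at package initialization instead of on
// every Send call. The character set mirrors the original function (an
// assumption about the final code). A strings.Replacer is safe for
// concurrent use by multiple goroutines.
var reservedReplacer = strings.NewReplacer(":", "_", "|", "_", "@", "_")

func main() {
	// Every reserved StatsD character in the stat name becomes an underscore.
	fmt.Println(reservedReplacer.Replace("api:latency|p99@eu"))
}
```

<p>Because the replacer is shared, all <span class="docutils literal">StatsD</span> clients in the process reuse the same instance and only pay the construction cost once.</p>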
</section>
</section>
<section id="measuring-the-result">
<h3>Measuring the result<a class="headerlink" href="#measuring-the-result" title="Permalink to this headline"> #</a></h3>
<p>The benchmarks show a very nice performance boost after all optimizations:</p>
<pre class="code literal-block"><code>benchmark                     old ns/op     new ns/op     delta
BenchmarkTimedHandler-4       6511          1181          -81.86%
BenchmarkLeftpadHandler-4     10546         3337          -68.36%
BenchmarkLeftpad10-4          339           136           -59.88%
BenchmarkLeftpad50-4          3079          201           -93.47%
BenchmarkStatsD-4             1516          381           -74.87%

benchmark                     old allocs     new allocs     delta
BenchmarkTimedHandler-4       41             5              -87.80%
BenchmarkLeftpadHandler-4     75             18             -76.00%
BenchmarkLeftpad10-4          6              3              -50.00%
BenchmarkLeftpad50-4          46             3              -93.48%
BenchmarkStatsD-4             15             1              -93.33%

benchmark                     old bytes     new bytes     delta
BenchmarkTimedHandler-4       1621          448           -72.36%
BenchmarkLeftpadHandler-4     3297          1416          -57.05%
BenchmarkLeftpad10-4          64            24            -62.50%
BenchmarkLeftpad50-4          1568          160           -89.80%
BenchmarkStatsD-4             560           112           -80.00%</code></pre>
<aside class="admonition note">
<p class="admonition-title">Note</p>
<p>I used <a class="reference external" href="http://golang.org/x/perf/benchstat" target="_blank">benchstat</a> to compare the results.</p>
</aside>
<p>Run <span class="docutils literal">ab</span> again:</p>
<pre class="code literal-block"><code>Requests per second:    32619.54 [#/sec] (mean)
Time per request:       0.030 [ms] (mean, across all concurrent requests)</code></pre>
<p>The web service can handle about 10000 additional requests per second now!</p>
</section>
<section id="optimization-tips">
<h3>Optimization tips<a class="headerlink" href="#optimization-tips" title="Permalink to this headline"> #</a></h3>
<ul class="simple">
<li><p>Avoid unnecessary heap allocations.</p></li>
<li><p>Prefer values over pointers for small structures.</p></li>
<li><p>Preallocate maps and slices if you know the size beforehand.</p></li>
<li><p>Don't log if you don't have to.</p></li>
<li><p>Use buffered I/O if you do many sequential reads or writes.</p></li>
<li><p>If your application extensively uses JSON, consider utilizing parser/serializer generators (I personally prefer <a class="reference external" href="https://github.com/mailru/easyjson" target="_blank">easyjson</a>).</p></li>
<li><p>Every operation matters in a hot path.</p></li>
</ul>
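<p>As an illustration of the preallocation tip, here is a small sketch that counts allocations with <span class="docutils literal">testing.AllocsPerRun</span>. The function names <span class="docutils literal">fillGrown</span> and <span class="docutils literal">fillPrealloc</span> and the element count are made up for this example:</p>

```go
package main

import (
	"fmt"
	"testing"
)

// sink keeps the results reachable so the compiler cannot optimize the
// slices away or keep them on the stack.
var sink []int

// fillGrown appends to a nil slice, so the backing array is reallocated
// and copied each time it runs out of capacity.
func fillGrown(n int) []int {
	var out []int
	for i := 0; i < n; i++ {
		out = append(out, i)
	}
	return out
}

// fillPrealloc reserves the full capacity up front, so append never grows.
func fillPrealloc(n int) []int {
	out := make([]int, 0, n)
	for i := 0; i < n; i++ {
		out = append(out, i)
	}
	return out
}

func main() {
	const n = 1000
	grown := testing.AllocsPerRun(100, func() { sink = fillGrown(n) })
	prealloc := testing.AllocsPerRun(100, func() { sink = fillPrealloc(n) })
	fmt.Printf("append to nil slice: %.0f allocs/run\n", grown)
	fmt.Printf("preallocated slice:  %.0f allocs/run\n", prealloc)
}
```

<p>The exact count for the growing version depends on the slice growth strategy of the Go version in use, but the preallocated version should need only a single allocation for the backing array.</p>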
</section>
<section id="conclusion">
<h3>Conclusion<a class="headerlink" href="#conclusion" title="Permalink to this headline"> #</a></h3>
<p>Sometimes the bottleneck is not where you expect it to be. Profiling is the best, and sometimes the only, way to understand the real performance of your application.</p>
<p>You can find the full source code on <a class="reference external" href="https://github.com/akrylysov/goprofex" target="_blank">GitHub</a>. The initial version is tagged as <a class="reference external" href="https://github.com/akrylysov/goprofex/tree/v1" target="_blank">v1</a> and the optimized version is tagged as <a class="reference external" href="https://github.com/akrylysov/goprofex/tree/v2" target="_blank">v2</a>. Here is <a class="reference external" href="https://github.com/akrylysov/goprofex/compare/v1...v2" target="_blank">the link</a> to compare these two versions.</p>
</section>
]]>
        </content>
    </entry>
    
    <entry>
        <title><![CDATA[Scraping the Web with AWS Lambda and PhantomJS]]></title>
        <link href="https://artem.krylysov.com/blog/2016/06/22/scraping-the-web-with-aws-lambda-and-phantomjs/"/>
        <updated>2016-06-22T00:00:00Z</updated>
        <id>https://artem.krylysov.com/blog/2016/06/22/scraping-the-web-with-aws-lambda-and-phantomjs/</id>
        <content type="html">
            <![CDATA[<p>Here are the slides from my talk &quot;Scraping the Web with AWS Lambda and PhantomJS&quot;, given at the Greater Philadelphia AWS User Group meetup on May 25, 2016.</p>
<div class="align-left media"><script async class="speakerdeck-embed" data-id="e8ea6d7ca6614277a5d1f9ccb4d52b2c" data-ratio="1.77777777777778" src="//speakerdeck.com/assets/embed.js"></script></div><p>You can find the source code of the PhantomJS/Node.js web scraper for AWS Lambda at <a class="reference external" href="https://github.com/akrylysov/lambda-phantom-scraper" target="_blank">https://github.com/akrylysov/lambda-phantom-scraper</a>.</p>
]]>
        </content>
    </entry>
    
    <entry>
        <title><![CDATA[Benchmark of Python JSON libraries]]></title>
        <link href="https://artem.krylysov.com/blog/2015/09/29/benchmark-python-json-libraries/"/>
        <updated>2015-09-29T00:00:00Z</updated>
        <id>https://artem.krylysov.com/blog/2015/09/29/benchmark-python-json-libraries/</id>
        <content type="html">
            <![CDATA[<aside class="admonition note">
<p class="admonition-title">Note</p>
<p>This post was updated on 2016-08-13: added <span class="docutils literal"><span class="pre">python-rapidjson</span></span>; updated <span class="docutils literal">simplejson</span> and <span class="docutils literal">ujson</span>.</p>
</aside>
<p>A couple of weeks ago, after spending some time with the Python profiler, I discovered that Python’s <span class="docutils literal">json</span> module is not as fast as I expected. I decided to benchmark alternative JSON libraries.</p>
<section id="libraries">
<h3>Libraries<a class="headerlink" href="#libraries" title="Permalink to this headline"> #</a></h3>
<ul class="simple">
<li><p><a class="reference external" href="https://docs.python.org/3/library/json.html" target="_blank">json</a></p></li>
<li><p><a class="reference external" href="https://pypi.python.org/pypi/simplejson" target="_blank">simplejson</a> 3.8.2</p></li>
<li><p><a class="reference external" href="https://pypi.python.org/pypi/ujson" target="_blank">ujson</a> 1.35</p></li>
<li><p><a class="reference external" href="https://pypi.python.org/pypi/python-rapidjson" target="_blank">python-rapidjson</a> 0.0.6</p></li>
</ul>
<p><span class="docutils literal"><span class="pre">python-cjson</span></span>, <span class="docutils literal"><span class="pre">yajl-py</span></span> and <span class="docutils literal">jsonlib</span> are not included in the benchmark because they are not in active development and don’t support Python 3.</p>
<p><span class="docutils literal">simplejson</span> and <span class="docutils literal">ujson</span> may be used as a drop-in replacement for the standard <span class="docutils literal">json</span> module, but <span class="docutils literal">ujson</span> doesn’t support advanced features like hooks, custom encoders and decoders.</p>
<p>You can change your imports this way to use an alternative library:</p>
<pre class="code python literal-block"><code><span class="kn">import</span> <span class="nn">ujson</span> <span class="k">as</span> <span class="nn">json</span></code></pre>
</section>
<section id="interpreters">
<h3>Interpreters<a class="headerlink" href="#interpreters" title="Permalink to this headline"> #</a></h3>
<ul class="simple">
<li><p>Python (CPython) 2.7.12</p></li>
<li><p>Python (CPython) 3.5.2</p></li>
<li><p><a class="reference external" href="http://pypy.org/" target="_blank">PyPy</a> 5.3.0</p></li>
</ul>
<aside class="admonition note">
<p class="admonition-title">Note</p>
<p>ujson is not compatible with PyPy; python-rapidjson is compatible only with Python 3.</p>
</aside>
</section>
<section id="methodology">
<h3>Methodology<a class="headerlink" href="#methodology" title="Permalink to this headline"> #</a></h3>
<p>The tests were performed on a Late 2013 MacBook Pro (2.6 GHz Intel Core i5, 8 GB 1600 MHz DDR3, OS X 10.11.6). Every test was run 100 times.</p>
<table>
<caption>Test data</caption>
<colgroup>
<col style="width: 36.4%" />
<col style="width: 18.2%" />
<col style="width: 45.5%" />
</colgroup>
<thead>
<tr><th class="head"><p>File name</p></th>
<th class="head"><p>File size</p></th>
<th class="head"><p>Description</p></th>
</tr>
</thead>
<tbody>
<tr><td><p><span class="docutils literal">twitter.json</span></p></td>
<td><p>632 KB</p></td>
<td><p>Single large JSON (<a class="reference external" href="https://github.com/akrylysov/python-json-benchmark/blob/master/data/twitter.json" target="_blank">source</a>)</p></td>
</tr>
<tr><td><p><span class="docutils literal"><span class="pre">one-json-per-line.jsons.txt</span></span></p></td>
<td><p>176 KB</p></td>
<td><p>Collection of 1000 JSON objects (<a class="reference external" href="https://github.com/akrylysov/python-json-benchmark/blob/master/data/one-json-per-line.txt" target="_blank">source</a>)</p></td>
</tr>
</tbody>
</table>
<p>I published <a class="reference external" href="https://github.com/akrylysov/python-json-benchmark" target="_blank">the source code</a> of the benchmark on GitHub. You can clone it and rerun it if you want to verify the results yourself or if a new version of an alternative JSON library is released.</p>
</section>
<section id="results">
<h3>Results<a class="headerlink" href="#results" title="Permalink to this headline"> #</a></h3>
<table>
<caption>Python 2.7</caption>
<colgroup>
<col style="width: 40.0%" />
<col style="width: 20.0%" />
<col style="width: 20.0%" />
<col style="width: 20.0%" />
</colgroup>
<thead>
<tr><th class="head"></th>
<th class="head"><p>json</p></th>
<th class="head"><p>simplejson</p></th>
<th class="head"><p>ujson</p></th>
</tr>
</thead>
<tbody>
<tr><td><p><em>loads (large obj)</em></p></td>
<td><p>1.140</p></td>
<td><p>0.441</p></td>
<td><p>0.448</p></td>
</tr>
<tr><td><p><em>dumps (large obj)</em></p></td>
<td><p>0.564</p></td>
<td><p>0.630</p></td>
<td><p>0.459</p></td>
</tr>
<tr><td><p><em>loads (small objs)</em></p></td>
<td><p>1.190</p></td>
<td><p>0.579</p></td>
<td><p>0.195</p></td>
</tr>
<tr><td><p><em>dumps (small objs)</em></p></td>
<td><p>0.910</p></td>
<td><p>1.641</p></td>
<td><p>0.304</p></td>
</tr>
</tbody>
</table>
<img alt="" src="https://artem.krylysov.com/images/2015-benchmark-python-json/benchmark-json-python2.png" style="width: 695px; height: 542px;" />
<table>
<caption>Python 3.5</caption>
<colgroup>
<col style="width: 33.3%" />
<col style="width: 16.7%" />
<col style="width: 16.7%" />
<col style="width: 16.7%" />
<col style="width: 16.7%" />
</colgroup>
<thead>
<tr><th class="head"></th>
<th class="head"><p>json</p></th>
<th class="head"><p>simplejson</p></th>
<th class="head"><p>ujson</p></th>
<th class="head"><p>rapidjson</p></th>
</tr>
</thead>
<tbody>
<tr><td><p><em>loads (large obj)</em></p></td>
<td><p>0.600</p></td>
<td><p>0.698</p></td>
<td><p>0.605</p></td>
<td><p>0.634</p></td>
</tr>
<tr><td><p><em>dumps (large obj)</em></p></td>
<td><p>0.673</p></td>
<td><p>0.629</p></td>
<td><p>0.381</p></td>
<td><p>0.365</p></td>
</tr>
<tr><td><p><em>loads (small objs)</em></p></td>
<td><p>0.801</p></td>
<td><p>1.091</p></td>
<td><p>0.322</p></td>
<td><p>0.531</p></td>
</tr>
<tr><td><p><em>dumps (small objs)</em></p></td>
<td><p>1.213</p></td>
<td><p>2.038</p></td>
<td><p>0.285</p></td>
<td><p>0.234</p></td>
</tr>
</tbody>
</table>
<img alt="" src="https://artem.krylysov.com/images/2015-benchmark-python-json/benchmark-json-python3.png" style="width: 695px; height: 542px;" />
<table>
<caption>PyPy 5.3</caption>
<colgroup>
<col style="width: 50.0%" />
<col style="width: 25.0%" />
<col style="width: 25.0%" />
</colgroup>
<thead>
<tr><th class="head"></th>
<th class="head"><p>json</p></th>
<th class="head"><p>simplejson</p></th>
</tr>
</thead>
<tbody>
<tr><td><p><em>loads (large obj)</em></p></td>
<td><p>0.545</p></td>
<td><p>1.876</p></td>
</tr>
<tr><td><p><em>dumps (large obj)</em></p></td>
<td><p>0.632</p></td>
<td><p>3.974</p></td>
</tr>
<tr><td><p><em>loads (small objs)</em></p></td>
<td><p>0.271</p></td>
<td><p>1.651</p></td>
</tr>
<tr><td><p><em>dumps (small objs)</em></p></td>
<td><p>0.719</p></td>
<td><p>2.404</p></td>
</tr>
</tbody>
</table>
<img alt="" src="https://artem.krylysov.com/images/2015-benchmark-python-json/benchmark-json-pypy.png" style="width: 695px; height: 542px;" />
<aside class="admonition note">
<p class="admonition-title">Note</p>
<p>Results are in seconds.</p>
</aside>
</section>
<section id="conclusion">
<h3>Conclusion<a class="headerlink" href="#conclusion" title="Permalink to this headline"> #</a></h3>
<p>The numbers speak for themselves. If your application deals with a large amount of JSON data and doesn't use any advanced features of the built-in <span class="docutils literal">json</span> module, you should probably consider switching to <span class="docutils literal">ujson</span>.</p>
</section>
]]>
        </content>
    </entry>
    
    <entry>
        <title><![CDATA[C++ STL regex performance]]></title>
        <link href="https://artem.krylysov.com/blog/2014/10/06/cpp-stl-regex-performance/"/>
        <updated>2014-10-06T00:00:00Z</updated>
        <id>https://artem.krylysov.com/blog/2014/10/06/cpp-stl-regex-performance/</id>
        <content type="html">
            <![CDATA[<p>I recently ran into a simple task: I needed to find the position of the opening <span class="docutils literal">&lt;body&gt;</span> tag in an HTML page. Without thinking twice, I decided to use regular expressions, and a minute later I had the regex <span class="docutils literal"><span class="pre">&lt;body[^&gt;]*&gt;</span></span>. Everything worked well until I started testing on large amounts of data.</p>
<p>I decided to write a test application to measure the speed of <span class="docutils literal">regex_search</span>:</p>
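<pre class="code c++ literal-block"><code><span class="cp">#include</span><span class="w"> </span><span class="cpf">&lt;ctime&gt;</span><span class="cp">
#include</span><span class="w"> </span><span class="cpf">&lt;iostream&gt;</span><span class="cp">
#include</span><span class="w"> </span><span class="cpf">&lt;regex&gt;</span><span class="cp">
#include</span><span class="w"> </span><span class="cpf">&lt;string&gt;</span><span class="cp">
</span><span class="w">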
</span><span class="kt">int</span><span class="w"> </span><span class="nf">find_position_re</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="o">&amp;</span><span class="n">text</span><span class="p">,</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">regex</span><span class="w"> </span><span class="o">&amp;</span><span class="n">regex</span><span class="p">)</span><span class="w">
</span><span class="p">{</span><span class="w">
    </span><span class="n">std</span><span class="o">::</span><span class="n">smatch</span><span class="w"> </span><span class="n">match</span><span class="p">;</span><span class="w">
    </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">regex_search</span><span class="p">(</span><span class="n">text</span><span class="p">,</span><span class="w"> </span><span class="n">match</span><span class="p">,</span><span class="w"> </span><span class="n">regex</span><span class="p">))</span><span class="w">
    </span><span class="p">{</span><span class="w">
        </span><span class="k">return</span><span class="w"> </span><span class="n">match</span><span class="p">.</span><span class="n">position</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">match</span><span class="p">.</span><span class="n">length</span><span class="p">();</span><span class="w">
    </span><span class="p">}</span><span class="w">
    </span><span class="k">return</span><span class="w"> </span><span class="mi">-1</span><span class="p">;</span><span class="w">
</span><span class="p">}</span><span class="w">

</span><span class="kt">int</span><span class="w"> </span><span class="nf">main</span><span class="p">(</span><span class="kt">int</span><span class="w"> </span><span class="n">argc</span><span class="p">,</span><span class="w"> </span><span class="kt">char</span><span class="o">*</span><span class="w"> </span><span class="n">argv</span><span class="p">[])</span><span class="w">
</span><span class="p">{</span><span class="w">
    </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="n">text</span><span class="p">(</span><span class="mi">1024</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mi">10000</span><span class="p">,</span><span class="w"> </span><span class="sc">' '</span><span class="p">);</span><span class="w">
    </span><span class="n">text</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="s">&quot;&lt;body&gt;asdasd&lt;/body&gt;&quot;</span><span class="p">);</span><span class="w">
    </span><span class="n">std</span><span class="o">::</span><span class="n">regex</span><span class="w"> </span><span class="n">regex</span><span class="p">(</span><span class="s">&quot;&lt;body[^&gt;]*&gt;&quot;</span><span class="p">);</span><span class="w">

    </span><span class="kt">float</span><span class="w"> </span><span class="n">start</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="kt">float</span><span class="p">)</span><span class="n">clock</span><span class="p">()</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">CLOCKS_PER_SEC</span><span class="p">;</span><span class="w">

    </span><span class="n">find_position_re</span><span class="p">(</span><span class="n">text</span><span class="p">,</span><span class="w"> </span><span class="n">regex</span><span class="p">);</span><span class="w">

    </span><span class="kt">float</span><span class="w"> </span><span class="n">end</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="kt">float</span><span class="p">)</span><span class="n">clock</span><span class="p">()</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">CLOCKS_PER_SEC</span><span class="p">;</span><span class="w">

    </span><span class="n">std</span><span class="o">::</span><span class="n">cout</span><span class="w"> </span><span class="o">&lt;&lt;</span><span class="w"> </span><span class="n">end</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">start</span><span class="w"> </span><span class="o">&lt;&lt;</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span><span class="w">
    </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w">
</span><span class="p">}</span></code></pre>
<p>Searching a 10 MB string took almost 200 milliseconds, which is not very fast considering that the test ran on a recent MacBook Pro.</p>
<p>Out of curiosity, I compiled the code with both GCC and Clang, and wrote equivalent code in Python:</p>
<pre class="code python literal-block"><code><span class="kn">import</span> <span class="nn">re</span><span class="w">
</span><span class="kn">import</span> <span class="nn">timeit</span><span class="w">

</span><span class="k">def</span> <span class="nf">find_position_re</span><span class="p">(</span><span class="n">text</span><span class="p">,</span> <span class="n">regex</span><span class="p">):</span><span class="w">
</span>    <span class="n">match</span> <span class="o">=</span> <span class="n">regex</span><span class="o">.</span><span class="n">search</span><span class="p">(</span><span class="n">text</span><span class="p">)</span><span class="w">
</span>    <span class="k">if</span> <span class="n">match</span><span class="p">:</span><span class="w">
</span>        <span class="k">return</span> <span class="n">match</span><span class="o">.</span><span class="n">start</span><span class="p">()</span><span class="w">
</span>    <span class="k">return</span> <span class="o">-</span><span class="mi">1</span><span class="w">

</span><span class="n">text</span> <span class="o">=</span> <span class="s1">' '</span> <span class="o">*</span> <span class="p">(</span><span class="mi">1024</span> <span class="o">*</span> <span class="mi">10000</span><span class="p">)</span><span class="w">
</span><span class="n">text</span> <span class="o">+=</span> <span class="s1">'&lt;body&gt;asdasd&lt;/body&gt;'</span><span class="w">
</span><span class="n">regex</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span><span class="s2">&quot;&lt;body[^&gt;]*&gt;&quot;</span><span class="p">)</span><span class="w">

</span><span class="nb">print</span><span class="p">(</span><span class="n">timeit</span><span class="o">.</span><span class="n">timeit</span><span class="p">(</span><span class="k">lambda</span><span class="p">:</span> <span class="n">find_position_re</span><span class="p">(</span><span class="n">text</span><span class="p">,</span> <span class="n">regex</span><span class="p">),</span> <span class="n">number</span><span class="o">=</span><span class="mi">1</span><span class="p">))</span></code></pre>
<p>And in JavaScript:</p>
<pre class="code javascript literal-block"><code><span class="kd">function</span><span class="w"> </span><span class="nx">find_position_re</span><span class="p">(</span><span class="nx">text</span><span class="p">,</span><span class="w"> </span><span class="nx">regex</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="k">return</span><span class="w"> </span><span class="nx">text</span><span class="p">.</span><span class="nx">search</span><span class="p">(</span><span class="nx">regex</span><span class="p">);</span><span class="w">
</span><span class="p">}</span><span class="w">

</span><span class="kd">var</span><span class="w"> </span><span class="nx">text</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">''</span><span class="p">;</span><span class="w">
</span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="kd">var</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">0</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="mf">1024</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mf">10000</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="o">++</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nx">text</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="s1">' '</span><span class="p">;</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="nx">text</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="s1">'&lt;body&gt;asdasd&lt;/body&gt;'</span><span class="p">;</span><span class="w">

</span><span class="kd">var</span><span class="w"> </span><span class="nx">regex</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="nb">RegExp</span><span class="p">(</span><span class="s1">'&lt;body[^&gt;]*&gt;'</span><span class="p">);</span><span class="w">

</span><span class="nx">console</span><span class="p">.</span><span class="nx">time</span><span class="p">(</span><span class="s1">'test'</span><span class="p">);</span><span class="w">

</span><span class="nx">find_position_re</span><span class="p">(</span><span class="nx">text</span><span class="p">,</span><span class="w"> </span><span class="nx">regex</span><span class="p">);</span><span class="w">

</span><span class="nx">console</span><span class="p">.</span><span class="nx">timeEnd</span><span class="p">(</span><span class="s1">'test'</span><span class="p">);</span></code></pre>
<p>The Node.js test application turned out to be 3 times faster, and the Python one 31 times faster, than the build produced with Visual Studio 2013.</p>
<img alt="" src="https://artem.krylysov.com/images/cpp-regex/regex_test.png" style="width: 585px; height: 331px;" />
<p>Clang's result was a complete disappointment. After tweaking the compiler flags a little, I realized they were not the problem. I ran a profiler to find the bottleneck. It turned out to be the function <span class="docutils literal">__match_at_start_ecma</span>, which was called on every iteration of the search:</p>
<pre class="code c++ literal-block"><code><span class="k">template</span><span class="w"> </span><span class="o">&lt;</span><span class="k">class</span><span class="w"> </span><span class="nc">_CharT</span><span class="p">,</span><span class="w"> </span><span class="k">class</span><span class="w"> </span><span class="nc">_Traits</span><span class="o">&gt;</span><span class="w">
</span><span class="k">template</span><span class="w"> </span><span class="o">&lt;</span><span class="k">class</span><span class="w"> </span><span class="nc">_Allocator</span><span class="o">&gt;</span><span class="w">
</span><span class="kt">bool</span><span class="w">
</span><span class="n">basic_regex</span><span class="o">&lt;</span><span class="n">_CharT</span><span class="p">,</span><span class="w"> </span><span class="n">_Traits</span><span class="o">&gt;::</span><span class="n">__match_at_start_ecma</span><span class="p">(</span><span class="w">
        </span><span class="k">const</span><span class="w"> </span><span class="n">_CharT</span><span class="o">*</span><span class="w"> </span><span class="n">__first</span><span class="p">,</span><span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="n">_CharT</span><span class="o">*</span><span class="w"> </span><span class="n">__last</span><span class="p">,</span><span class="w">
        </span><span class="n">match_results</span><span class="o">&lt;</span><span class="k">const</span><span class="w"> </span><span class="n">_CharT</span><span class="o">*</span><span class="p">,</span><span class="w"> </span><span class="n">_Allocator</span><span class="o">&gt;&amp;</span><span class="w"> </span><span class="n">__m</span><span class="p">,</span><span class="w">
        </span><span class="n">regex_constants</span><span class="o">::</span><span class="n">match_flag_type</span><span class="w"> </span><span class="n">__flags</span><span class="p">,</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="n">__at_first</span><span class="p">)</span><span class="w"> </span><span class="k">const</span><span class="w">
</span><span class="p">{</span><span class="w">
    </span><span class="n">vector</span><span class="o">&lt;</span><span class="n">__state</span><span class="o">&gt;</span><span class="w"> </span><span class="n">__states</span><span class="p">;</span><span class="w">
    </span><span class="n">__node</span><span class="o">*</span><span class="w"> </span><span class="n">__st</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">__start_</span><span class="p">.</span><span class="n">get</span><span class="p">();</span><span class="w">
    </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">__st</span><span class="p">)</span><span class="w">
    </span><span class="p">{</span><span class="w">
        </span><span class="n">__states</span><span class="p">.</span><span class="n">push_back</span><span class="p">(</span><span class="n">__state</span><span class="p">());</span><span class="w">
        </span><span class="n">__states</span><span class="p">.</span><span class="n">back</span><span class="p">().</span><span class="n">__do_</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w">
        </span><span class="n">__states</span><span class="p">.</span><span class="n">back</span><span class="p">().</span><span class="n">__first_</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">__first</span><span class="p">;</span><span class="w">
        </span><span class="n">__states</span><span class="p">.</span><span class="n">back</span><span class="p">().</span><span class="n">__current_</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">__first</span><span class="p">;</span><span class="w">
        </span><span class="n">__states</span><span class="p">.</span><span class="n">back</span><span class="p">().</span><span class="n">__last_</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">__last</span><span class="p">;</span><span class="w">
        </span><span class="n">__states</span><span class="p">.</span><span class="n">back</span><span class="p">().</span><span class="n">__sub_matches_</span><span class="p">.</span><span class="n">resize</span><span class="p">(</span><span class="n">mark_count</span><span class="p">());</span><span class="w">
        </span><span class="n">__states</span><span class="p">.</span><span class="n">back</span><span class="p">().</span><span class="n">__loop_data_</span><span class="p">.</span><span class="n">resize</span><span class="p">(</span><span class="n">__loop_count</span><span class="p">());</span><span class="w">
        </span><span class="n">__states</span><span class="p">.</span><span class="n">back</span><span class="p">().</span><span class="n">__node_</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">__st</span><span class="p">;</span><span class="w">
        </span><span class="n">__states</span><span class="p">.</span><span class="n">back</span><span class="p">().</span><span class="n">__flags_</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">__flags</span><span class="p">;</span><span class="w">
        </span><span class="n">__states</span><span class="p">.</span><span class="n">back</span><span class="p">().</span><span class="n">__at_first_</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">__at_first</span><span class="p">;</span><span class="w">
        </span><span class="k">do</span><span class="w">
        </span><span class="p">{</span><span class="w">
            </span><span class="n">__state</span><span class="o">&amp;</span><span class="w"> </span><span class="n">__s</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">__states</span><span class="p">.</span><span class="n">back</span><span class="p">();</span><span class="w">
            </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">__s</span><span class="p">.</span><span class="n">__node_</span><span class="p">)</span><span class="w">
                </span><span class="n">__s</span><span class="p">.</span><span class="n">__node_</span><span class="o">-&gt;</span><span class="n">__exec</span><span class="p">(</span><span class="n">__s</span><span class="p">);</span><span class="w">
            </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">__s</span><span class="p">.</span><span class="n">__do_</span><span class="p">)</span><span class="w">
            </span><span class="p">{</span><span class="w">
            </span><span class="k">case</span><span class="w"> </span><span class="no">__state</span><span class="o">::</span><span class="no">__end_state</span><span class="p">:</span><span class="w">
                </span><span class="n">__m</span><span class="p">.</span><span class="n">__matches_</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="n">first</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">__first</span><span class="p">;</span><span class="w">
                </span><span class="n">__m</span><span class="p">.</span><span class="n">__matches_</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="n">second</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">_VSTD</span><span class="o">::</span><span class="n">next</span><span class="p">(</span><span class="n">__first</span><span class="p">,</span><span class="w"> </span><span class="n">__s</span><span class="p">.</span><span class="n">__current_</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">__first</span><span class="p">);</span><span class="w">
                </span><span class="n">__m</span><span class="p">.</span><span class="n">__matches_</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="n">matched</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">true</span><span class="p">;</span><span class="w">
                </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="kt">unsigned</span><span class="w"> </span><span class="n">__i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="n">__i</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="n">__s</span><span class="p">.</span><span class="n">__sub_matches_</span><span class="p">.</span><span class="n">size</span><span class="p">();</span><span class="w"> </span><span class="o">++</span><span class="n">__i</span><span class="p">)</span><span class="w">
                    </span><span class="n">__m</span><span class="p">.</span><span class="n">__matches_</span><span class="p">[</span><span class="n">__i</span><span class="o">+</span><span class="mi">1</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">__s</span><span class="p">.</span><span class="n">__sub_matches_</span><span class="p">[</span><span class="n">__i</span><span class="p">];</span><span class="w">
                </span><span class="k">return</span><span class="w"> </span><span class="nb">true</span><span class="p">;</span><span class="w">
            </span><span class="k">case</span><span class="w"> </span><span class="no">__state</span><span class="o">::</span><span class="no">__accept_and_consume</span><span class="p">:</span><span class="w">
            </span><span class="k">case</span><span class="w"> </span><span class="no">__state</span><span class="o">::</span><span class="no">__repeat</span><span class="p">:</span><span class="w">
            </span><span class="k">case</span><span class="w"> </span><span class="no">__state</span><span class="o">::</span><span class="no">__accept_but_not_consume</span><span class="p">:</span><span class="w">
                </span><span class="k">break</span><span class="p">;</span><span class="w">
            </span><span class="k">case</span><span class="w"> </span><span class="no">__state</span><span class="o">::</span><span class="no">__split</span><span class="p">:</span><span class="w">
                </span><span class="p">{</span><span class="w">
                </span><span class="n">__state</span><span class="w"> </span><span class="n">__snext</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">__s</span><span class="p">;</span><span class="w">
                </span><span class="n">__s</span><span class="p">.</span><span class="n">__node_</span><span class="o">-&gt;</span><span class="n">__exec_split</span><span class="p">(</span><span class="nb">true</span><span class="p">,</span><span class="w"> </span><span class="n">__s</span><span class="p">);</span><span class="w">
                </span><span class="n">__snext</span><span class="p">.</span><span class="n">__node_</span><span class="o">-&gt;</span><span class="n">__exec_split</span><span class="p">(</span><span class="nb">false</span><span class="p">,</span><span class="w"> </span><span class="n">__snext</span><span class="p">);</span><span class="w">
                </span><span class="n">__states</span><span class="p">.</span><span class="n">push_back</span><span class="p">(</span><span class="n">_VSTD</span><span class="o">::</span><span class="n">move</span><span class="p">(</span><span class="n">__snext</span><span class="p">));</span><span class="w">
                </span><span class="p">}</span><span class="w">
                </span><span class="k">break</span><span class="p">;</span><span class="w">
            </span><span class="k">case</span><span class="w"> </span><span class="no">__state</span><span class="o">::</span><span class="no">__reject</span><span class="p">:</span><span class="w">
                </span><span class="n">__states</span><span class="p">.</span><span class="n">pop_back</span><span class="p">();</span><span class="w">
                </span><span class="k">break</span><span class="p">;</span><span class="w">
            </span><span class="k">default</span><span class="o">:</span><span class="w">
</span><span class="cp">#ifndef _LIBCPP_NO_EXCEPTIONS
</span><span class="w">                </span><span class="k">throw</span><span class="w"> </span><span class="n">regex_error</span><span class="p">(</span><span class="n">regex_constants</span><span class="o">::</span><span class="n">__re_err_unknown</span><span class="p">);</span><span class="w">
</span><span class="cp">#endif
</span><span class="w">                </span><span class="k">break</span><span class="p">;</span><span class="w">

            </span><span class="p">}</span><span class="w">
        </span><span class="p">}</span><span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">__states</span><span class="p">.</span><span class="n">empty</span><span class="p">());</span><span class="w">
    </span><span class="p">}</span><span class="w">
    </span><span class="k">return</span><span class="w"> </span><span class="nb">false</span><span class="p">;</span><span class="w">
</span><span class="p">}</span></code></pre>
<p>With a thousand-character string in my test data, this function was called a thousand times. That meant a thousand constructions, push_back calls, pop_back calls and destructions of <span class="docutils literal"><span class="pre">std::vector</span></span>, and therefore a thousand dynamic memory allocations and deallocations.</p>
<img alt="" src="https://artem.krylysov.com/images/cpp-regex/profiler.png" style="width: 800px; height: 305px;" />
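<p>To make that cost concrete, here is a toy Python model of the pattern described above. It is a sketch, not libc++ code: the names <span class="docutils literal">match_at_start</span> and <span class="docutils literal">search</span> are made up for illustration, a plain list stands in for <span class="docutils literal"><span class="pre">std::vector</span></span>, and a literal prefix check stands in for the real state machine.</p>
<pre class="code python literal-block"><code>def match_at_start(text, start, prefix, counters):
    # A fresh stack is built on every call, mirroring the local
    # vector of states in the function shown above.
    states = []
    counters['stacks_created'] += 1
    states.append(start)   # push the initial state
    pos = states.pop()     # pop it once the attempt is resolved
    return text.startswith(prefix, pos)


def search(text, prefix):
    # Retry an anchored match at every start position.
    counters = {'stacks_created': 0}
    for start in range(len(text)):
        if match_at_start(text, start, prefix, counters):
            return start, counters['stacks_created']
    return -1, counters['stacks_created']


text = ' ' * 1000 + '&lt;body&gt;'
print(search(text, '&lt;body'))</code></pre>
<p>With 1000 leading spaces this prints <span class="docutils literal">(1000, 1001)</span>: one stack per start position tried, so the number of allocations grows linearly with the amount of text before the match.</p>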
<p>In my particular case, for my regular expression, the problem can be worked around with this hack:</p>
<pre class="code c++ literal-block"><code><span class="kt">int</span><span class="w"> </span><span class="nf">find_position_re</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="o">&amp;</span><span class="n">text</span><span class="p">,</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">regex</span><span class="w"> </span><span class="o">&amp;</span><span class="n">regex</span><span class="p">)</span><span class="w">
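</span><span class="p">{</span><span class="w">
    </span><span class="c1">// Jump to the literal prefix so the regex engine starts right at the candidate.</span><span class="w">
    </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="o">::</span><span class="n">size_type</span><span class="w"> </span><span class="n">offset</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">text</span><span class="p">.</span><span class="n">find</span><span class="p">(</span><span class="s">&quot;&lt;body&quot;</span><span class="p">);</span><span class="w">
    </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">offset</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="o">::</span><span class="n">npos</span><span class="p">)</span><span class="w">
    </span><span class="p">{</span><span class="w">
        </span><span class="k">return</span><span class="w"> </span><span class="mi">-1</span><span class="p">;</span><span class="w"> </span><span class="c1">// substr(npos) would throw std::out_of_range</span><span class="w">
    </span><span class="p">}</span><span class="w">
    </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="n">text2</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">text</span><span class="p">.</span><span class="n">substr</span><span class="p">(</span><span class="n">offset</span><span class="p">);</span><span class="w">
    </span><span class="n">std</span><span class="o">::</span><span class="n">smatch</span><span class="w"> </span><span class="n">match</span><span class="p">;</span><span class="w">
    </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">regex_search</span><span class="p">(</span><span class="n">text2</span><span class="p">,</span><span class="w"> </span><span class="n">match</span><span class="p">,</span><span class="w"> </span><span class="n">regex</span><span class="p">))</span><span class="w">
    </span><span class="p">{</span><span class="w">
        </span><span class="c1">// match.position() is relative to text2, so add the offset back.</span><span class="w">
        </span><span class="k">return</span><span class="w"> </span><span class="n">offset</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">match</span><span class="p">.</span><span class="n">position</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">match</span><span class="p">.</span><span class="n">length</span><span class="p">();</span><span class="w">
    </span><span class="p">}</span><span class="w">
    </span><span class="k">return</span><span class="w"> </span><span class="mi">-1</span><span class="p">;</span><span class="w">
</span><span class="p">}</span></code></pre>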
<p>This speeds things up roughly 100-fold. Interestingly, the same trick in Python and Node.js makes things slower.</p>
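<p>For comparison, the same prefilter idea can be sketched in Python: locate the literal prefix with <span class="docutils literal">str.find</span>, then start the regex search from that offset. This assumes every match of the pattern begins with the given literal, which holds for the expression used here. The helper name <span class="docutils literal">find_position_prefiltered</span> is hypothetical, and it returns the position just past the match, like the C++ version.</p>
<pre class="code python literal-block"><code>import re


def find_position_prefiltered(text, regex, prefix):
    # Locate the literal prefix first; str.find returns -1 when it is absent.
    offset = text.find(prefix)
    if offset == -1:
        return -1
    # Pattern.search accepts a start position, so no substring copy is needed.
    match = regex.search(text, offset)
    if match:
        return match.end()
    return -1


text = ' ' * (1024 * 10000) + '&lt;body&gt;asdasd&lt;/body&gt;'
regex = re.compile('&lt;body[^&gt;]*&gt;')
print(find_position_prefiltered(text, regex, '&lt;body'))</code></pre>
<p>Timing this against a plain <span class="docutils literal">regex.search(text)</span> with <span class="docutils literal">timeit</span> is an easy way to check on your own interpreter whether the extra pass over the string pays off.</p>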
<p>Regular expression support was added to the STL in C++11 and has only been implemented in recent compiler versions. One can only hope the performance problem gets fixed in newer releases.</p>
<p>What conclusion can be drawn from all this? Performance is first and foremost not about low-level (or trendy) programming languages and fast processors, but about efficient algorithms.</p>
]]>
        </content>
    </entry>
    
    <entry>
        <title><![CDATA[Automating the Browser Extension Development Workflow]]></title>
        <link href="https://artem.krylysov.com/blog/2014/07/21/automating-browser-extensions-development-workflow/"/>
        <updated>2014-07-21T00:00:00Z</updated>
        <id>https://artem.krylysov.com/blog/2014/07/21/automating-browser-extensions-development-workflow/</id>
        <content type="html">
            <![CDATA[<p>Anyone who has developed a browser extension at least once knows that it is a real pain.</p>
<p>The problem of developing and building extensions for all popular browsers can in most cases be solved with, for example, <a class="reference external" href="http://kangoextensions.com/" target="_blank">Kango framework</a> (for those who don't know it, Kango lets you build extensions for Chrome, Firefox, Safari and Internet Explorer from shared JavaScript code).</p>
<p>There is very little information on how best to set up a development environment for browser extensions, so I want to share my experience.</p>
<p>The extension development process usually looks like this:</p>
<ol class="arabic simple">
<li><p>Edit the source code.</p></li>
<li><p>Switch to the terminal and rebuild the extension.</p></li>
<li><p>Switch to the browser and reinstall or reload the extension.</p></li>
</ol>
<img alt="" src="https://artem.krylysov.com/images/extension-dev-env/workflow.png" style="width: 663px;" />
<p>The process can be reduced to a single step: rebuild and reinstall the extension in the browser with one click, without leaving the IDE.</p>
<p>I got the best workflow with Firefox.</p>
<section id="section-2">
<h3>Automatically reloading the extension on rebuild<a class="headerlink" href="#section-2" title="Permalink to this headline"> #</a></h3>
<section id="firefox">
<h4>Firefox<a class="headerlink" href="#firefox" title="Permalink to this headline"> #</a></h4>
<p>Firefox has no built-in way to install extensions automatically, but this is easily solved with <a class="reference external" href="https://addons.mozilla.org/en-US/firefox/addon/autoinstaller/" target="_blank">Extension Auto-Installer</a> by <a class="reference external" href="https://github.com/palant" target="_blank">Wladimir Palant</a>.</p>
<p>To install an extension with Extension Auto-Installer, send the XPI over HTTP to <a class="reference external" href="http://localhost:8888/" target="_blank">http://localhost:8888/</a> (Extension Auto-Installer runs this local web server). You can do this with, for example, <a class="reference external" href="https://www.gnu.org/software/wget/" target="_blank">Wget</a> (the response should be <span class="docutils literal">500 No Content</span>):</p>
<pre class="code sh literal-block"><code>wget<span class="w"> </span>--post-file<span class="o">=</span>extension.xpi<span class="w"> </span>http://localhost:8888/</code></pre>
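<p>If Wget is not available, the same request can be sketched with Python's standard library. This assumes Extension Auto-Installer only needs the raw XPI bytes as the POST body, as the Wget call suggests. The helper names <span class="docutils literal">build_install_request</span> and <span class="docutils literal">install_xpi</span> are hypothetical.</p>
<pre class="code python literal-block"><code>import urllib.error
import urllib.request

AUTOINSTALLER_URL = 'http://localhost:8888/'


def build_install_request(xpi_bytes, url=AUTOINSTALLER_URL):
    # Passing data makes urllib send the bytes as the POST body.
    return urllib.request.Request(url, data=xpi_bytes, method='POST')


def install_xpi(path, url=AUTOINSTALLER_URL):
    # Read the packaged extension and send it to the local server.
    with open(path, 'rb') as xpi_file:
        request = build_install_request(xpi_file.read(), url)
    try:
        with urllib.request.urlopen(request) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urlopen raises for 4xx/5xx, so the 500 reply mentioned above lands here.
        return error.code</code></pre>
<p>Calling <span class="docutils literal">install_xpi('extension.xpi')</span> returns the HTTP status code, so a build script can treat the expected 500 as success and anything else as a failure.</p>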
<p>The complete shell script for building and reloading the extension on Mac OS and Linux:</p>
<pre class="code sh literal-block"><code><span class="ch">#!/bin/sh
</span><span class="nv">KANGODIR</span><span class="o">=</span><span class="s2">&quot;../../framework/&quot;</span><span class="w">
</span>python<span class="w"> </span><span class="nv">$KANGODIR</span>/kango.py<span class="w"> </span>build<span class="w"> </span>./<span class="w">
</span>wget<span class="w"> </span>--post-file<span class="o">=</span>output/extension_name_1.0.0.xpi<span class="w"> </span>http://localhost:8888/</code></pre>
<p>On Windows:</p>
<pre class="code sh literal-block"><code>SET<span class="w"> </span><span class="nv">KANGODIR</span><span class="o">=</span>..<span class="se">\.</span>.<span class="se">\f</span>ramework<span class="se">\
</span>call<span class="w"> </span><span class="s2">&quot;%KANGODIR%\kango.py&quot;</span><span class="w"> </span>build<span class="w"> </span>.<span class="se">\
</span><span class="s2">&quot;C:\Program Files (x86)\GnuWin32\bin\wget&quot;</span><span class="w"> </span>--post-file<span class="o">=</span>output/extension_name_1.0.0.xpi<span class="w"> </span>http://localhost:8888/</code></pre>
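<p>Since Kango itself is driven by a Python script, the build step from the two scripts above could also be written once in Python and shared across platforms. This is a minimal sketch: the helper names <span class="docutils literal">build_command</span> and <span class="docutils literal">build</span> are hypothetical, and the interpreter name <span class="docutils literal">python</span> is assumed to be on the PATH, as in the shell script.</p>
<pre class="code python literal-block"><code>import os
import subprocess


def build_command(kango_dir, project_dir, python='python'):
    # Same invocation as the scripts above: kango.py build PROJECT_DIR.
    kango = os.path.join(kango_dir, 'kango.py')
    return [python, kango, 'build', project_dir]


def build(kango_dir='../../framework', project_dir='./'):
    # check=True raises CalledProcessError if the build fails,
    # so a later install step never runs on a stale package.
    subprocess.run(build_command(kango_dir, project_dir), check=True)</code></pre>
<p>Keeping the command construction in its own function makes it easy to print or log the exact command before running it.</p>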
</section>
<section id="chrome">
<h4>Chrome<a class="headerlink" href="#chrome" title="Permalink to this headline"> #</a></h4>
<p>Chrome does not allow extensions to be installed automatically; you can only reload one that is already installed.</p>
<p><a class="reference external" href="https://chrome.google.com/webstore/detail/extensions-reloader/fimgfedafeadlieiabdeeaodndnlbhid" target="_blank">Extensions Reloader</a> reloads all extensions installed in &quot;development&quot; mode when a tab with the address <span class="docutils literal"><span class="pre">http://reload.extensions</span></span> is opened.</p>
<p>Things are not as smooth on Chrome as on Firefox, namely:</p>
<ul class="simple">
<li><p>Chrome does not provide a way to fully reload an extension, so if the extension's settings have changed (for example, its name or description), you have to open <span class="docutils literal"><span class="pre">chrome://extensions/</span></span> and manually click <span class="docutils literal">Reload</span> for the extension.</p></li>
<li><p>The browser steals focus, because the reload is triggered by opening a browser tab.</p></li>
<li><p>If the background script inspector was open, it will be closed.</p></li>
</ul>
<p>Shell script for Mac OS:</p>
<pre class="code sh literal-block"><code><span class="ch">#!/bin/sh
</span><span class="nv">KANGODIR</span><span class="o">=</span><span class="s2">&quot;../../framework/&quot;</span><span class="w">
</span>python<span class="w"> </span><span class="nv">$KANGODIR</span>/kango.py<span class="w"> </span>build<span class="w"> </span>./<span class="w">
</span>open<span class="w"> </span>/Applications/Google<span class="se">\ </span>Chrome.app<span class="w"> </span>http://reload.extensions</code></pre>
<p>On Windows:</p>
<pre class="code sh literal-block"><code>SET<span class="w"> </span><span class="nv">KANGODIR</span><span class="o">=</span>..<span class="se">\.</span>.<span class="se">\f</span>ramework<span class="se">\
</span>call<span class="w"> </span><span class="s2">&quot;%KANGODIR%\kango.py&quot;</span><span class="w"> </span>build<span class="w"> </span>.<span class="se">\
</span>start<span class="w"> </span><span class="s2">&quot;&quot;</span><span class="w"> </span><span class="s2">&quot;http://reload.extensions&quot;</span></code></pre>
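<p>The per-platform reload commands can also be collected in one Python helper. This is a sketch under assumptions: the function names are hypothetical, the Linux binary name <span class="docutils literal"><span class="pre">google-chrome</span></span> is a guess that depends on the distribution, and on Windows <span class="docutils literal">start</span> opens the URL in the default browser, which is assumed to be Chrome.</p>
<pre class="code python literal-block"><code>import subprocess
import sys

RELOAD_URL = 'http://reload.extensions'


def reload_command(platform=sys.platform):
    if platform == 'darwin':
        # -a selects the application that should open the URL.
        return ['open', '-a', 'Google Chrome', RELOAD_URL]
    if platform.startswith('win'):
        # start treats its first quoted argument as a window title,
        # so an empty title is passed before the URL.
        return ['cmd', '/c', 'start', '', RELOAD_URL]
    # Assumption: Chrome is on PATH under this name on Linux.
    return ['google-chrome', RELOAD_URL]


def reload_extensions():
    # Opens the special URL that Extensions Reloader listens for.
    subprocess.run(reload_command(), check=True)</code></pre>
<p>A build script can call <span class="docutils literal">reload_extensions()</span> right after a successful build, which keeps the platform differences in one place.</p>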
</section>
</section>
<section id="ide">
<h3>IDE integration<a class="headerlink" href="#ide" title="Permalink to this headline"> #</a></h3>
<p>Almost every IDE lets you run a shell script on a given action, for example via a hotkey or from the GUI.</p>
<p>Here is how to do it in PyCharm or WebStorm:</p>
<ul class="simple">
<li><p>Choose <span class="docutils literal">Run &gt; Edit <span class="pre">Configurations...</span></span> from the menu.</p></li>
<li><p>In the dialog that opens, click <span class="docutils literal">+</span> and choose <span class="docutils literal">Python</span> for PyCharm or <span class="docutils literal">Node.js</span> for WebStorm from the drop-down menu.</p></li>
<li><p>In the <span class="docutils literal">Before launch</span> area, click <span class="docutils literal">+ &gt; Run External tool</span>.</p></li>
<li><p>In the dialog that opens, click <span class="docutils literal">+</span> again.</p></li>
<li><p>In the <span class="docutils literal">Program</span> field, enter <span class="docutils literal">sh</span>.</p></li>
<li><p>In the <span class="docutils literal">Parameters</span> field, enter the name of the build script, for example <span class="docutils literal">build.sh</span>.</p></li>
</ul>
<p>Done. Now the <span class="docutils literal">Run</span> command rebuilds the extension and updates it in the browser without leaving the IDE.</p>
</section>
]]>
        </content>
    </entry>
    
</feed>