mkdocs/druid/druid.html

<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <meta name="generator" content="pandoc">
  <meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes">
  <meta name="author" content="Yann Esposito">
  <title>Druid for real-time analysis</title>
  <style type="text/css">code{white-space: pre;}</style>
  <link rel="stylesheet" href="../styling.css">
  <!--[if lt IE 9]>
    <script src="//cdnjs.cloudflare.com/ajax/libs/html5shiv/3.7.3/html5shiv-printshiv.min.js"></script>
  <![endif]-->
</head>
<body>
<header>
<h1 class="title">Druid for real-time analysis</h1>
<p class="author">Yann Esposito</p>
<p class="date">7 Avril 2016</p>
</header>
<nav id="TOC">
<ul>
<li><a href="#druid-the-sales-pitch">Druid the Sales Pitch</a></li>
<li><a href="#intro">Intro</a><ul>
<li><a href="#experience">Experience</a></li>
<li><a href="#real-time">Real Time?</a></li>
<li><a href="#demand">Demand</a></li>
<li><a href="#reality">Reality</a></li>
</ul></li>
<li><a href="#origin-php">Origin (PHP)</a></li>
<li><a href="#st-refactoring-node.js">1st Refactoring (Node.js)</a></li>
<li><a href="#return-of-experience">Return of Experience</a></li>
<li><a href="#return-of-experience-1">Return of Experience</a></li>
<li><a href="#nd-refactoring">2nd Refactoring</a></li>
<li><a href="#nd-refactoring-ftw">2nd Refactoring (FTW!)</a></li>
<li><a href="#nd-refactoring-return-of-experience">2nd Refactoring return of experience</a></li>
<li><a href="#demo">Demo</a></li>
<li><a href="#pre-considerations">Pre Considerations</a><ul>
<li><a href="#discovered-vs-invented">Discovered vs Invented</a></li>
<li><a href="#in-the-end">In the End</a></li>
</ul></li>
<li><a href="#druid">Druid</a><ul>
<li><a href="#who">Who?</a></li>
<li><a href="#goal">Goal</a></li>
<li><a href="#concepts">Concepts</a></li>
<li><a href="#key-features">Key Features</a></li>
<li><a href="#right-for-me">Right for me?</a></li>
</ul></li>
<li><a href="#high-level-architecture">High Level Architecture</a><ul>
<li><a href="#inspiration">Inspiration</a></li>
<li><a href="#index-immutability">Index / Immutability</a></li>
<li><a href="#storage">Storage</a></li>
<li><a href="#specialized-nodes">Specialized Nodes</a></li>
</ul></li>
<li><a href="#druid-vs-x">Druid vs X</a><ul>
<li><a href="#elasticsearch">Elasticsearch</a></li>
<li><a href="#keyvalue-stores-hbasecassandraopentsdb">Key/Value Stores (HBase/Cassandra/OpenTSDB)</a></li>
<li><a href="#spark">Spark</a></li>
<li><a href="#sql-on-hadoop-impaladrillspark-sqlpresto">SQL-on-Hadoop (Impala/Drill/Spark SQL/Presto)</a></li>
</ul></li>
<li><a href="#data">Data</a><ul>
<li><a href="#concepts-1">Concepts</a></li>
<li><a href="#indexing">Indexing</a></li>
<li><a href="#loading">Loading</a></li>
<li><a href="#querying">Querying</a></li>
<li><a href="#segments">Segments</a></li>
</ul></li>
<li><a href="#roll-up">Roll-up</a><ul>
<li><a href="#example">Example</a></li>
<li><a href="#as-sql">as SQL</a></li>
</ul></li>
<li><a href="#segments-1">Segments</a><ul>
<li><a href="#sharding">Sharding</a></li>
<li><a href="#core-data-structure">Core Data Structure</a></li>
<li><a href="#example-1">Example</a></li>
<li><a href="#example-multiple-matches">Example (multiple matches)</a></li>
<li><a href="#real-time-ingestion">Real-time ingestion</a></li>
<li><a href="#batch-ingestion">Batch Ingestion</a></li>
<li><a href="#real-time-ingestion-1">Real-time Ingestion</a></li>
</ul></li>
<li><a href="#querying-1">Querying</a><ul>
<li><a href="#query-types">Query types</a></li>
<li><a href="#examples">Example(s)</a></li>
<li><a href="#result">Result</a></li>
<li><a href="#caching">Caching</a></li>
</ul></li>
<li><a href="#druid-components">Druid Components</a><ul>
<li><a href="#druid-1">Druid</a></li>
<li><a href="#also">Also</a></li>
<li><a href="#coordinator">Coordinator</a></li>
</ul></li>
<li><a href="#when-not-to-choose-druid">When <em>not</em> to choose Druid</a></li>
<li><a href="#graphite-metrics">Graphite (metrics)</a></li>
<li><a href="#pivot-exploring-data">Pivot (exploring data)</a></li>
<li><a href="#caravel">Caravel</a></li>
<li><a href="#conclusions">Conclusions</a><ul>
<li><a href="#precompute-your-time-series">Precompute your time series?</a></li>
<li><a href="#dont-reinvent-it">Don’t reinvent it</a></li>
<li><a href="#druid-way-is-the-right-way">Druid way is the right way!</a></li>
</ul></li>
</ul>
</nav>
<h1 id="druid-the-sales-pitch">Druid the Sales Pitch</h1>
<ul>
<li>Sub-Second Queries</li>
<li>Real-time Streams</li>
<li>Scalable to Petabytes</li>
<li>Deploy Anywhere</li>
<li>Vibrant Community (Open Source)</li>
</ul>
<aside class="notes">
<ul>
<li>Ideal for powering user-facing analytic applications</li>
<li>Deploy anywhere: cloud, on-premise, integrate with Haddop, Spark, Kafka, Storm, Samza</li>
</ul>
</aside>
<h1 id="intro">Intro</h1>
<h2 id="experience">Experience</h2>
<ul>
<li>Real Time Social Media Analytics</li>
</ul>
<h2 id="real-time">Real Time?</h2>
<ul>
<li>Ingestion Latency: seconds</li>
<li>Query Latency: seconds</li>
</ul>
<h2 id="demand">Demand</h2>
<ul>
<li>Twitter: <code>20k msg/s</code>, <code>1msg = 10ko</code> during 24h</li>
<li>Facebook public: 1000 to 2000 msg/s continuously</li>
<li>Low Latency</li>
</ul>
<h2 id="reality">Reality</h2>
<ul>
<li>Twitter: 400 msg/s continuously, burst to 1500</li>
<li>Facebook: 1000 to 2000 msg/s</li>
</ul>
<h1 id="origin-php">Origin (PHP)</h1>
<p><img src="img/bad_php.jpg" alt="OH NOES PHP!" /> </p>
<h1 id="st-refactoring-node.js">1st Refactoring (Node.js)</h1>
<ul>
<li>Ingestion still in PHP</li>
<li>Node.js, Perl, Java &amp; R for sentiment analysis</li>
<li>MongoDB</li>
<li>Manually made time series (Incremental Map/Reduce)</li>
<li>Manually coded HyperLogLog in js</li>
</ul>
<h1 id="return-of-experience">Return of Experience</h1>
<p><img src="img/MongoDB.png" alt="MongoDB the destroyer" /> </p>
<h1 id="return-of-experience-1">Return of Experience</h1>
<ul>
<li>Ingestion still in PHP (600 msg/s max)</li>
<li>Node.js, Perl, Java (10 msg/s max)</li>
</ul>
<figure>
<img src="img/bored.gif" alt="Too Slow, Bored" /><figcaption>Too Slow, Bored</figcaption>
</figure>
<h1 id="nd-refactoring">2nd Refactoring</h1>
<ul>
<li>Haskell</li>
<li>Clojure / Clojurescript</li>
<li>Kafka / Zookeeper</li>
<li>Mesos / Marathon</li>
<li>Elasticsearch</li>
<li><strong>Druid</strong></li>
</ul>
<h1 id="nd-refactoring-ftw">2nd Refactoring (FTW!)</h1>
<p><img src="img/talking.jpg" alt="Now we’re talking" /> </p>
<h1 id="nd-refactoring-return-of-experience">2nd Refactoring return of experience</h1>
<ul>
<li>No limit, everything is scalable</li>
<li>High availability</li>
<li>Low latency: Ingestion &amp; User faced querying</li>
<li>Cheap if done correctly</li>
</ul>
<p><strong>Thanks Druid!</strong></p>
<h1 id="demo">Demo</h1>
<ul>
<li>Low Latency High Volume of Data Analysis</li>
<li>Typically <code>pulse</code></li>
</ul>
<p><a href="http://pulse.vigiglo.be/#/vigiglobe/Earthquake/dashboard" target="_blank"> DEMO Time </a></p>
<h1 id="pre-considerations">Pre Considerations</h1>
<h2 id="discovered-vs-invented">Discovered vs Invented</h2>
<p>Try to conceptualize a s.t.</p>
<ul>
<li>Ingest Events</li>
<li>Real-Time Queries</li>
<li>Scalable</li>
<li>Highly Available</li>
</ul>
<p>Analytics: timeseries, alerting system, top N, etc…</p>
<h2 id="in-the-end">In the End</h2>
<p>Druid concepts are always emerging naturally</p>
<h1 id="druid">Druid</h1>
<h2 id="who">Who?</h2>
<p>Metamarkets</p>
<p><a href="http://druid.io/druid-powered.html"
   target="_blank">Powered by Druid</a></p>
<ul>
<li>Alibaba, Cisco, Criteo, eBay, Hulu, Netflix, Paypal…</li>
</ul>
<h2 id="goal">Goal</h2>
<blockquote>
<p>Druid is an open source store designed for real-time exploratory analytics on large data sets.</p>
</blockquote>
<blockquote>
<p>hosted dashboard that would allow users to arbitrarily explore and visualize event streams.</p>
</blockquote>
<h2 id="concepts">Concepts</h2>
<ul>
<li>Column-oriented storage layout</li>
<li>distributed, shared-nothing architecture</li>
<li>advanced indexing structure</li>
</ul>
<h2 id="key-features">Key Features</h2>
<ul>
<li>Sub-second OLAP Queries</li>
<li>Real-time Streaming Ingestion</li>
<li>Power Analytic Applications</li>
<li>Cost Effective</li>
<li>High Available</li>
<li>Scalable</li>
</ul>
<h2 id="right-for-me">Right for me?</h2>
<ul>
<li>require fast aggregations</li>
<li>exploratory analytics</li>
<li>analysis in real-time</li>
<li>lots of data (trillions of events, petabytes of data)</li>
<li>no single point of failure</li>
</ul>
<h1 id="high-level-architecture">High Level Architecture</h1>
<h2 id="inspiration">Inspiration</h2>
<ul>
<li>Google’s <a href="http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/36632.pdf">BigQuery/Dremel</a></li>
<li>Google’s <a href="http://vldb.org/pvldb/vol5/p1436_alexanderhall_vldb2012.pdf">PowerDrill</a></li>
</ul>
<h2 id="index-immutability">Index / Immutability</h2>
<p>Druid indexes data to create mostly immutable views.</p>
<h2 id="storage">Storage</h2>
<p>Store data in custom column format highly optimized for aggregation &amp; filter.</p>
<h2 id="specialized-nodes">Specialized Nodes</h2>
<ul>
<li>A Druid cluster is composed of various type of nodes</li>
<li>Each designed to do a small set of things very well</li>
<li>Nodes don’t need to be deployed on individual hardware</li>
<li>Many node types can be colocated in production</li>
</ul>
<h1 id="druid-vs-x">Druid vs X</h1>
<h2 id="elasticsearch">Elasticsearch</h2>
<ul>
<li>resource requirement much higher for ingestion &amp; aggregation</li>
<li>No data summarization (100x in real world data)</li>
</ul>
<h2 id="keyvalue-stores-hbasecassandraopentsdb">Key/Value Stores (HBase/Cassandra/OpenTSDB)</h2>
<ul>
<li>Must Pre-compute Result
<ul>
<li>Exponential storage</li>
<li>Hours of pre-processing time</li>
</ul></li>
<li>Use the dimensions as key (like in OpenTSDB)
<ul>
<li>No filter index other than range</li>
<li>Hard for complex predicates</li>
</ul></li>
</ul>
<h2 id="spark">Spark</h2>
<ul>
<li>Druid can be used to accelerate OLAP queries in Spark</li>
<li>Druid focuses on the latencies to ingest and serve queries</li>
<li>Too long for end user to arbitrarily explore data</li>
</ul>
<h2 id="sql-on-hadoop-impaladrillspark-sqlpresto">SQL-on-Hadoop (Impala/Drill/Spark SQL/Presto)</h2>
<ul>
<li>Queries: more data transfer between nodes</li>
<li>Data Ingestion: bottleneck by backing store</li>
<li>Query Flexibility: more flexible (full joins)</li>
</ul>
<h1 id="data">Data</h1>
<h2 id="concepts-1">Concepts</h2>
<ul>
<li><strong>Timestamp column</strong>: query centered on time axis</li>
<li><strong>Dimension columns</strong>: strings (used to filter or to group)</li>
<li><strong>Metric columns</strong>: used for aggregations (count, sum, mean, etc…)</li>
</ul>
<h2 id="indexing">Indexing</h2>
<ul>
<li>Immutable snapshots of data</li>
<li>data structure highly optimized for analytic queries</li>
<li>Each column is stored separately</li>
<li>Indexes data on a per shard (segment) level</li>
</ul>
<h2 id="loading">Loading</h2>
<ul>
<li>Real-Time</li>
<li>Batch</li>
</ul>
<h2 id="querying">Querying</h2>
<ul>
<li>JSON over HTTP</li>
<li>Single Table Operations, no joins.</li>
</ul>
<h2 id="segments">Segments</h2>
<ul>
<li>Per time interval
<ul>
<li>skip segments when querying</li>
</ul></li>
<li>Immutable
<ul>
<li>Cache friendly</li>
<li>No locking</li>
</ul></li>
<li>Versioned
<ul>
<li>No locking</li>
<li>Read-write concurrency</li>
</ul></li>
</ul>
<h1 id="roll-up">Roll-up</h1>
<h2 id="example">Example</h2>
<pre><code>timestamp             page    ... added  deleted
2011-01-01T00:01:35Z  Cthulhu     10      65
2011-01-01T00:03:63Z  Cthulhu     15      62
2011-01-01T01:04:51Z  Cthulhu     32      45
2011-01-01T01:01:00Z  Azatoth     17      87
2011-01-01T01:02:00Z  Azatoth     43      99
2011-01-01T02:03:00Z  Azatoth     12      53</code></pre>
<pre><code>timestamp             page    ... nb added deleted
2011-01-01T00:00:00Z  Cthulhu      2 25    127
2011-01-01T01:00:00Z  Cthulhu      1 32    45
2011-01-01T01:00:00Z  Azatoth      2 60    186
2011-01-01T02:00:00Z  Azatoth      1 12    53</code></pre>
<h2 id="as-sql">as SQL</h2>
<pre><code>GROUP BY timestamp, page, nb, added, deleted
  :: nb = COUNT(1)
  ,  added = SUM(added)
  ,  deleted = SUM(deleted)</code></pre>
<p>In practice can dramatically reduce the size (up to x100)</p>
<h1 id="segments-1">Segments</h1>
<h2 id="sharding">Sharding</h2>
<p><small><code>sampleData_2011-01-01T01:00:00:00Z_2011-01-01T02:00:00:00Z_v1_0</code></small></p>
<pre><code>timestamp             page    ... nb added deleted
2011-01-01T01:00:00Z  Cthulhu      1 20    45
2011-01-01T01:00:00Z  Azatoth      1 30    106</code></pre>
<p><small><code>sampleData_2011-01-01T01:00:00:00Z_2011-01-01T02:00:00:00Z_v1_0</code></small></p>
<pre><code>timestamp             page    ... nb added deleted
2011-01-01T01:00:00Z  Cthulhu      1 12    45
2011-01-01T01:00:00Z  Azatoth      2 30    80</code></pre>
<h2 id="core-data-structure">Core Data Structure</h2>
<p><img src="img/druid-column-types.png" alt="Segment" /> </p>
<ul>
<li>dictionary</li>
<li>a bitmap for each value</li>
<li>a list of the columns values encoded using the dictionary</li>
</ul>
<h2 id="example-1">Example</h2>
<pre><code>dictionary: { &quot;Cthulhu&quot;: 0
            , &quot;Azatoth&quot;: 1 }

column data: [0, 0, 1, 1]

bitmaps (one for each value of the column):
value=&quot;Cthulhu&quot;: [1,1,0,0]
value=&quot;Azatoth&quot;: [0,0,1,1]</code></pre>
<h2 id="example-multiple-matches">Example (multiple matches)</h2>
<pre><code>dictionary: { &quot;Cthulhu&quot;: 0
            , &quot;Azatoth&quot;: 1 }

column data: [0, [0,1], 1, 1]

bitmaps (one for each value of the column):
value=&quot;Cthulhu&quot;: [1,1,0,0]
value=&quot;Azatoth&quot;: [0,1,1,1]</code></pre>
<h2 id="real-time-ingestion">Real-time ingestion</h2>
<ul>
<li>Via Real-Time Node and Firehose
<ul>
<li>No redundancy or HA, thus not recommended</li>
</ul></li>
<li>Via Indexing Service and Tranquility API
<ul>
<li>Core API</li>
<li>Integration with Streaming Frameworks</li>
<li>HTTP Server</li>
<li><strong>Kafka Consumer</strong></li>
</ul></li>
</ul>
<h2 id="batch-ingestion">Batch Ingestion</h2>
<ul>
<li>File based (HDFS, S3, …)</li>
</ul>
<h2 id="real-time-ingestion-1">Real-time Ingestion</h2>
<pre><code>Task 1: [   Interval          ][ Window ]
Task 2:                        [                     ]
-----------------------------------------------------&gt;
                                                  time</code></pre>
<h1 id="querying-1">Querying</h1>
<h2 id="query-types">Query types</h2>
<ul>
<li>Group by: group by multiple dimensions</li>
<li>Top N: like grouping by a single dimension</li>
<li>Timeseries: without grouping over dimensions</li>
<li>Search: Dimensions lookup</li>
<li>Time Boundary: Find available data timeframe</li>
<li>Metadata queries</li>
</ul>
<h2 id="examples">Example(s)</h2>
<pre><code>{&quot;queryType&quot;: &quot;groupBy&quot;,
 &quot;dataSource&quot;: &quot;druidtest&quot;,
 &quot;granularity&quot;: &quot;all&quot;,
 &quot;dimensions&quot;: [],
 &quot;aggregations&quot;: [
     {&quot;type&quot;: &quot;count&quot;, &quot;name&quot;: &quot;rows&quot;},
     {&quot;type&quot;: &quot;longSum&quot;, &quot;name&quot;: &quot;imps&quot;, &quot;fieldName&quot;: &quot;impressions&quot;},
     {&quot;type&quot;: &quot;doubleSum&quot;, &quot;name&quot;: &quot;wp&quot;, &quot;fieldName&quot;: &quot;wp&quot;}
 ],
 &quot;intervals&quot;: [&quot;2010-01-01T00:00/2020-01-01T00&quot;]}</code></pre>
<h2 id="result">Result</h2>
<pre><code>[ {
  &quot;version&quot; : &quot;v1&quot;,
  &quot;timestamp&quot; : &quot;2010-01-01T00:00:00.000Z&quot;,
  &quot;event&quot; : {
    &quot;imps&quot; : 5,
    &quot;wp&quot; : 15000.0,
    &quot;rows&quot; : 5
  }
} ]</code></pre>
<h2 id="caching">Caching</h2>
<ul>
<li>Historical node level
<ul>
<li>By segment</li>
</ul></li>
<li>Broker Level
<ul>
<li>By segment and query</li>
<li><code>groupBy</code> is disabled on purpose!</li>
</ul></li>
<li>By default: local caching</li>
</ul>
<h1 id="druid-components">Druid Components</h1>
<h2 id="druid-1">Druid</h2>
<ul>
<li>Real-time Nodes</li>
<li>Historical Nodes</li>
<li>Broker Nodes</li>
<li>Coordinator</li>
<li>For indexing:
<ul>
<li>Overlord</li>
<li>Middle Manager</li>
</ul></li>
</ul>
<h2 id="also">Also</h2>
<ul>
<li>Deep Storage (S3, HDFS, …)</li>
<li>Metadata Storage (SQL)</li>
<li>Load Balancer</li>
<li>Cache</li>
</ul>
<h2 id="coordinator">Coordinator</h2>
<ul>
<li>Real-time Nodes (pull data, index it)</li>
<li>Historical Nodes (keep old segments)</li>
<li>Broker Nodes (route queries to RT &amp; Hist. nodes, merge)</li>
<li>Coordinator (manage segemnts)</li>
<li>For indexing:
<ul>
<li>Overlord (distribute task to the middle manager)</li>
<li>Middle Manager (execute tasks via Peons)</li>
</ul></li>
</ul>
<h1 id="when-not-to-choose-druid">When <em>not</em> to choose Druid</h1>
<ul>
<li>Data is not time-series</li>
<li>Cardinality is <em>very</em> high</li>
<li>Number of dimensions is high</li>
<li>Setup cost must be avoided</li>
</ul>
<h1 id="graphite-metrics">Graphite (metrics)</h1>
<p><img src="img/graphite.png" alt="Graphite" />__</p>
<p><a href="http://graphite.wikidot.com">Graphite</a></p>
<h1 id="pivot-exploring-data">Pivot (exploring data)</h1>
<p><img src="img/pivot.gif" alt="Pivot" /> </p>
<p><a href="https://github.com/implydata/pivot">Pivot</a></p>
<h1 id="caravel">Caravel</h1>
<p><img src="img/caravel.png" alt="caravel" /> </p>
<p><a href="https://github.com/airbnb/caravel">Caravel</a></p>
<h1 id="conclusions">Conclusions</h1>
<h2 id="precompute-your-time-series">Precompute your time series?</h2>
<p><img src="img/wrong.jpg" alt="You’re doing it wrong" /> </p>
<h2 id="dont-reinvent-it">Don’t reinvent it</h2>
<ul>
<li>need a user facing API</li>
<li>need time series on many dimensions</li>
<li>need real-time</li>
<li>big volume of data</li>
</ul>
<h2 id="druid-way-is-the-right-way">Druid way is the right way!</h2>
<ol type="1">
<li>Push in kafka</li>
<li>Add the right dimensions</li>
<li>Push in druid</li>
<li>???</li>
<li>Profit!</li>
</ol>
<div id="footer">
  <a href="http://yannesposito.com">yannesposito.com</a>
  —
  Proudly generated by <a href="http://github.com/yogsototh/mkdocs">mkdocs</a>
</div>
</body>
</html>