update
This commit is contained in:
parent
52c9112e33
commit
af539989e2
4 changed files with 204 additions and 4 deletions
208
druid/druid.md
|
@ -11,13 +11,18 @@ date: 7 Avril 2016
|
||||||
|
|
||||||
## Plan
|
## Plan
|
||||||
|
|
||||||
- Introduction ; pourquoi ?
|
- Introduction; why?
|
||||||
- Comment ?
|
- How?
|
||||||
|
|
||||||
## Expérience
|
## Experience
|
||||||
|
|
||||||
- Real Time Social Media Analytics
|
- Real Time Social Media Analytics
|
||||||
|
|
||||||
|
## Real Time?
|
||||||
|
|
||||||
|
- Ingestion Latency: seconds
|
||||||
|
- Query Latency: seconds
|
||||||
|
|
||||||
## Demand
|
## Demand
|
||||||
|
|
||||||
- Twitter: `20k msg/s`, `1 msg = 10 kB` for 24h
|
- Twitter: `20k msg/s`, `1 msg = 10 kB` for 24h
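
To get a feel for the load above, here is a rough back-of-the-envelope calculation. It assumes decimal units (10 kB = 10,000 bytes) and a constant rate over the whole day; the variable names are illustrative only.

```python
# Rough sizing of the Twitter load: 20k msg/s, 10 kB per message, for 24h.
MESSAGES_PER_SECOND = 20_000
BYTES_PER_MESSAGE = 10 * 1000          # assumption: 1 kB = 1,000 bytes
SECONDS_PER_DAY = 24 * 60 * 60

throughput_bytes_per_s = MESSAGES_PER_SECOND * BYTES_PER_MESSAGE
messages_per_day = MESSAGES_PER_SECOND * SECONDS_PER_DAY
bytes_per_day = throughput_bytes_per_s * SECONDS_PER_DAY

print(f"throughput: {throughput_bytes_per_s / 1e6:.0f} MB/s")
print(f"messages:   {messages_per_day / 1e9:.3f} billion per day")
print(f"volume:     {bytes_per_day / 1e12:.2f} TB per day")
```

Under these assumptions the stream is about 200 MB/s, or roughly 17 TB of raw data per day.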
|
||||||
|
@ -40,6 +45,22 @@ date: 7 Avril 2016
|
||||||
DEMO
|
DEMO
|
||||||
</a>
|
</a>
|
||||||
|
|
||||||
|
## Pre Considerations
|
||||||
|
|
||||||
|
Discovered vs Invented
|
||||||
|
|
||||||
|
## Try to conceptualize (events)
|
||||||
|
|
||||||
|
Scalable + Real Time + Fail safe
|
||||||
|
|
||||||
|
- timeseries
|
||||||
|
- alerting system
|
||||||
|
- top N
|
||||||
|
- etc...
|
||||||
|
|
||||||
|
## In the End
|
||||||
|
|
||||||
|
Druid's concepts always emerge naturally
|
||||||
|
|
||||||
# Druid
|
# Druid
|
||||||
|
|
||||||
|
@ -69,4 +90,183 @@ Metamarkets
|
||||||
|
|
||||||
**arbitrary exploration of billion-row tables with sub-second latencies**
|
**arbitrary exploration of billion-row tables with sub-second latencies**
|
||||||
|
|
||||||
## Proof
|
## Storage
|
||||||
|
|
||||||
|
- Columnar
|
||||||
|
- Inverted Index
|
||||||
|
- Immutable Segments
|
||||||
|
|
||||||
|
## Columnar Storage
|
||||||
|
|
||||||
|
## Index
|
||||||
|
|
||||||
|
- Values are dictionary encoded
|
||||||
|
|
||||||
|
`{"USA" 1, "Canada" 2, "Mexico" 3, ...}`
|
||||||
|
|
||||||
|
- Bitmap for every dimension value (used by filters)
|
||||||
|
|
||||||
|
`"USA" -> [0 1 0 0 1 1 0 0 0]`
|
||||||
|
|
||||||
|
- Column values (used by aggregation queries)
|
||||||
|
|
||||||
|
`[2,1,3,15,1,1,2,8,7]`
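
The three structures above can be sketched in a few lines of Python. This is a toy model, not Druid's actual implementation: the `countries` column is hypothetical (only its "USA" positions are chosen to match the bitmap on the slide), and real bitmaps are compressed rather than stored as plain lists.

```python
# Hypothetical 9-row dimension column and the metric column from the slide.
countries = ["Canada", "USA", "Mexico", "Canada", "USA",
             "USA", "Mexico", "Canada", "Mexico"]
metric = [2, 1, 3, 15, 1, 1, 2, 8, 7]

# 1. Dictionary encoding: each distinct value is stored once as a small id.
dictionary = {"USA": 1, "Canada": 2, "Mexico": 3}
encoded = [dictionary[c] for c in countries]


def build_bitmaps(values):
    """2. One bitmap per distinct value: bit i is set when row i holds it."""
    return {
        value: [1 if v == value else 0 for v in values]
        for value in set(values)
    }


def filtered_sum(bitmap, column):
    """3. Aggregate the metric column over the rows selected by a bitmap."""
    return sum(v for bit, v in zip(bitmap, column) if bit)


bitmaps = build_bitmaps(countries)
usa_total = filtered_sum(bitmaps["USA"], metric)

print(encoded)          # ids instead of strings
print(bitmaps["USA"])   # rows matching country = "USA"
print(usa_total)        # sum of the metric for those rows
```

A filter such as `country = "USA" OR country = "Canada"` then becomes a bitwise OR of two bitmaps before the aggregation step.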
|
||||||
|
|
||||||
|
## Data Segments
|
||||||
|
|
||||||
|
- Per time interval
|
||||||
|
- skip segments when querying
|
||||||
|
- Immutable
|
||||||
|
- Cache friendly
|
||||||
|
- No locking
|
||||||
|
- Versioned
|
||||||
|
- No locking
|
||||||
|
- Read-write concurrency
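
Why per-interval, immutable segments make it cheap to skip data can be illustrated with a small sketch. The `Segment` class and the hourly layout are assumptions made for the example; they are not Druid's real segment metadata.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)  # frozen: a segment is never modified after creation
class Segment:
    start: datetime      # inclusive
    end: datetime        # exclusive
    version: str


def overlaps(segment, query_start, query_end):
    """True when the segment's interval intersects the query interval."""
    return segment.start < query_end and query_start < segment.end


def segments_to_scan(segments, query_start, query_end):
    """Keep only the segments a query has to read; the rest are skipped."""
    return [s for s in segments if overlaps(s, query_start, query_end)]


# Hypothetical data source with six hourly segments on 7 April 2016.
hourly = [
    Segment(datetime(2016, 4, 7, h), datetime(2016, 4, 7, h + 1), "v1")
    for h in range(6)
]
selected = segments_to_scan(
    hourly, datetime(2016, 4, 7, 2, 30), datetime(2016, 4, 7, 4)
)
print([s.start.hour for s in selected])
```

Because segments never change, a newer version can replace an older one atomically and readers never need a lock.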
|
||||||
|
|
||||||
|
## Real-time ingestion
|
||||||
|
|
||||||
|
- Via Real-Time Node and Firehose
|
||||||
|
- No redundancy or HA, thus not recommended
|
||||||
|
- Via Indexing Service and Tranquility API
|
||||||
|
- Core API
|
||||||
|
- Integration with Streaming Frameworks
|
||||||
|
- HTTP Server
|
||||||
|
- **Kafka Consumer**
|
||||||
|
|
||||||
|
## Batch Ingestion
|
||||||
|
|
||||||
|
- File based (HDFS, S3, ...)
|
||||||
|
|
||||||
|
## Real-time Ingestion
|
||||||
|
|
||||||
|
~~~
|
||||||
|
Task 1: [ Interval ][ Window ]
|
||||||
|
Task 2: [ ]
|
||||||
|
--------------------------------------->
|
||||||
|
time
|
||||||
|
~~~
|
||||||
|
|
||||||
|
Minimum indexing slots =
|
||||||
|
Data Sources × Partitions × Replicas × 2
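
The formula above as a small helper. The factor of 2 presumably covers the overlap shown in the diagram, where Task 1 is still in its window period while Task 2 has already started; the function name and the sample numbers are illustrative.

```python
def minimum_indexing_slots(data_sources, partitions, replicas):
    """Slots needed so that every task and its successor can run side by side."""
    overlap_factor = 2  # previous task (in its window) + current task
    return data_sources * partitions * replicas * overlap_factor


# Example: 3 data sources, 2 partitions each, 2 replicas.
print(minimum_indexing_slots(data_sources=3, partitions=2, replicas=2))
```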
|
||||||
|
|
||||||
|
# Querying
|
||||||
|
|
||||||
|
## Query types
|
||||||
|
|
||||||
|
- Group by: group by multiple dimensions
|
||||||
|
- Top N: like grouping by a single dimension
|
||||||
|
- Timeseries: without grouping over dimensions
|
||||||
|
- Search: Dimensions lookup
|
||||||
|
- Time Boundary: Find available data timeframe
|
||||||
|
- Metadata queries
|
||||||
|
|
||||||
|
## Tip
|
||||||
|
|
||||||
|
- Prefer `topN` over `groupBy`
|
||||||
|
- Prefer `timeseries` over `topN`
|
||||||
|
- Use limits (and priorities)
|
||||||
|
|
||||||
|
## Query Spec
|
||||||
|
|
||||||
|
- Data source
|
||||||
|
- Dimensions
|
||||||
|
- Interval
|
||||||
|
- Filters
|
||||||
|
- Aggregations
|
||||||
|
- Post Aggregations
|
||||||
|
- Granularity
|
||||||
|
- Context (query configuration)
|
||||||
|
- Limit
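
As a sketch, the elements listed above map onto the JSON body of a native `topN` query roughly as follows. The data source, dimension, and metric names (`tweets`, `country`, `lang`, `count`, `length`) are hypothetical; only the top-level keys follow Druid's query format. For `topN`, the limit is expressed through `threshold`.

```python
import json

top_n_query = {
    "queryType": "topN",
    "dataSource": "tweets",                      # data source (hypothetical)
    "dimension": "country",                      # dimension (hypothetical)
    "intervals": ["2016-04-07T00:00:00Z/2016-04-08T00:00:00Z"],
    "granularity": "all",
    "filter": {"type": "selector", "dimension": "lang", "value": "fr"},
    "aggregations": [
        {"type": "longSum", "name": "tweet_count", "fieldName": "count"},
        {"type": "longSum", "name": "total_length", "fieldName": "length"},
    ],
    "postAggregations": [
        {
            "type": "arithmetic",
            "name": "avg_length",
            "fn": "/",
            "fields": [
                {"type": "fieldAccess", "fieldName": "total_length"},
                {"type": "fieldAccess", "fieldName": "tweet_count"},
            ],
        }
    ],
    "metric": "tweet_count",                     # what the top N is ranked by
    "threshold": 5,                              # limit
    "context": {"timeout": 10000, "priority": 1},
}

payload = json.dumps(top_n_query, indent=2)
print(payload)
```

Such a payload would typically be sent as an HTTP POST to a Broker node.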
|
||||||
|
|
||||||
|
## Example(s)
|
||||||
|
|
||||||
|
TODO
|
||||||
|
|
||||||
|
## Caching
|
||||||
|
|
||||||
|
- Historical node level
|
||||||
|
- By segment
|
||||||
|
- Broker Level
|
||||||
|
- By segment and query
|
||||||
|
- `groupBy` is disabled on purpose!
|
||||||
|
- By default - local caching
|
||||||
|
|
||||||
|
## Load Rules
|
||||||
|
|
||||||
|
- Can be defined
|
||||||
|
- What can be set
|
||||||
|
|
||||||
|
# Components
|
||||||
|
|
||||||
|
## Druid Components
|
||||||
|
|
||||||
|
- Real-time Nodes
|
||||||
|
- Historical Nodes
|
||||||
|
- Broker Nodes
|
||||||
|
- Coordinator
|
||||||
|
- For indexing:
|
||||||
|
- Overlord
|
||||||
|
- Middle Manager
|
||||||
|
|
||||||
|
+ Deep Storage
|
||||||
|
+ Metadata Storage
|
||||||
|
|
||||||
|
+ Load Balancer
|
||||||
|
+ Cache
|
||||||
|
|
||||||
|
## Coordinator
|
||||||
|
|
||||||
|
Manage Segments
|
||||||
|
|
||||||
|
## Real-time Nodes
|
||||||
|
|
||||||
|
- Pulling data in real-time
|
||||||
|
- Indexing it
|
||||||
|
|
||||||
|
## Historical Nodes
|
||||||
|
|
||||||
|
- Keep historical segments
|
||||||
|
|
||||||
|
## Overlord
|
||||||
|
|
||||||
|
- Accepts tasks and distributes them to Middle Managers
|
||||||
|
|
||||||
|
## Middle Manager
|
||||||
|
|
||||||
|
- Execute submitted tasks via Peons
|
||||||
|
|
||||||
|
## Broker Nodes
|
||||||
|
|
||||||
|
- Route query to Real-time and Historical nodes
|
||||||
|
- Merge results
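
The route-and-merge role can be sketched as a simple scatter-gather step. This is a simplified model with made-up partial results: it assumes each node returns complete per-value sums, whereas a real `topN` merges only each node's local top entries and is therefore approximate.

```python
from collections import Counter


def merge_partial_results(partials, threshold):
    """Sum per-value counts from all nodes and keep the top `threshold`."""
    totals = Counter()
    for partial in partials:
        totals.update(partial)  # adds counts key by key
    return totals.most_common(threshold)


# Hypothetical partial results from one real-time and two historical nodes.
realtime = {"USA": 5, "Canada": 2}
historical_a = {"USA": 10, "Mexico": 7}
historical_b = {"Canada": 4, "Mexico": 1}

top_two = merge_partial_results([realtime, historical_a, historical_b], 2)
print(top_two)
```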
|
||||||
|
|
||||||
|
## Deep Storage
|
||||||
|
|
||||||
|
- Segments backup (HDFS, S3, ...)
|
||||||
|
|
||||||
|
# Considerations & Tools
|
||||||
|
|
||||||
|
## When *not* to choose Druid
|
||||||
|
|
||||||
|
- Data is not time-series
|
||||||
|
- Cardinality is _very_ high
|
||||||
|
- Number of dimensions is high
|
||||||
|
- Setup cost must be avoided
|
||||||
|
|
||||||
|
## Graphite (metrics)
|
||||||
|
|
||||||
|
![Graphite](img/graphite.png)\
|
||||||
|
|
||||||
|
[Graphite](http://graphite.wikidot.com)
|
||||||
|
|
||||||
|
## Pivot (exploring data)
|
||||||
|
|
||||||
|
![Pivot](img/pivot.gif)\
|
||||||
|
|
||||||
|
[Pivot](https://github.com/implydata/pivot)
|
||||||
|
|
||||||
|
## Caravel (exploring data)
|
||||||
|
|
||||||
|
![caravel](img/caravel.png)\
|
||||||
|
|
||||||
|
[Caravel](https://github.com/airbnb/caravel)
|
||||||
|
|
BIN
druid/img/caravel.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 300 KiB |
BIN
druid/img/graphite.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 166 KiB |
BIN
druid/img/pivot.gif
Normal file
Binary file not shown.
After Width: | Height: | Size: 360 KiB |