partial commit

Yann Esposito 2015-10-26 14:27:51 +01:00
parent 5ce7b3a696
commit ff8451d283
7 changed files with 485 additions and 13 deletions

@@ -27,8 +27,11 @@
<li><a href="http://pandoc.org">pandoc</a></li>
<li><a href="http://xelatex.org">XeLaTeX</a></li>
</ul>
<hr/>
<div id="footer">
<a href="yannesposito.com">Y</a>
<a href="http://about.me/gbuisson">G</a> -
<a href="http://sprunck.com">S</a> -
<a href="http://yannesposito.com">Y</a>
</div>
</body>
</html>

BIN
README.pdf Normal file

Binary file not shown.

@@ -16,6 +16,7 @@
<ul>
<li><a href="#todo-choose-a-title">TODO: choose a title</a><ul>
<li><a href="#plan">Plan</a></li>
<li><a href="#the-elephant-graveyard">The Elephant Graveyard</a></li>
</ul></li>
</ul>
</nav>
@@ -25,6 +26,11 @@
<h2 id="plan">Plan</h2>
<p>TODO: Remove the detailed plan</p>
<ul>
<li>Start with the end
<ul>
<li>show a pulse</li>
<li>explain what is simple / hard</li>
</ul></li>
<li>The situation before
<ul>
<li>problems with volume (MongoDB / PHP, etc…)</li>
@@ -59,8 +65,21 @@
</ul></li>
<li>One year later (maintenance and impressions)</li>
</ul>
<h2 id="the-elephant-graveyard">The Elephant Graveyard</h2>
<p>Imagine you could get all tweets in real time.</p>
<p>Imagine you need to count them. Imagine you need to filter them by keywords. Imagine you need to answer complex questions about them in real time. For example: how many tweets from women containing the word <code>clojure</code> expressed a positive sentiment during the last hour? Imagine the same question about the last year.</p>
<p>How would you do it?</p>
<p>First, you'll need to ingest tweets in real time. The Twitter streaming API exists for that. But it limits you to 1% of the total Twitter volume. To lift that limit, you need to either deal directly with Twitter or use Gnip.</p>
<p>Next, you'll need to keep only the tweets of interest. For example, you'll need to filter by keyword.</p>
<p>Just after that, you need to add information to each received tweet. You need to enrich each tweet with information it doesn't already contain. For example, the gender of a tweet's author must be guessed. The same goes for the sentiment expressed by the tweet.</p>
<p>In the end, you'll need to display all this information in real time. By real time, we mean with very low latency.</p>
<p>Under a minute is generally acceptable. But under the hood, we generally have less than half a second of latency.</p>
<p>Most of the latency is due to Twitter (about 2s) or Gnip (about 15s).</p>
<hr/>
<div id="footer">
<a href="yannesposito.com">Y</a>
<a href="http://about.me/gbuisson">G</a> -
<a href="http://sprunck.com">S</a> -
<a href="http://yannesposito.com">Y</a>
</div>
</body>
</html>

@@ -8,6 +8,9 @@ TODO: introduction (20 lines max)
TODO: Remove the detailed plan
- Start with the end
- show a pulse
- explain what is simple / hard
- The situation before
- problems with volume (MongoDB / PHP, etc...)
- security issues
@@ -33,3 +36,36 @@ TODO: Remove the detailed plan
- was it a mistake?
- Do we have any doubts?
- One year later (maintenance and impressions)
## The Elephant Graveyard
Imagine you could get all tweets in real time.
Imagine you need to count them.
Imagine you need to filter them by keywords.
Imagine you need to answer complex questions about them in real time.
For example: how many tweets from women containing the word `clojure` expressed a positive sentiment during the last hour?
Imagine the same question about the last year.
How would you do it?
First, you'll need to ingest tweets in real time.
The Twitter streaming API exists for that.
But it limits you to 1% of the total Twitter volume.
To lift that limit, you need to either deal directly with Twitter or use Gnip.
Next, you'll need to keep only the tweets of interest.
For example, you'll need to filter by keyword.
Just after that, you need to add information to each received tweet.
You need to enrich each tweet with information it doesn't already contain.
For example, the gender of a tweet's author must be guessed.
The same goes for the sentiment expressed by the tweet.
In the end, you'll need to display all this information in real time.
By real time, we mean with very low latency.
Under a minute is generally acceptable.
But under the hood, we generally have less than half a second of latency.
Most of the latency is due to Twitter (about 2s) or Gnip (about 15s).
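The filter, enrich and count steps described above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the `Tweet` type, the function names and the sample data are all hypothetical, and `guess_gender` and `guess_sentiment` are toy stand-ins for whatever real classifiers a production system would use. It says nothing about how the actual system is built.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, Iterator, Optional


@dataclass
class Tweet:
    """Hypothetical, minimal tweet record (not the real Twitter payload)."""
    author: str
    text: str
    created_at: datetime
    gender: Optional[str] = None     # filled in by enrich()
    sentiment: Optional[str] = None  # filled in by enrich()


def keep_of_interest(tweets: Iterable[Tweet], keywords: Iterable[str]) -> Iterator[Tweet]:
    """Filter step: keep tweets whose text contains at least one keyword."""
    wanted = [k.lower() for k in keywords]
    for tweet in tweets:
        text = tweet.text.lower()
        if any(k in text for k in wanted):
            yield tweet


def guess_gender(author: str) -> str:
    """Toy stand-in (assumption): a real system would use a proper classifier."""
    return {"alice": "female", "bob": "male"}.get(author.lower(), "unknown")


def guess_sentiment(text: str) -> str:
    """Toy stand-in (assumption): naive word matching, not real sentiment analysis."""
    lowered = text.lower()
    if any(word in lowered for word in ("love", "great")):
        return "positive"
    if any(word in lowered for word in ("hate", "awful")):
        return "negative"
    return "neutral"


def enrich(tweets: Iterable[Tweet]) -> Iterator[Tweet]:
    """Enrich step: add the information a tweet does not already carry."""
    for tweet in tweets:
        tweet.gender = guess_gender(tweet.author)
        tweet.sentiment = guess_sentiment(tweet.text)
        yield tweet


def count_matching(tweets: Iterable[Tweet], gender: str, sentiment: str, since: datetime) -> int:
    """Answer a question such as: how many positive tweets from women since `since`?"""
    return sum(
        1
        for t in tweets
        if t.gender == gender and t.sentiment == sentiment and t.created_at >= since
    )


# Made-up sample data, with a fixed "now" so the result is reproducible.
NOW = datetime(2015, 10, 26, 14, 0, tzinfo=timezone.utc)
SAMPLE = [
    Tweet("alice", "I love Clojure", NOW - timedelta(minutes=5)),
    Tweet("bob", "Clojure is great", NOW - timedelta(minutes=10)),
    Tweet("alice", "I love Haskell", NOW - timedelta(minutes=1)),
    Tweet("alice", "Clojure is great", NOW - timedelta(hours=3)),
]

pipeline = enrich(keep_of_interest(SAMPLE, ["clojure"]))
result = count_matching(pipeline, "female", "positive", NOW - timedelta(hours=1))
print(result)  # 1: only alice's Clojure tweet from five minutes ago matches
```

The steps are chained as generators so that each tweet flows through filtering and enrichment one at a time, which mirrors a streaming setting. Answering the same question about the last year is where such a naive in-memory scan stops being practical.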

Binary file not shown.

@@ -1,3 +1,6 @@
<hr/>
<div id="footer">
<a href="yannesposito.com">Y</a>
<a href="http://about.me/gbuisson">G</a> -
<a href="http://sprunck.com">S</a> -
<a href="http://yannesposito.com">Y</a>
</div>

File diff suppressed because one or more lines are too long