<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Darian Shimy &#187; realtime</title>
	<atom:link href="http://www.darianshimy.com/tag/realtime/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.darianshimy.com</link>
	<description></description>
	<lastBuildDate>Sat, 01 Oct 2011 06:06:15 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3</generator>
		<item>
		<title>Hot Potato &#8211; A Real-time Processing Framework</title>
		<link>http://www.darianshimy.com/2011/07/hot-potato/</link>
		<comments>http://www.darianshimy.com/2011/07/hot-potato/#comments</comments>
		<pubDate>Wed, 27 Jul 2011 16:00:22 +0000</pubDate>
		<dc:creator>Darian Shimy</dc:creator>
				<category><![CDATA[Technology]]></category>
		<category><![CDATA[hotpotato]]></category>
		<category><![CDATA[realtime]]></category>
		<category><![CDATA[ruby]]></category>

		<guid isPermaLink="false">http://www.darianshimy.com/?p=536</guid>
		<description><![CDATA[Today, I am happy to announce the availability of Hot Potato, an open source real-time processing framework written in Ruby. Originally designed to process the Twitter firehose at 3,000+ tweets per second, it has been extended to support &#8230; <a href="http://www.darianshimy.com/2011/07/hot-potato/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Today, I am happy to announce the availability of Hot Potato, an open source real-time processing framework written in Ruby. Originally designed to process the Twitter firehose at 3,000+ tweets per second, it has been extended to support any type of streaming data as input or output. The framework excels at applications such as social media analysis, log processing, fraud prevention, spam detection, instant messaging, and many others that involve processing streaming data.</p>
<p>Related Links:</p>
<ul>
<li>GitHub Repository: <a href="http://github.com/dshimy/HotPotato">http://github.com/dshimy/HotPotato</a></li>
<li>Google Group: <a href="http://groups.google.com/group/hotpotato-rb">http://groups.google.com/group/hotpotato-rb</a></li>
<li>Presentation: <a href="http://www.slideshare.net/dshimy/hot-potato-8704624">http://www.slideshare.net/dshimy/hot-potato-8704624</a></li>
<li>My Profile: <a href="http://profiles.google.com/dshimy">http://profiles.google.com/dshimy</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.darianshimy.com/2011/07/hot-potato/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Efficient Real-Time Memory De-duplication</title>
		<link>http://www.darianshimy.com/2011/02/efficient-real-time-memory-de-duplication/</link>
		<comments>http://www.darianshimy.com/2011/02/efficient-real-time-memory-de-duplication/#comments</comments>
		<pubDate>Fri, 11 Feb 2011 06:33:31 +0000</pubDate>
		<dc:creator>Darian Shimy</dc:creator>
				<category><![CDATA[Technology]]></category>
		<category><![CDATA[realtime]]></category>
		<category><![CDATA[ruby]]></category>

		<guid isPermaLink="false">http://www.darianshimy.com/?p=497</guid>
		<description><![CDATA[De-duplication is the process of removing duplicates from a collection. Hashes and Bloom filters are common tools for implementing de-duplication; however, there are times when these are not fast enough. It is fairly simple to implement a memory &#8230; <a href="http://www.darianshimy.com/2011/02/efficient-real-time-memory-de-duplication/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>De-duplication is the process of removing duplicates from a collection. Hashes and Bloom filters are common tools for implementing de-duplication; however, there are times when these are not fast enough. It is fairly simple to implement a memory-efficient, real-time de-duplication system using two hashes. The secret is to add each object to both hashes and, every n objects (or on a timer), to make the next hash the current one and start a fresh, empty next hash. Memory stays bounded because the oldest entries are dropped along with the discarded hash.</p>
<p>This is more easily explained with a simple reference implementation in Ruby:</p>

<div class="wp_syntax"><div class="code"><pre class="ruby" style="font-family:monospace;"><span style="color:#008000; font-style:italic;"># A memory efficient deduplication class used in cases where a</span>
<span style="color:#008000; font-style:italic;"># duplicate object can only occur within n objects of each other.</span>
<span style="color:#9966CC; font-weight:bold;">class</span> MemoryDedup
&nbsp;
  <span style="color:#008000; font-style:italic;"># Returns a new deduper.</span>
  <span style="color:#008000; font-style:italic;">#</span>
  <span style="color:#008000; font-style:italic;"># == Options</span>
  <span style="color:#008000; font-style:italic;"># * &lt;tt&gt;:size&lt;/tt&gt; - the number of objects to store in the cache</span>
  <span style="color:#9966CC; font-weight:bold;">def</span> initialize<span style="color:#006600; font-weight:bold;">&#40;</span>options = <span style="color:#006600; font-weight:bold;">&#123;</span><span style="color:#006600; font-weight:bold;">&#125;</span><span style="color:#006600; font-weight:bold;">&#41;</span>
    <span style="color:#0066ff; font-weight:bold;">@size</span> = options<span style="color:#006600; font-weight:bold;">&#91;</span>:size<span style="color:#006600; font-weight:bold;">&#93;</span> <span style="color:#006600; font-weight:bold;">||</span> <span style="color:#006666;">1000</span>
    <span style="color:#0066ff; font-weight:bold;">@current</span> = <span style="color:#006600; font-weight:bold;">&#123;</span><span style="color:#006600; font-weight:bold;">&#125;</span>
    @<span style="color:#9966CC; font-weight:bold;">next</span> = <span style="color:#006600; font-weight:bold;">&#123;</span><span style="color:#006600; font-weight:bold;">&#125;</span>
  <span style="color:#9966CC; font-weight:bold;">end</span>
&nbsp;
  <span style="color:#008000; font-style:italic;"># Returns true if the object has been seen in the past :size unique</span>
  <span style="color:#008000; font-style:italic;"># objects, false otherwise.</span>
  <span style="color:#9966CC; font-weight:bold;">def</span> exists?<span style="color:#006600; font-weight:bold;">&#40;</span>obj<span style="color:#006600; font-weight:bold;">&#41;</span>
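    <span style="color:#008000; font-style:italic;"># Rotate once @next holds more than half of :size entries, so that</span>
    <span style="color:#008000; font-style:italic;"># @current always keeps at least the most recent :size / 2 unique objects.</span>
    <span style="color:#9966CC; font-weight:bold;">if</span> <span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#006666;">2</span> <span style="color:#006600; font-weight:bold;">*</span> @<span style="color:#9966CC; font-weight:bold;">next</span>.<span style="color:#9900CC;">size</span><span style="color:#006600; font-weight:bold;">&#41;</span> <span style="color:#006600; font-weight:bold;">&gt;</span> <span style="color:#0066ff; font-weight:bold;">@size</span>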
      <span style="color:#0066ff; font-weight:bold;">@current</span> = @<span style="color:#9966CC; font-weight:bold;">next</span>
      @<span style="color:#9966CC; font-weight:bold;">next</span> = <span style="color:#006600; font-weight:bold;">&#123;</span><span style="color:#006600; font-weight:bold;">&#125;</span>
    <span style="color:#9966CC; font-weight:bold;">end</span>
    @<span style="color:#9966CC; font-weight:bold;">next</span><span style="color:#006600; font-weight:bold;">&#91;</span>obj<span style="color:#006600; font-weight:bold;">&#93;</span> = <span style="color:#0000FF; font-weight:bold;">true</span>
    <span style="color:#9966CC; font-weight:bold;">if</span> <span style="color:#0066ff; font-weight:bold;">@current</span>.<span style="color:#9900CC;">has_key</span>?<span style="color:#006600; font-weight:bold;">&#40;</span>obj<span style="color:#006600; font-weight:bold;">&#41;</span>
      <span style="color:#0000FF; font-weight:bold;">return</span> <span style="color:#0000FF; font-weight:bold;">true</span>
    <span style="color:#9966CC; font-weight:bold;">else</span>
      <span style="color:#0066ff; font-weight:bold;">@current</span><span style="color:#006600; font-weight:bold;">&#91;</span>obj<span style="color:#006600; font-weight:bold;">&#93;</span> = <span style="color:#0000FF; font-weight:bold;">true</span>
      <span style="color:#0000FF; font-weight:bold;">return</span> <span style="color:#0000FF; font-weight:bold;">false</span>
    <span style="color:#9966CC; font-weight:bold;">end</span>
  <span style="color:#9966CC; font-weight:bold;">end</span>
&nbsp;
<span style="color:#9966CC; font-weight:bold;">end</span></pre></div></div>

]]></content:encoded>
			<wfw:commentRss>http://www.darianshimy.com/2011/02/efficient-real-time-memory-de-duplication/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Realtime Log Visualization</title>
		<link>http://www.darianshimy.com/2009/12/realtime-log-visualization/</link>
		<comments>http://www.darianshimy.com/2009/12/realtime-log-visualization/#comments</comments>
		<pubDate>Thu, 10 Dec 2009 22:00:16 +0000</pubDate>
		<dc:creator>Darian Shimy</dc:creator>
				<category><![CDATA[Technology]]></category>
		<category><![CDATA[analytics]]></category>
		<category><![CDATA[fun]]></category>
		<category><![CDATA[realtime]]></category>

		<guid isPermaLink="false">http://www.darianshimy.com/?p=464</guid>
		<description><![CDATA[Erlend Simonsen put out a program called gltail that creates a graphical representation of your log files. I saw this a long time ago, but last night I finally started playing with it. It works quite well. I needed to &#8230; <a href="http://www.darianshimy.com/2009/12/realtime-log-visualization/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Erlend Simonsen put out a program called <a href="http://www.fudgie.org/">gltail</a> that creates a graphical representation of your log files.</p>
<p><a href="http://www.darianshimy.com/wp-content/uploads/2009/12/Screen-shot-2009-12-10-at-2.04.53-PM.png"><img class="aligncenter size-medium wp-image-466" title="Screen shot 2009-12-10 at 2.04.53 PM" src="http://www.darianshimy.com/wp-content/uploads/2009/12/Screen-shot-2009-12-10-at-2.04.53-PM-300x210.png" alt="Screen shot 2009-12-10 at 2.04.53 PM" width="300" height="210" /></a></p>
<p>I saw this a long time ago, but last night I finally started playing with it. It works quite well. I needed to make a small change to support an SSH gateway; you can get the modified version here: <a href="http://github.com/dshimy/gltail">http://github.com/dshimy/gltail</a>. On OS X, getting started was simple:</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;">$ <span style="color: #c20cb9; font-weight: bold;">git</span> clone <span style="color: #c20cb9; font-weight: bold;">git</span>:<span style="color: #000000; font-weight: bold;">//</span>github.com<span style="color: #000000; font-weight: bold;">/</span>dshimy<span style="color: #000000; font-weight: bold;">/</span>gltail.git
$ <span style="color: #c20cb9; font-weight: bold;">sudo</span> gem <span style="color: #c20cb9; font-weight: bold;">install</span> ruby-opengl file-tail</pre></div></div>

<p>If you want the dots to bump each other, you need to install the Chipmunk physics library:</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;">$ <span style="color: #7a0874; font-weight: bold;">cd</span> vendor<span style="color: #000000; font-weight: bold;">/</span>Chipmunk-4.1.0<span style="color: #000000; font-weight: bold;">/</span>ruby
$ ruby extconf.rb
$ <span style="color: #c20cb9; font-weight: bold;">sudo</span> <span style="color: #c20cb9; font-weight: bold;">make</span> <span style="color: #c20cb9; font-weight: bold;">install</span></pre></div></div>

<p>Beyond that, tweak the configuration file and enjoy.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.darianshimy.com/2009/12/realtime-log-visualization/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Yahoo Search Index Speed</title>
		<link>http://www.darianshimy.com/2009/09/yahoo-search-index-speed/</link>
		<comments>http://www.darianshimy.com/2009/09/yahoo-search-index-speed/#comments</comments>
		<pubDate>Thu, 10 Sep 2009 06:19:00 +0000</pubDate>
		<dc:creator>Darian Shimy</dc:creator>
				<category><![CDATA[Technology]]></category>
		<category><![CDATA[fail]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[realtime]]></category>
		<category><![CDATA[yahoo]]></category>

		<guid isPermaLink="false">http://www.darianshimy.com/?p=354</guid>
		<description><![CDATA[I submitted this blog to Yahoo a while ago.  Why?  It was more of a test of how fast they spider sites rather than an attempt to drive traffic.  After the submission, they stated: Thank you! Your URL has been &#8230; <a href="http://www.darianshimy.com/2009/09/yahoo-search-index-speed/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I submitted this blog to <a href="http://www.yahoo.com">Yahoo</a> a while ago.  Why?  It was more of a test of how fast they spider sites rather than an attempt to drive traffic.  After the submission, they stated:</p>
<blockquote><p>Thank you! Your URL has been added to our list of URLs to crawl. Please expect a delay of several weeks before your URL is crawled.</p></blockquote>
<p>Several weeks?  Really!</p>
<p>Since the submission, it appears Yahoo is doing their best to meet their goal of several weeks.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.darianshimy.com/2009/09/yahoo-search-index-speed/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

