<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Jeremy Kemp &#187; Uni</title>
	<atom:link href="http://www.jeremykemp.co.uk/tag/uni/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.jeremykemp.co.uk</link>
	<description>//TODO</description>
	<lastBuildDate>Sun, 15 Jan 2012 15:32:26 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Shared Memory Tip</title>
		<link>http://www.jeremykemp.co.uk/07/02/2011/shared-memory-tip/</link>
		<comments>http://www.jeremykemp.co.uk/07/02/2011/shared-memory-tip/#comments</comments>
		<pubDate>Mon, 07 Feb 2011 15:39:37 +0000</pubDate>
		<dc:creator>Jeremy</dc:creator>
				<category><![CDATA[CUDA]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Uni]]></category>

		<guid isPermaLink="false">http://www.jeremykemp.co.uk/?p=195</guid>
		<description><![CDATA[As usual, I&#8217;m knee deep in CUDA, optimising a fair few algorithms from various papers. Recently, I&#8217;ve been implementing the algorithms from this paper, with the aim of improving them later or creating my own based on their concepts. The algorithm is an All Pairs Shortest Path algorithm with a nested loop in the kernel. Each time [...]]]></description>
			<content:encoded><![CDATA[<p>As usual, I&#8217;m knee deep in CUDA, optimising a fair few algorithms from various papers. Recently, I&#8217;ve been implementing the algorithms from <a href="http://www.computer.org/portal/web/csdl/doi/10.1109/ITNG.2010.230" target="_blank">this</a> paper, with the aim of improving them later or creating my own based on their concepts. The algorithm is an All Pairs Shortest Path algorithm with a nested loop in the kernel. Each time the inner loop executes, two values from shared memory are added together and the result is compared against another variable stored in a register on the appropriate core. For some reason, my code was running a lot slower than the results reported in the paper.</p>
<p>My <a href="http://laurencedawson.com/" target="_blank">friend</a> here at Durham, who is also working with CUDA, suggested taking the addition out of the conditional and storing the result in a register before the comparison. Much to my surprise, this worked a treat and instantly gave me results comparable with those in the paper.</p>
<p>Here is the original code before the change:</p>
<div class="geshi no cpp">
<div class="head">for (int i = 0; i &lt; gridDim.x; i ++)</div>
<ol>
<li class="li1">
<div class="de1"><span class="br0">&#123;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; __shared__ <span class="kw4">int</span> row<span class="br0">&#91;</span>blockWidth<span class="br0">&#93;</span><span class="br0">&#91;</span>blockHeight<span class="br0">&#93;</span>;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; __shared__ <span class="kw4">int</span> column<span class="br0">&#91;</span>blockWidth<span class="br0">&#93;</span><span class="br0">&#91;</span>blockHeight<span class="br0">&#93;</span>;</div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; <span class="co1">//Code here fills row and column</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; __syncthreads<span class="br0">&#40;</span><span class="br0">&#41;</span>;</div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; <span class="kw1">for</span><span class="br0">&#40;</span><span class="kw4">int</span> k <span class="sy1">=</span> <span class="nu0">0</span>; k <span class="sy3">&amp;</span>lt; blockWidth; k <span class="sy2">++</span><span class="br0">&#41;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; <span class="br0">&#123;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp;<span class="kw1">if</span><span class="br0">&#40;</span>row<span class="br0">&#91;</span>threadIdx.<span class="me1">y</span><span class="br0">&#93;</span><span class="br0">&#91;</span>k<span class="br0">&#93;</span> <span class="sy2">+</span> column<span class="br0">&#91;</span>k<span class="br0">&#93;</span><span class="br0">&#91;</span>threadIdx.<span class="me1">x</span><span class="br0">&#93;</span> <span class="sy3">&amp;</span>lt; value<span class="br0">&#41;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp;<span class="br0">&#123;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; value <span class="sy1">=</span> row<span class="br0">&#91;</span>threadIdx.<span class="me1">y</span><span class="br0">&#93;</span><span class="br0">&#91;</span>k<span class="br0">&#93;</span> <span class="sy2">+</span> column<span class="br0">&#91;</span>k<span class="br0">&#93;</span><span class="br0">&#91;</span>threadIdx.<span class="me1">x</span><span class="br0">&#93;</span>;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp;<span class="br0">&#125;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; <span class="br0">&#125;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="br0">&#125;</span></div>
</li>
</ol>
</div>
<p>Here, we can see the change needed to drastically improve the running time of the algorithm:</p>
<div class="geshi no cpp">
<div class="head">unsigned int sum;</div>
<ol>
<li class="li1">
<div class="de1"><span class="kw1">for</span><span class="br0">&#40;</span><span class="kw4">unsigned</span> <span class="kw4">int</span> k <span class="sy1">=</span> <span class="nu0">0</span>; k <span class="sy1">&lt;</span> blockWidth; k <span class="sy2">++</span><span class="br0">&#41;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="br0">&#123;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; sum <span class="sy1">=</span> row<span class="br0">&#91;</span>threadIdx.<span class="me1">y</span><span class="br0">&#93;</span><span class="br0">&#91;</span>k<span class="br0">&#93;</span> <span class="sy2">+</span> column<span class="br0">&#91;</span>k<span class="br0">&#93;</span><span class="br0">&#91;</span>threadIdx.<span class="me1">x</span><span class="br0">&#93;</span>;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; <span class="kw1">if</span><span class="br0">&#40;</span>sum <span class="sy3">&amp;</span>lt; value<span class="br0">&#41;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; <span class="br0">&#123;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp;value <span class="sy1">=</span> sum;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; <span class="br0">&#125;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="br0">&#125;</span></div>
</li>
</ol>
</div>
<p>Given that shared memory is so quick on CUDA, similar to an L1 cache on a CPU, I wouldn&#8217;t have thought that reading the same two values twice would make any difference at all. Obviously, I was wrong! So watch out for things like this when using CUDA or any parallel computing platform.</p>
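<p>If you want to convince yourself that the change only affects speed and not the answer, a small host-side check is one way to do it. The sketch below is plain C++ rather than CUDA, and the names <code>TILE</code>, <code>relax_inline</code> and <code>relax_hoisted</code> are made up for illustration; they are not from the paper or from my kernel. It runs the same inner loop over one tile in both forms, with <code>y</code> and <code>x</code> standing in for <code>threadIdx.y</code> and <code>threadIdx.x</code>. It says nothing about GPU timing.</p>

```cpp
// Host-side reference for the inner loop shown above (plain C++, no CUDA needed).
// TILE, relax_inline and relax_hoisted are hypothetical names used for illustration.
constexpr int TILE = 4;

// Original form: the sum is written out twice, once in the test
// and once in the assignment.
int relax_inline(const int row[TILE][TILE], const int column[TILE][TILE],
                 int y, int x, int value)
{
    for (int k = 0; k < TILE; k++) {
        if (row[y][k] + column[k][x] < value) {
            value = row[y][k] + column[k][x];
        }
    }
    return value;
}

// Hoisted form: the sum is computed once into a local and reused.
int relax_hoisted(const int row[TILE][TILE], const int column[TILE][TILE],
                  int y, int x, int value)
{
    for (int k = 0; k < TILE; k++) {
        const int sum = row[y][k] + column[k][x];
        if (sum < value) {
            value = sum;
        }
    }
    return value;
}
```

<p>Calling both functions with the same tile, the same <code>y</code> and <code>x</code>, and the same starting <code>value</code> should return the same minimum, so any difference between the two kernels should show up in running time only.</p>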
]]></content:encoded>
			<wfw:commentRss>http://www.jeremykemp.co.uk/07/02/2011/shared-memory-tip/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Om Nom Urgh</title>
		<link>http://www.jeremykemp.co.uk/29/11/2009/om-nom-urgh/</link>
		<comments>http://www.jeremykemp.co.uk/29/11/2009/om-nom-urgh/#comments</comments>
		<pubDate>Sun, 29 Nov 2009 23:27:18 +0000</pubDate>
		<dc:creator>Jeremy</dc:creator>
				<category><![CDATA[Uni]]></category>
		<category><![CDATA[College Food]]></category>

		<guid isPermaLink="false">http://www.jeremykemp.co.uk/?p=83</guid>
		<description><![CDATA[Usually food at uni is pretty good but last night I got served this thing (we&#8217;re catered). God knows what was going on in the kitchen but it was terrible. Who serves a chicken boob with no sauce or any kind of seasoning!? What&#8217;s more, the &#8220;vegetables&#8221; consisted of cauliflower and yet more cauliflower. There [...]]]></description>
			<content:encoded><![CDATA[<p>Usually food at uni is pretty good but last night I got served this thing (we&#8217;re catered). God knows what was going on in the kitchen but it was terrible. Who serves a chicken boob with no sauce or any kind of seasoning!? What&#8217;s more, the &#8220;vegetables&#8221; consisted of cauliflower and yet more cauliflower. There was nothing else on offer apart from some bread. WTF?</p>
<div id="attachment_82" class="wp-caption aligncenter" style="width: 235px"><img class="size-medium wp-image-82 " title="College Food" src="http://www.jeremykemp.co.uk/wp-content/uploads/2009/11/Image020-225x300.jpg" alt="The Abomination" width="225" height="300" /><p class="wp-caption-text">The Abomination</p></div>
<p>Here&#8217;s to hoping this remains the worst uni food I will ever receive!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.jeremykemp.co.uk/29/11/2009/om-nom-urgh/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

