<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Virtual Private Server Cloud Blog &#124; VPS.NET &#187; Hypervisor</title>
	<atom:link href="http://www.vps.net/blog/tag/hypervisor/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.vps.net/blog</link>
	<description>News from the VPS.net Cloud</description>
	<lastBuildDate>Wed, 16 May 2012 14:27:40 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
<xhtml:meta xmlns:xhtml="http://www.w3.org/1999/xhtml" name="robots" content="noindex" />
		<item>
		<title>Gearing up for SAN Testing</title>
		<link>http://www.vps.net/blog/2008/12/11/gearing-up-for-san-testing/</link>
		<comments>http://www.vps.net/blog/2008/12/11/gearing-up-for-san-testing/#comments</comments>
		<pubDate>Thu, 11 Dec 2008 10:26:28 +0000</pubDate>
		<dc:creator>NullMind</dc:creator>
				<category><![CDATA[The Cloud]]></category>
		<category><![CDATA[Hypervisor]]></category>
		<category><![CDATA[RAID]]></category>
		<category><![CDATA[VPS Cloud]]></category>

		<guid isPermaLink="false">http://vps.net/gearing-up-for-san-testing/</guid>
		<description><![CDATA[The greatest task on any job has to be product tolerance testing. We are gearing up for a series of tests that will define what sort of SAN we will be using on the VPS Cloud; the actual tests will &#8230; <a href="http://www.vps.net/blog/2008/12/11/gearing-up-for-san-testing/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<a id="dd_start"></a><p>The greatest task on any job, has to be product tolerance testing.</p>
<p>We are gearing up for a series of tests that will define what sort of SAN we will be using on the VPS Cloud. The actual tests will be performed early next week; for now we are setting up the hardware for them. Once the tests are done, we will post the results here on the blog.</p>
<p>Here is the info on what we are planning &#8230;</p>
<p><span id="more-40"></span>
<p>Basically, our data backend will be a SAN consisting of a huge number of virtual disk targets. The way this works is as follows:</p>
<ul>
<li>Each physical storage server (NAS) breaks its disk array down into smaller logical volumes using LVM, with one volume per virtual machine.</li>
<li>Each of these volumes is then shared using ATA over Ethernet (AoE) across a switched gigabit Ethernet SAN, allowing every hypervisor to efficiently access any virtual machine&#8217;s disks as required.</li>
</ul>
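<p>The volume-per-VM layout above can be sketched in Python. This is a minimal, hypothetical illustration rather than our provisioning code: it assumes the AoE targets are exported with the open-source vblade tool (through its vbladed daemon wrapper), and the volume group name vg_san, the SAN interface eth1 and the vmID naming scheme are made up for the example. Each storage host is given its own AoE shelf number and each virtual machine volume a slot on that shelf; the Linux aoe driver on a hypervisor then exposes the target as /dev/etherd/eSHELF.SLOT.</p>

```python
from dataclasses import dataclass

# Hypothetical names: the volume group, SAN interface and "vmID" naming
# scheme are illustrative assumptions, not a real production configuration.
VOLUME_GROUP = "vg_san"
SAN_INTERFACE = "eth1"
MAX_SLOT = 254  # AoE minor address 0xFF is reserved for broadcast


@dataclass
class Export:
    vm_id: int
    shelf: int
    slot: int
    lv_path: str

    @property
    def initiator_device(self) -> str:
        # The Linux aoe driver names a target /dev/etherd/eSHELF.SLOT
        return f"/dev/etherd/e{self.shelf}.{self.slot}"


def plan_exports(shelf: int, vms: list[tuple[int, int]]) -> tuple[list[Export], list[list[str]]]:
    """Plan one logical volume per VM on a storage host and its AoE export.

    `vms` is a list of (vm_id, size_gb). Returns the export table and the
    commands (as argument lists) that would create and share each volume.
    Nothing is executed here.
    """
    if len(vms) > MAX_SLOT + 1:
        raise ValueError("too many volumes for one AoE shelf")
    exports, commands = [], []
    for slot, (vm_id, size_gb) in enumerate(vms):
        lv_name = f"vm{vm_id}"
        lv_path = f"/dev/{VOLUME_GROUP}/{lv_name}"
        # One LVM logical volume per virtual machine
        commands.append(["lvcreate", "-L", f"{size_gb}G", "-n", lv_name, VOLUME_GROUP])
        # vbladed SHELF SLOT NETIF DEVICE exports the volume as an AoE target
        commands.append(["vbladed", str(shelf), str(slot), SAN_INTERFACE, lv_path])
        exports.append(Export(vm_id, shelf, slot, lv_path))
    return exports, commands
```

<p>For example, plan_exports(0, [(101, 20), (102, 40)]) returns two exports, and the second one appears on every hypervisor as /dev/etherd/e0.1. The function only builds the command lists instead of running them, so a plan can be reviewed before anything touches the disks.</p>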
<p>This design makes the hypervisors fully redundant, since any hypervisor can reach any virtual machine&#8217;s disks over the SAN; it also lets us add new storage hosts to the SAN easily, and in the near future it will allow full redundancy of the storage hosts as well.</p>
<p><img src="http://vps.net/blog/wp-content/uploads/2009/04/san.jpg" width="351" height="480" alt="san Gearing up for SAN Testing"  title="Gearing up for SAN Testing" /></p>
<p>Initially our plan was to use scenario 1: each NAS would hold 10TB of RAID 5 storage, with the backups residing on separate 10TB clusters (also RAID 5). If a volume failed, it could be restored onto any of the SAN nodes within seconds.</p>
<p>We then decided we wanted even greater availability, by mirroring (RAID 1) the individual NAS units themselves: if a NAS fails, its mirror takes over and there is no need to restore from backups. (Of course, we will still keep the backup nodes; you can never have enough redundancy.) This will be the final setup, but not for the January 31st release, as we are still working on the backend code to make it possible.</p>
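<p>Mirroring whole NAS units is essentially RAID 1 across the network. The hypothetical Python sketch below shows only the core idea, with an in-memory dictionary standing in for each NAS: every write lands on both nodes, and reads are served from whichever node is still online, so losing one NAS does not require a restore from backup. The class and method names are invented for illustration.</p>

```python
class NasNode:
    """Stand-in for one storage host; blocks live in a dict for illustration."""

    def __init__(self, name: str):
        self.name = name
        self.blocks: dict[int, bytes] = {}
        self.online = True


class MirroredVolume:
    """RAID 1 across two NAS nodes: writes go to both, reads fail over."""

    def __init__(self, primary: NasNode, mirror: NasNode):
        self.nodes = [primary, mirror]

    def _healthy(self) -> list[NasNode]:
        healthy = [node for node in self.nodes if node.online]
        if not healthy:
            raise OSError("both halves of the mirror are offline")
        return healthy

    def write(self, block: int, data: bytes) -> None:
        # Write to every healthy replica so either one can serve the volume alone
        for node in self._healthy():
            node.blocks[block] = data

    def read(self, block: int) -> bytes:
        # Serve from the first healthy node; if the primary is down, the mirror takes over
        return self._healthy()[0].blocks[block]
```

<p>A real implementation would also have to resynchronise a NAS when it comes back online, since it misses every write made while it was down; this sketch deliberately leaves that out.</p>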
<p>With the individual nodes kept at 10TB each, we expected I/O to be fine, but doubts have started to arise lately, and the last thing we want is a 10TB node becoming saturated. So we decided to run a test: 10TB RAID 5 nodes vs 5TB RAID 10 nodes. The nodes are configured identically; it just happens that RAID 10 provides roughly half the usable capacity of RAID 5 for the same number of hard drives (but you RAID buffs already know that).</p>
<p>Of course, as a tech, I love RAID 10; heck, I&#8217;d even use it on the backup nodes if I could, but the accounting department would be less than impressed :). No matter how you look at it, RAID 10 roughly <strong>doubles</strong> the cost per usable terabyte compared with RAID 5, and the more something costs, the more one needs to prove that it&#8217;s justifiable&#8230;</p>
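<p>The capacity and I/O trade-off behind this test can be checked with a little arithmetic. This Python sketch assumes identical disks and uses the common rule of thumb that a random write costs 4 back-end I/Os on RAID 5 (read data, read parity, write data, write parity) versus 2 on RAID 10 (one write per mirror). Controller caches change the real numbers, which is exactly why a test is needed; the per-disk IOPS figure used below is an assumption.</p>

```python
# Rule-of-thumb back-end I/Os per random front-end write (ignores controller caches)
WRITE_PENALTY = {"raid5": 4, "raid10": 2}


def usable_tb(level: str, disks: int, disk_tb: float) -> float:
    """Usable capacity of an array built from identical disks."""
    if level == "raid5":
        if disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return (disks - 1) * disk_tb  # one disk's worth of space holds parity
    if level == "raid10":
        if disks < 4 or disks % 2:
            raise ValueError("RAID 10 needs an even number of disks, at least 4")
        return disks / 2 * disk_tb  # every block is stored twice
    raise ValueError(f"unknown RAID level {level!r}")


def random_write_iops(level: str, disks: int, iops_per_disk: float) -> float:
    """Approximate random-write IOPS the whole array can sustain."""
    return disks * iops_per_disk / WRITE_PENALTY[level]


if __name__ == "__main__":
    for level in ("raid5", "raid10"):
        print(level, usable_tb(level, 12, 1.0), "TB usable,",
              random_write_iops(level, 12, 150), "random-write IOPS")
```

<p>With 12 disks of 1TB and an assumed 150 IOPS per disk, RAID 5 gives 11TB usable and about 450 random-write IOPS, while RAID 10 gives 6TB usable and about 900. RAID 10 yields disks / 2 of usable space against disks - 1 for RAID 5, so the capacity ratio approaches one half as the array grows, and the cost per usable terabyte roughly doubles.</p>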
<p>So come back soon: we will post our test results once we have them. <img src='http://www.vps.net/blog/wp-includes/images/smilies/icon_wink.gif' alt="icon wink Gearing up for SAN Testing" class='wp-smiley' title="Gearing up for SAN Testing" /> </p>
]]></content:encoded>
			<wfw:commentRss>http://www.vps.net/blog/2008/12/11/gearing-up-for-san-testing/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>The importance of failure</title>
		<link>http://www.vps.net/blog/2008/12/04/the-importance-of-failure/</link>
		<comments>http://www.vps.net/blog/2008/12/04/the-importance-of-failure/#comments</comments>
		<pubDate>Thu, 04 Dec 2008 15:58:56 +0000</pubDate>
		<dc:creator>NullMind</dc:creator>
				<category><![CDATA[The Cloud]]></category>
		<category><![CDATA[Hypervisor]]></category>
		<category><![CDATA[UK2]]></category>
		<category><![CDATA[VPS Cloud]]></category>

		<guid isPermaLink="false">http://vps.net/the-importance-of-failure/</guid>
		<description><![CDATA[Henry Wadsworth Longfellow once said &#8220;Sometimes we may learn more from a man&#8217;s errors, than from his virtues.&#8221; I guess we can adapt that quote to the fact that these days we learn more when systems fail than &#8230; <a href="http://www.vps.net/blog/2008/12/04/the-importance-of-failure/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><em>Henry Wadsworth Longfellow</em> once said &#8220;<em>Sometimes we may learn more from a man&#8217;s errors, than from his virtues.</em>&#8221; .. I guess we can adapt that quote to the fact these days we learn more from when the systems fail, than when they are working fine .. such was the case today.</p>
<p>We had tested the VPS Cloud self-healing feature by shutting down a Hypervisor&#8217;s services, and every time, the VPS nodes residing on it automatically booted up on a different Hypervisor in the Cloud, always in under 40 seconds (yes, 40 seconds)&#8230; but we decided that was no fun: it was time to go to the datacenter and pull some cables&#8230; yes, <strong>I love my job</strong> <img src='http://www.vps.net/blog/wp-includes/images/smilies/icon_smile.gif' alt="icon smile The importance of failure" class='wp-smiley' title="The importance of failure" /> </p>
<p><span id="more-23"></span>
<p>So there we were, Paul (head of IT @ UK2) and myself. We located the Hypervisor we wanted to test and enabled monitors for its main NIC and the VPSs&#8217; IPs&#8230; all systems go, the anticipation of success in the air, our palms sweating, giggling like schoolgirls&#8230; and &#8220;<strong>click</strong>&#8221;&#8230; we pulled the cable&#8230;</p>
<p><img src="http://vps.net/blog/wp-content/uploads/2009/04/200812041627.jpg" width="300" height="228" alt="200812041627 The importance of failure"  title="The importance of failure" /></p>
<p>We then started counting until the VPSs were back up elsewhere in the VPS Cloud&#8230; 1, 2, 3&#8230; 40&#8230; OK, any time now&#8230; 60&#8230; 90? Wait a second&#8230; what is wrong here?</p>
<p>Three minutes later, we looked at each other in horror&#8230; the unthinkable had happened: the VPS Cloud self-healing feature, one of the cornerstones of our offer, had <strong>failed</strong>!!!!<br />
<img src="http://vps.net/blog/wp-content/uploads/2009/04/200812041639.jpg" width="126" height="168" alt="200812041639 The importance of failure"  title="The importance of failure" /></p>
<p>Luckily we are still in beta testing, but what had happened? It had always worked when we shut the services down, so why did it fail when we pulled a NIC cable instead? What was the difference? And so began today&#8217;s &#8220;must know&#8221; task.</p>
<p>We started by looking at the logs: nothing strange there; in fact, nothing there at all for the past few minutes. That&#8217;s when it hit us: there was indeed NOTHING in there. The logs showed that no downtime had been detected for any Hypervisor. We returned to the Admin CP, and sure enough, it still showed that system as being up&#8230; but how?</p>
<p>Well, as is so often the case, you look through a few hundred lines of code before deciding to look at the obvious instead&#8230; could the internal monitor daemon have failed? And if it had failed, why were we not notified? Wait&#8230; what monitors the monitor? <img src="http://vps.net/blog/wp-content/uploads/2009/04/200812041648.jpg" width="150" height="124" alt="200812041648 The importance of failure"  title="The importance of failure" /></p>
<p>A simple schoolboy error: we had all sorts of monitors, bells and whistles. If anything in the VPS Cloud fails, it gets detected within 5 seconds or less, a true example of monitoring excellence&#8230; but what we forgot was: what if the monitor itself fails?</p>
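<p>For reference, the monitoring and self-healing loop described in this post boils down to two steps: notice that a Hypervisor has stopped responding, then boot its VPSs elsewhere. Below is a minimal Python sketch of that logic, not the actual VPS Cloud code. It assumes each Hypervisor sends periodic heartbeats, that 5 seconds of silence marks it as down, and that every VPS&#8217;s disks are reachable from any Hypervisor, so a VPS can be restarted on whichever survivor currently hosts the fewest VPSs. The function names are hypothetical.</p>

```python
import heapq

HEARTBEAT_TIMEOUT = 5.0  # seconds of silence before a Hypervisor counts as down (assumed)


def dead_hypervisors(last_seen: dict[str, float], now: float) -> set[str]:
    """Hypervisors whose last heartbeat is older than the timeout."""
    return {hv for hv, t in last_seen.items() if now - t > HEARTBEAT_TIMEOUT}


def plan_failover(placement: dict[str, list[str]], dead: set[str]) -> dict[str, str]:
    """Map each VPS on a dead Hypervisor to the least-loaded surviving one.

    Assumes any Hypervisor can boot any VPS because the disks are shared.
    """
    survivors = [(len(vms), hv) for hv, vms in placement.items() if hv not in dead]
    if not survivors:
        raise RuntimeError("no surviving Hypervisors")
    heapq.heapify(survivors)  # min-heap keyed on current VPS count
    moves = {}
    for hv in sorted(dead):
        for vps in placement.get(hv, []):
            load, target = heapq.heappop(survivors)
            moves[vps] = target
            heapq.heappush(survivors, (load + 1, target))
    return moves
```

<p>Placing each VPS on the least-loaded survivor, one at a time through a heap, spreads a failed Hypervisor&#8217;s VPSs across the Cloud instead of piling them all onto a single machine.</p>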
<p>This brings me back to the topic of this post, the importance of failure. There was no real difference in our test: pulling the cable or shutting down the services would both have caused the VPS Cloud monitor to kick in and do its job. It was just a coincidence that this time the monitor daemon had hung. Had that failure not happened, this simple oversight could have caused trouble later on, so yes, failure is good, as long as it happens during beta testing <img src='http://www.vps.net/blog/wp-includes/images/smilies/icon_wink.gif' alt="icon wink The importance of failure" class='wp-smiley' title="The importance of failure" />.</p>
<p>So we rewrote parts of the daemon to make it more robust (we found what caused it to fail and fixed it) and implemented extra monitoring procedures, so we now monitor the monitor too <img src='http://www.vps.net/blog/wp-includes/images/smilies/icon_wink.gif' alt="icon wink The importance of failure" class='wp-smiley' title="The importance of failure" /> </p>
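<p>Monitoring the monitor can be as simple as a heartbeat. In this hypothetical Python sketch, the monitor daemon calls beat() on every pass of its main loop, and an independent watchdog raises an alert when the heartbeat goes stale. A hung daemon is still &#8220;running&#8221;, so merely checking that its process exists would have missed exactly the failure we hit; a stale heartbeat catches hangs as well as crashes. The 5-second limit and the alert callback are assumptions, and in practice the watchdog would run in a separate process or on another host, sharing the heartbeat through a file or the network, rather than in the same program as shown here.</p>

```python
import time
from typing import Callable


class Heartbeat:
    """Timestamp the monitor daemon refreshes on every pass of its main loop."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self.clock = clock
        self.last_beat = clock()

    def beat(self) -> None:
        self.last_beat = self.clock()


class Watchdog:
    """Independent checker that alerts when the monitor stops beating."""

    def __init__(self, heartbeat: Heartbeat, max_silence: float,
                 alert: Callable[[str], None]):
        self.heartbeat = heartbeat
        self.max_silence = max_silence
        self.alert = alert
        self.alerted = False

    def check(self) -> bool:
        silence = self.heartbeat.clock() - self.heartbeat.last_beat
        healthy = silence <= self.max_silence
        if not healthy and not self.alerted:
            self.alert(f"monitor daemon silent for {silence:.1f}s")
            self.alerted = True  # alert once per outage, not on every check
        elif healthy:
            self.alerted = False  # re-arm once the monitor is beating again
        return healthy
```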
<p>This emphasizes the importance of beta testing and fault simulation. So often we see companies go live with untested, groundbreaking products and then have a miserable first quarter or two of constant failures and bug fixing, which often drives them to closure. If you do not test properly, it&#8217;s the small things that will get you in the end.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.vps.net/blog/2008/12/04/the-importance-of-failure/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

