 
Dear Valued Customer,

My name is Adam Smith.  If you’ve not read one of my emails before, I’m the Technical Director at Tsohost and the person responsible if things go wrong.

 

At Tsohost we are aware that our success over the past 11 years is due almost entirely to customer referral and word of mouth, and we know that our high service level is the primary reason for this.  Therefore, when we fail to meet those service levels, I feel it is important to tell you why.

 

Today (Tuesday 12th August) we experienced a network routing issue between approximately 9AM and 11:30AM.  This meant that some visitors in the UK and elsewhere in the world were unable to reach our network, including customer websites, our own website, email services and other services provided by Tsohost.  It’s important to point out that this issue was sporadic and only affected a small number of people.  For the majority of the world, services were accessible throughout.

 

Below is a traffic graph for one of our many connections to the public internet, which illustrates this.  You will see a small drop in traffic whilst the issue was occurring and two additional ‘blips’, which are due to the router reloads (more details below), but otherwise traffic levels were only slightly affected.  The arc you see is unrelated and is part of our daily traffic flow, which naturally drops overnight.

Please rest assured that we dealt with this with the utmost urgency, and let me reiterate that maintaining our reputation for uptime and reliability is of the highest concern to us.  We understand that, for many of you, your online business is your livelihood.

 

For those who are interested, the technical explanation…


The global routing table is a ‘map’ of each possible destination on the internet.  Every large network operator (such as ourselves, or the ISP you use to connect to the internet) holds a copy of this, or multiple copies in our case.  This is what enables every computer on the internet to reach every other computer.
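
As a rough illustration of how a router uses this ‘map’, the sketch below looks up a destination by choosing the most specific matching range (longest-prefix match).  The table contents, the next-hop names and the lookup function are all hypothetical and use reserved documentation addresses; this is a simplified model, not how our routers are actually configured.

```python
import ipaddress

# Hypothetical, tiny routing table: address range -> next hop label.
# Real tables hold hundreds of thousands of entries.
ROUTES = {
    ipaddress.ip_network("0.0.0.0/0"): "default-upstream",
    ipaddress.ip_network("203.0.113.0/24"): "peer-a",
    ipaddress.ip_network("203.0.113.128/25"): "peer-b",
}


def lookup(address: str) -> str:
    """Return the next hop for the most specific route covering the address."""
    ip = ipaddress.ip_address(address)
    # Collect every range that contains the address ...
    matches = [net for net in ROUTES if ip in net]
    # ... and prefer the one with the longest prefix (the smallest range).
    best = max(matches, key=lambda net: net.prefixlen)
    return ROUTES[best]


if __name__ == "__main__":
    for addr in ("203.0.113.5", "203.0.113.200", "198.51.100.1"):
        print(addr, "->", lookup(addr))
```

A linear scan like this is only suitable for illustration; routers perform the same match in dedicated hardware, which is why the number of entries they can hold is finite.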

 

Over the past two decades the routing table has been increasing in size, due to new IPv4 addresses being used and existing IPv4 address ranges being split (meaning that two consecutive ranges might have different paths, so each needs its own entry).  Today it hit 512,000 routes.  This is a magic number as it’s the inbuilt default limit in many common routers and switches.
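
To show why splitting ranges makes the table grow, here is a minimal sketch.  It assumes a route table can be modelled as a simple set of ranges and treats the 512,000 figure as a plain cap; the function names and the example ranges (taken from a reserved test block) are hypothetical.

```python
import ipaddress

ROUTE_LIMIT = 512_000  # the figure quoted above, modelled as a simple cap


def split_route(table: set, prefix: str) -> set:
    """Return a copy of the table with one range replaced by its two halves."""
    net = ipaddress.ip_network(prefix)
    new_table = set(table)
    new_table.discard(net)
    # One entry becomes two, each of which may now follow a different path.
    new_table.update(net.subnets(prefixlen_diff=1))
    return new_table


def fits(route_count: int, limit: int = ROUTE_LIMIT) -> bool:
    """True if a device with the given route limit can hold this many routes."""
    return route_count <= limit


if __name__ == "__main__":
    table = {
        ipaddress.ip_network("198.18.0.0/15"),
        ipaddress.ip_network("203.0.113.0/24"),
    }
    after = split_route(table, "198.18.0.0/15")
    print(len(table), "routes before,", len(after), "routes after")
    print("511,999 routes fit:", fits(511_999))
    print("512,001 routes fit:", fits(512_001))
```

Each split adds one entry, so a burst of splits across many networks can move the total by thousands of routes in a short time.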

 

We had pre-empted this.  Most of our routers/switches already have a higher limit and we have recently spent £250,000 on network upgrades to improve the rest of our network.  These had not yet been installed as we believed we had room to spare.  However, last night there was a sudden increase in the number of routes being announced to the world and at 9AM we hit the limit.

 

Due to human factors it took us approximately half an hour to find the cause of the issue.  At that point we applied a fix on the only router we believed was affected.  This took effect after a reboot (which caused approximately 60 seconds of packet loss, as indicated on the graph) and the majority of people who could not access our network were then able to.  However, some users were still reporting problems, so we continued to investigate.  We believed the issue might lie elsewhere, as customers were also reporting issues reaching websites such as eBay and Skype, but it turned out that another of our Cisco routers had also hit its routing limit, despite the lack of any log entries to indicate this.  The same configuration change was applied to that router and, at that point, the remaining people who were still having problems were able to access their sites again.

 

It seems many other high-profile ISPs also suffered the same issue today, and most have now fixed their own networks.

 

Over the next 2 weeks we will be replacing all the affected routers with brand new Juniper devices which can hold enough routes to cover us for the next decade.

 

This issue only affected a small proportion of people accessing our network, and most users would not have seen any disruption.  Nonetheless, I would like to express my sincere apologies for the issues that some of you faced.  If you have any queries please send a support ticket with the subject “FAO:  Adam Smith” or give me a call on 01628 200161.

 

Kind Regards,

Adam Smith

 