Please rest assured that we dealt with this with the utmost urgency, and let me reiterate that maintaining our reputation for uptime and reliability is our highest concern. We understand that, for many of you, your online business is your livelihood.
For those who are interested, the technical explanation…
The global routing table is a ‘map’ of every possible destination on the internet. Every large network operator (such as us, or the ISP you use to connect to the internet) holds a copy of this map; in our case, several copies. This is what enables every computer on the internet to reach every other computer.
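To make the ‘map’ idea a little more concrete, here is a minimal Python sketch of how a router uses such a table: for each destination address it picks the most specific (longest) matching prefix and forwards traffic to that entry's next hop. The prefixes and next-hop names below are hypothetical examples, and real routers do this lookup in specialised hardware rather than in a Python dictionary.

```python
import ipaddress

# Toy routing table: prefix -> next hop.
# All prefixes and next-hop names are hypothetical, for illustration only.
ROUTES = {
    ipaddress.ip_network("0.0.0.0/0"): "default-upstream",
    ipaddress.ip_network("198.51.100.0/24"): "peer-a",
    ipaddress.ip_network("198.51.100.128/25"): "peer-b",
    ipaddress.ip_network("203.0.113.0/24"): "transit-c",
}

def lookup(routes, destination):
    """Return the next hop of the longest (most specific) prefix containing destination."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in routes if addr in net]
    if not matches:
        return None  # no route at all: the destination is unreachable
    best = max(matches, key=lambda net: net.prefixlen)
    return routes[best]

print(lookup(ROUTES, "198.51.100.10"))   # peer-a (matches the /24, not the /25)
print(lookup(ROUTES, "198.51.100.200"))  # peer-b (the /25 is more specific)
print(lookup(ROUTES, "192.0.2.1"))       # default-upstream (only the /0 matches)
```

Note how 198.51.100.0/24 and its upper half 198.51.100.128/25 can point to different next hops; every such split costs the router an extra table entry.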
Over the past two decades the routing table has grown steadily, both because new IPv4 addresses are being brought into use and because existing IPv4 address ranges are being split (meaning that two adjacent ranges might take different paths). Today it reached the ‘512K’ mark (512 × 1,024 = 524,288 routes). This is a magic number because it is a built-in limit in many common routers and switches.
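The effect of a fixed limit can be sketched as follows. This is a simplified model, assuming a forwarding table that simply refuses new prefixes once it is full; the class and function names are hypothetical, and real platforms differ both in the exact capacity and in how they behave when the table overflows. The demo uses a deliberately tiny capacity to show how splitting one range into two halves with different paths can push a table over its limit.

```python
import ipaddress

# "512K" = 512 * 1024 entries. Treat this as an assumption: the real limit
# and how it is shared between route types vary by platform.
FIB_CAPACITY = 512 * 1024

class FixedCapacityFib:
    """Hypothetical toy forwarding table that cannot grow past a fixed size."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.routes = {}

    def install(self, prefix, next_hop):
        net = ipaddress.ip_network(prefix)
        # Updating an existing prefix is fine; adding a new one needs a free slot.
        if net not in self.routes and len(self.routes) >= self.capacity:
            return False  # table full: this route cannot be installed
        self.routes[net] = next_hop
        return True

def split(prefix):
    """Split a prefix into its two halves (e.g. a /16 into two /17s)."""
    return list(ipaddress.ip_network(prefix).subnets(prefixlen_diff=1))

# Demo with a tiny capacity so the limit is easy to hit.
fib = FixedCapacityFib(capacity=3)
print(fib.install("10.0.0.0/16", "path-x"))   # True
print(fib.install("192.0.2.0/24", "path-y"))  # True
# The /16 is now also announced as two /17s taking different paths.
for half, hop in zip(split("10.0.0.0/16"), ["path-x", "path-z"]):
    print(half, fib.install(str(half), hop))
# 10.0.0.0/17 True
# 10.0.128.0/17 False
```

In this sketch the overflowing route is silently rejected, which mirrors why a full table can cause some destinations to become unreachable while others continue to work normally.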
We had anticipated this. Most of our routers and switches already have a higher limit, and we recently spent £250,000 on network upgrades to cover the rest of our network. This new equipment had not yet been installed because we believed we still had room to spare. However, last night there was a sudden increase in the number of routes being announced to the world, and at 9 AM we hit the limit.
Due to human factors, it took us approximately half an hour to find the cause of the issue. At that point we applied a fix to the only router we believed was affected. The fix took effect after a reboot (which caused approximately 60 seconds of packet loss, as indicated on the graph), and the majority of people who could not access our network were then able to. However, some users were still reporting problems, so we continued to investigate. Because customers were also reporting problems reaching websites such as eBay and Skype, we initially believed the issue lay elsewhere. It turned out, however, that another of our Cisco routers had also hit its routing limit, even though no log entries indicated this. We applied the same configuration change to that router, and at that point the remaining customers who were still having problems were able to access their sites again.
It seems many other high-profile ISPs also suffered the same issue today, and most have now fixed their own networks.
Over the next 2 weeks we will be replacing all the affected routers with brand new Juniper devices which can hold enough routes to cover us for the next decade.
This issue only affected a small proportion of people accessing our network, and most users would not have seen any disruption. Nonetheless, I would like to express my sincere apologies for the problems some of you faced. If you have any queries, please send a support ticket with the subject “FAO: Adam Smith” or call me on 01628 200161.
Kind Regards,
Adam Smith