
Skype Explains Outage In-Depth
Six days after an extended outage left its network inaccessible to many users for nearly 24 hours, Skype CIO Lars Rabbe has published a post-mortem write-up of the situation.
Essentially, a server overload set into motion a chain of events that led to a perfect storm of problems affecting the very core of the P2P network that keeps Skype running. As a result, the service was down for many users for up to 24 hours.
In his blog post, Rabbe describes the sequence of circumstances that led to the outage. The main point of breakdown -- aside from the initial overload of a cluster of support servers -- centered around the Skype for Windows client. Rather than correctly processing the delayed responses from the overloaded servers, Skype for Windows version 5.0.0.152 would crash.
The latest version of Skype for Windows, version 5.0.0.156, the 4.0 versions of Skype for Windows, Skype for Mac, Skype for iPhone, Skype on your TV and Skype Connect/Skype Manager were not impacted by this first wave of issues.
The problem, unfortunately, was that roughly 50% of all Skype users across the globe were using the 5.0.0.152 version of Skype for Windows. This was the first stable release of Skype 5, released in October. The updated version of Skype for Windows was released on December 14, but unless a user happened to manually check for the update or download the latest version, chances are he or she was running the crashtastic Windows client. Rabbe says that the crashes caused roughly 40% of clients running the buggy version of Skype for Windows to fail -- that is, 20% of all Skype clients in use failed because of this issue with the older version of the software.
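The 20% figure follows directly from the two percentages Rabbe gives. As a quick check, here is a minimal Python sketch of that arithmetic; the function name `failed_share` is made up for illustration and does not come from Skype's write-up.

```python
def failed_share(buggy_share_pct: float, crash_rate_pct: float) -> float:
    """Percentage of ALL clients that failed, given the share of clients on the
    buggy version and the share of those buggy clients that crashed."""
    return buggy_share_pct * crash_rate_pct / 100


# Roughly 50% of clients on 5.0.0.152, roughly 40% of those crashed.
print(failed_share(50, 40))  # 20.0
```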
This is where the perfect storm components start to come together. Those failed clients represented 25 to 30% of the publicly available "supernodes." Essentially, a supernode is a connection point that can also help funnel traffic for other users. The way that peer-to-peer VoIP (Voice over Internet Protocol) networks like Skype work is that a client must connect to a supernode in order to make a connection, send voice or video data or exchange instant messages. By default, every Skype client can be a supernode, depending on your firewall settings and bandwidth capacity. If your Skype client crashed and you were a supernode, the number of available connection points for other users just dropped.
Rabbe writes, "The failure of 25–30% of supernodes in the P2P network resulted in an increased load on the remaining supernodes. While we expect this kind of increase in the event of a failure, a significant proportion of users were also restarting crashed Windows clients at this time. This massively increased the load as they reconnected to the peer-to-peer cloud."
As luck would have it, all of this occurred just before the usual daily peak in usage. That meant that traffic to the remaining supernodes "was about 100 times what would normally be expected at the time of day." To further complicate matters, this additional load triggered built-in protection mechanisms that, under ordinary circumstances, could indicate something beyond just a sudden drop in available supernodes. These triggers created what amounted to a positive feedback loop, where overloaded supernodes shut themselves off, which in turn overloaded other supernodes, causing them to shut themselves off as well. This was the event that really took down Skype for the majority of users -- whether you were using Windows or not.
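To see how such a feedback loop can empty a network, consider a toy model in Python. This is only an illustrative sketch, not Skype's actual protection logic: it assumes that total load is fixed, that load is spread evenly across live supernodes, and that a node shuts itself off when its share exceeds its capacity. The function name, capacities and load figure are all made up for the example.

```python
def simulate_cascade(capacities, total_load, initially_failed):
    """Toy cascade: return the number of live nodes after each round.

    Assumptions (hypothetical, for illustration only): load is shared evenly
    among live nodes, and a node shuts off when its share exceeds its capacity.
    """
    alive = [c for i, c in enumerate(capacities) if i not in initially_failed]
    history = [len(alive)]
    while alive:
        per_node = total_load / len(alive)
        survivors = [c for c in alive if c >= per_node]
        if len(survivors) == len(alive):
            break  # every remaining node can carry its share: stable
        alive = survivors
        history.append(len(alive))
    return history


capacities = [12, 14, 16, 18, 20, 22, 24, 26, 28, 30]  # ten made-up supernodes

# With all ten nodes up, each carries 10 units and nothing shuts off.
print(simulate_cascade(capacities, 100, set()))      # [10]

# Lose three nodes (30%) and each round of shutdowns overloads the next.
print(simulate_cascade(capacities, 100, {7, 8, 9}))  # [7, 5, 3, 0]
```

In the second run, the same total load lands on fewer nodes each round, so every wave of shutdowns pushes the survivors past their own limits until none remain.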
Lessons Learned
This Skype outage and Rabbe's detailed explanation are interesting in that they highlight what -- for all intents and purposes -- was a fluke. Had the Windows client not had the propensity to crash and had the outage not occurred during peak usage and just ahead of a major holiday, the situation likely would have been much different.
This outage was also an interesting look at how the Skype ecosystem operates. Skype continues to be unparalleled amongst VoIP providers partly because of its P2P roots. This system is an integral part of why Skype works so well, but under the right circumstances, it can also produce its own unique set of problems.
