What happened?
On the 8th, from about 10 AM UTC, we started to receive a stream of different error reports from our customers: one couldn't access his GitHub Pages site, another couldn't manage his infrastructure with Terraform, a third couldn't build his Python application, and so on.
When we posted these errors on our shared channels, we realized that something big was going on because, as they say: once is an accident, twice is a coincidence, three times is a pattern. So we started digging around and soon discovered that one of the world's leading CDNs was having problems.
What is a CDN?
A CDN (Content Delivery Network) is a globally distributed network of points of presence (PoPs) designed to deliver content to users faster and more reliably. Whether we are aware of it or not, we all interact with a CDN on a daily basis: when we read the news on our favorite portal, when we make an online purchase, when we watch our favorite series on Netflix or when we scroll through our social media feeds. The CDN is the "invisible helper" behind what is, in the vast majority of cases, an excellent experience: by serving content from a PoP close to each user, it physically shortens the distance the data has to travel and so minimizes load times and latency.
How to react?
So what can you do when a provider that carries a very significant percentage of the world's web traffic goes down?
TL;DR: Well, not much, really.
If you are a direct client of a CDN and you cannot afford to be offline in a similar situation, you have no choice but to implement a multi-CDN strategy (i.e. distribute your traffic across two or more networks to achieve redundancy and high availability). However, this option is not affordable for everyone, and it adds extra configuration and complexity to your platform.
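To make the multi-CDN idea a bit more concrete, here is a minimal sketch of the selection logic behind it: keep only the CDNs that pass a health check, then pick one according to a traffic weight. In practice this steering is usually done by a DNS or traffic-management service rather than in application code, so treat this as an illustration only. The hostnames, the `/health` path and the function names are hypothetical assumptions, not part of any provider's API.

```python
import random
import urllib.request
from typing import Callable, Optional, Sequence

# Hypothetical CDN hostnames: replace them with the ones your providers assign.
CDN_ENDPOINTS = ["cdn-a.example.com", "cdn-b.example.com"]


def http_health_check(host: str, timeout: float = 2.0) -> bool:
    """Hypothetical check: the host is healthy if https://<host>/health answers 200."""
    try:
        with urllib.request.urlopen(f"https://{host}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError and timeouts are all OSError subclasses
        return False


def pick_endpoint(
    endpoints: Sequence[str],
    weights: Sequence[float],
    is_healthy: Callable[[str], bool],
    rng: Optional[random.Random] = None,
) -> str:
    """Weighted random choice among the endpoints that pass the health check."""
    healthy = [(e, w) for e, w in zip(endpoints, weights) if is_healthy(e)]
    if not healthy:
        raise RuntimeError("no healthy CDN available")
    rng = rng or random.Random()
    names, ws = zip(*healthy)
    return rng.choices(names, weights=ws, k=1)[0]
```

With weights such as `[0.7, 0.3]`, roughly 70% of requests go to the first CDN while both are healthy; if one fails its health check, all traffic shifts to the other. Passing the health check as a parameter keeps the selection logic easy to test without touching the network.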
But if you are an indirect client (i.e. you use a service that is accessed through a CDN), you have practically no alternatives: you can only wait for the service you use to implement a workaround or for the source of the problem to be fixed (fortunately, in this case the incident was resolved in less than an hour). The best you can do is recognize as early as possible that the problem lies with someone else, so you don't waste time investigating what you have broken on your side…
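One way to spot "someone else's problem" quickly is to check the status pages of the providers you depend on. Many status pages are hosted on Atlassian Statuspage, which exposes a `status.json` document whose `status.indicator` field is `none`, `minor`, `major` or `critical`. The sketch below assumes your provider publishes that format; the URL argument and function names are hypothetical.

```python
import json
import urllib.request


def provider_indicator(status_json: str) -> str:
    """Extract the overall indicator from a Statuspage-style status.json payload."""
    data = json.loads(status_json)
    return data["status"]["indicator"]


def is_upstream_incident(status_json: str) -> bool:
    """True if the provider itself reports a degradation or outage."""
    return provider_indicator(status_json) in {"minor", "major", "critical"}


def fetch_status(url: str, timeout: float = 5.0) -> str:
    """Download a status.json document (the URL is an assumption: check your provider)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Wiring a check like this into your alerting means that, when your own errors spike, you can see at a glance whether an upstream provider is already reporting an incident.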
Can it happen again?
Well, given that this is not the first time it has happened, and considering the nature of today's internet, we can take it for granted.
So if you want to be prepared for the occasion and plan the necessary steps to mitigate the impact (either through a multi-CDN strategy or looking for alternatives to services that may be affected), do not hesitate to contact us.
See you at the next outage… I mean… in the next post.