The superstorm endangered two of Facebook's data centers, each carrying tens of terabits of traffic.
He said the experience made the company realize it might not be as fortunate the next time.
"We had built up enough redundancy over the years that we weathered the storm, unintended," he said. "It really came pretty close for us."
The scale at which Facebook operates apparently compounds its resiliency challenges.
"So … every day is chock-full of scalability problems."
"To be honest, things didn't go all that well the first few times we did this," Parikh said. "We learned a lot, and this was exactly the goal of the drill."
The major lesson learned: traffic-management load balancing is really hard.
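One reason such drills expose load-balancing pain is that naively rebalancing traffic when a site is drained remaps far more requests than necessary. The sketch below is purely illustrative (the data-center names, request IDs, and modulo-hash scheme are hypothetical, not Facebook's actual system); it shows how draining one site forces the survivors to absorb its share.

```python
# Toy traffic balancer: hypothetical data centers and a naive
# hash-modulo assignment, for illustration only.
import hashlib
from collections import Counter

def pick_datacenter(request_id: str, datacenters: list) -> str:
    """Deterministically map a request to one of the live data centers."""
    digest = hashlib.sha256(request_id.encode()).hexdigest()
    return datacenters[int(digest, 16) % len(datacenters)]

def distribute(n_requests: int, datacenters: list) -> Counter:
    """Count how many of n_requests land on each data center."""
    return Counter(pick_datacenter("req-%d" % i, datacenters)
                   for i in range(n_requests))

live = ["dc-east", "dc-west", "dc-central"]     # hypothetical sites
before = distribute(30_000, live)
# Drill scenario: drain dc-east and rebalance onto the remaining two.
after = distribute(30_000, [dc for dc in live if dc != "dc-east"])
```

Note that with this naive modulo scheme, changing the number of live sites reshuffles almost every request's assignment, not just the drained site's share; techniques such as consistent hashing exist precisely to limit that churn, which hints at why the quoted lesson is "really hard."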
Parikh recalls that other Facebook leaders didn't think he would actually do it.
"I was having coffee with a colleague just before the first drill. I told him, 'There's only one way to find out if it works.'"
However, users didn't appear to notice.
"If you're an engineer and you see a graph like this, three things come to mind. This is where we got much, much better," Parikh said. "So we strive to make the graphs for our drill exercises look like this one."
"You're gonna wanna push yourself to an uncomfortable place to get better."
Source: IEEE
Source: www.techworm.net