The worst outage I never caused
Start time | 16:35 |
---|---|
End time | 17:00 |
Countdown link | Open timer |
In 2017 I came one keypress from causing Google's main backbone to largely fall off the Internet. This is the story of how we used that incident as a learning opportunity, how a lack of buy-in hindered further improvements, and how an existing toolkit of python libraries allowed testing and validation tools to be quickly built, preventing any chance of a recurrence.
In 2017 I came one keypress from causing Google's main backbone to largely fall off the Internet. This is the story of how we used that incident as a learning opportunity, how a lack of buy-in hindered further improvements, and how an existing toolkit of python libraries allowed testing and validation tools to be quickly built, preventing any chance of a recurrence.
Julien is a Senior Site Reliability Engineer at Google Sydney, from 2011 to 2018 he worked on Google's production networks, focusing on Internet routing & interconnection. When not at work he does things like designing custom embedded Linux machines & modernising frequency distribution systems.
He is also the current Secretary of Linux Australia, the parent organisation for PyCon AU.