The DiRT on Chaos Engineering at Google • Jason Cahoon • GOTO 2021


This presentation was recorded at GOTOpia February 2021. #GOTOcon #GOTOpia

Jason Cahoon - Site Reliability Engineer at Google

A shallow dive into 15 years of Chaos Engineering at Google, the lessons we've learned performing many thousands of disaster tests on production systems, and some tips on how to approach getting started with Chaos Engineering at your own [...]

00:00 Intro
01:02 DiRT: Disaster Resiliency Testing
02:53 Why?
04:38 What we test?
06:01 Testing themes
10:01 Practical vs theoretical
12:31 How?
15:12 Picking what to test
16:29 Steps for bootstrapping a disaster testing program
18:25 Testing production vs testing in production
20:16 Really, you're breaking production though?!
23:00 Reporting on results
24:24 What have we learned?
26:55 Test example: Run at service level
28:51 Test example: Toggle the O-N / O-F-F discriminator
30:25 Test example: Run without dependencies
31:53 Test example: Hacked!

Download slides and read the full abstract here:
#ChaosEngineering #DiRT #Resilience #Observability #BuildingResilience #Resiliency #Programming #SRE #GameDay #DigitalTransformation #Reliability #RPC #RPCFaultInjection

Looking for a unique learning experience?
Attend the next GOTO conference near you! Get your ticket at

