Netflix Unleashes Source for Chaos Monkey


Today, via a post on their Tech Blog, Netflix announced the long-awaited release of their failure-inducing “Chaos Monkey” tool:

Chaos Monkey is a service which runs in the Amazon Web Services (AWS) that seeks out Auto Scaling Groups (ASGs) and terminates instances (virtual machines) per group. The software design is flexible enough to work with other cloud providers or instance groupings and can be enhanced to add that support. The service has a configurable schedule that, by default, runs on non-holiday weekdays between 9am and 3pm. In most cases, we have designed our applications to continue working when an instance goes offline, but in those special cases that they don’t, we want to make sure there are people around to resolve and learn from any problems. With this in mind, Chaos Monkey only runs within a limited set of hours with the intent that engineers will be alert and able to respond.

In other words, Chaos Monkey deliberately causes failures in “cloud” services so that operators are better prepared for unexpected outages. By inducing failures while engineers are on hand, developers can implement fixes and contingencies on their own terms, rather than waiting for a serious problem to develop before deploying countermeasures.
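The scheduling rule and per-group selection described in the announcement can be illustrated with a small sketch. This is not Netflix's code (the released tool is a separate project with its own configuration); the function names, the sample group data, and the choice of exactly one instance per group are assumptions made for illustration, and nothing here talks to AWS.

```python
import random
from datetime import date, datetime, time

# Window taken from the defaults quoted above: weekdays, 9am to 3pm.
# The constant names are hypothetical, not Chaos Monkey settings.
OPEN = time(9, 0)
CLOSE = time(15, 0)


def monkey_may_run(now, holidays=frozenset()):
    """Return True if `now` is a non-holiday weekday between 9am and 3pm."""
    if now.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return False
    if now.date() in holidays:
        return False
    return OPEN <= now.time() < CLOSE


def pick_victims(groups, rng):
    """Pick one random instance ID from each non-empty group (an assumption)."""
    return {name: rng.choice(instances)
            for name, instances in groups.items() if instances}


if __name__ == "__main__":
    # Made-up group names and instance IDs; a real tool would query the cloud API.
    groups = {"api": ["i-001", "i-002"], "web": ["i-101"], "idle": []}
    if monkey_may_run(datetime.now()):
        for group, instance in pick_victims(groups, random.Random()).items():
            print(f"would terminate {instance} in {group}")
    else:
        print("outside the configured window; doing nothing")
```

Keeping the schedule check separate from the selection step mirrors the intent stated in the quote: terminations only happen when engineers are likely to be around to respond.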

Netflix claims that Chaos Monkey has been used to cause over 65,000 failures in their system over the last year. Most went by without issue, but a few brought problems to light that Netflix engineers were then able to repair before they could cause outages down the road.

Netflix Open Source

Netflix has released the source for Chaos Monkey under the Apache 2.0 license on the popular social programming site GitHub. GitHub is a service designed to make it easier for developers to collaborate on projects managed with the Git revision control system. Others can create forks of an existing project and submit fixes and updates to the upstream developers, helping to move the project forward.

SimianArmy on GitHub

SimianArmy, the parent project under which Chaos Monkey is released, is just one entry in Netflix’s considerable open source portfolio. Netflix’s entire collection of open source tools can be found under their rather cleverly branded page on GitHub, designed to look like their own site.

Source | Netflix Tech Blog

About Tom Nardi

Tom is a Network Engineer with a focus on GNU/Linux and open source software. He is a frequent submitter to "2600", and maintains a personal site of his projects and areas of research.