With the wide adoption of microservices and large-scale distributed systems, architectures have grown increasingly complex and hard to understand. Worse, the software systems running on them have become extremely difficult to debug and test, increasing the risk of outages. These new challenges call for new tools, and since failures have become more and more chaotic in nature, we must turn to chaos engineering to reveal failures before they become outages.
In this talk, we will dive deep into availability, reliability and large-scale architectures, and introduce chaos engineering, a discipline that promotes breaking things on purpose in order to learn how to build more resilient systems. We will then show the audience how to start practicing chaos engineering on the AWS cloud.
The speaker will walk through the tools and methods attendees can use to inject failures into their architectures in order to make them more resilient.
Adrian has over 15 years of experience in the IT industry, having worked as a software and system engineer, as a backend, web and mobile developer, and as part of DevOps teams, where his focus has been on cloud infrastructure and site reliability, writing application software, deploying servers and managing large-scale architectures. The truth is that Adrian loves breaking stuff - Chaos is his thing :) Adrian frequently speaks at conferences and community meetups, and blogs actively at https://medium.com/@adhorn.