On Monday and Tuesday I was at µCon, the microservices conference hosted at CodeNode. I was so excited that in 2 days I took 19 pages of notes (and my handwriting is very small). I will try to go through all the talks that I attended.
The first talk was “How to be WRONG with Microservices, and still WIN” by Russ Miles, CEO of ChaosIQ.io (website – linkedin – twitter), who started by playing guitar and then moved on to Chaos Engineering. The concept is that “we [developers] are always wrong”, basically because we don’t know what we are doing and our clients don’t know what they want. But being wrong brings blame and disappointment, so we get scared and try to cover our backs. At the same time we chase two goals: good velocity in shipping new features and reliability. Those two goals are not in conflict: being faster also means being more reactive, which brings more reliability.
On top of that, software has Dark Debt, which differs from tech debt because it is not conscious. Dark debt is whatever is missing in the software that we are not aware of. There is no trade-off, anticipated action or prevention that can save us from dark debt. All these factors lead to the conclusion that we have to face being wrong, and the only way to do that is to “deliberately practice to be wrong”.
Companies have to embrace a ZERO BLAME culture in which we don’t care who made a mistake, but we accept that mistakes happen. So we have to invest in resilience to mistakes: never let an outage go to waste! While post-mortem learning should always happen because it is good practice, pre-mortem learning is what is called “Chaos Engineering”. Basically it is all about running game days in which we execute automated chaos experiments with the intent of fixing (fake) issues, learning from them, improving our system and increasing its robustness. In short, it is all about being wrong proactively.
His final suggestion was to run the game day before the retrospective: Russ finds retrospectives usually boring (he hasn’t worked in my company, though), and people get emotional in failure situations, so they will bring more observations to the retro.
The second talk was “Cultivating a Microservice Culture Via Tooling” by Tom Vance (linkedin – twitter), who spoke about the advantages of using and developing tools to standardise some processes in a company. For example, he created a script that generates a new project from scratch, GitHub repo included, and another that starts a service under development. He used NPM to distribute and update his tool. He also used a shared library to standardise logging and database access. The whole point was that developers need to focus on the business and not on repetitive, dumb tasks.
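As an illustration only (this is not Tom's tool, which was distributed through NPM), a scaffolding script of that kind could look like the Python sketch below. The function name `scaffold`, the file layout and the contents of `service.json` are all assumptions of mine; creating the GitHub repo is left out.

```python
import json
import tempfile
from pathlib import Path


def scaffold(name: str, root: Path) -> Path:
    """Create a new service skeleton under root/name (hypothetical layout)."""
    project = root / name
    if project.exists():
        raise FileExistsError(f"{project} already exists")

    # Every service gets the same directory structure...
    (project / "src").mkdir(parents=True)

    # ...the same README stub...
    (project / "README.md").write_text(f"# {name}\n\nGenerated from the team template.\n")

    # ...and the same default settings (assumed values, e.g. JSON logging).
    config = {"name": name, "log_format": "json", "port": 8080}
    (project / "service.json").write_text(json.dumps(config, indent=2))

    # A trivial entry point so the service can be started right away.
    (project / "src" / "main.py").write_text(f'print("hello from {name}")\n')
    return project


if __name__ == "__main__":
    # Demo in a throwaway directory so nothing is left behind.
    with tempfile.TemporaryDirectory() as tmp:
        created = scaffold("billing", Path(tmp))
        for path in sorted(created.rglob("*")):
            print(path.relative_to(created))
```

The value is less in the code than in the convention: every new service starts with the same layout and the same defaults, so nobody has to remember them.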
The third talk was “Distributed System Reliability Through Chaos Engineering” by Sylvain Hellegouarch, CTO at ChaosIQ.io (linkedin – twitter). Again the topic was Chaos Engineering, this time as a way to make better decisions. We would basically like to have actionable “what if…” experiments to collect data, like “what if latency increases by 500 ms?” (to collect data on resilience and performance) or “what if the internal certificate expires?” (for security) or “what if we lose a node?” (for availability and reliability).
An experiment is basically a protocol in which we (1) define a baseline, (2) impact the system, (3) check whether the system has deviated from the baseline. The experiment should be executed automatically and be part of the engineering routine. Sylvain then started speaking about Chaos Toolkit, an open source tool for Chaos Engineering distributed under the Apache License 2.0, with an active and friendly community. It is listed in the CNCF Cloud Native Interactive Landscape. Chaos Toolkit has a nice way to declare experiments and a very good reporting system that shows graphs of contribution about the areas that the experiment covers. It can also set up rollback actions to revert the effects of an experiment (Kubernetes, for example, does NOT need any rollback if you kill a service, because it restarts it automagically). The reports can go to Slack or to a monitoring tool, and the toolkit has several nice integrations.
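To make the three-step protocol concrete, here is a minimal sketch in plain Python. It deliberately does not use Chaos Toolkit's own experiment format or API: the names `FakeService` and `Experiment`, the toy latency model and the 20% tolerance are all assumptions of mine, chosen only to show the baseline, impact, check and rollback steps in order.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FakeService:
    """Toy stand-in for a real system: a pool of nodes serving requests."""
    nodes: int = 3
    base_latency_ms: float = 100.0

    def latency_ms(self) -> float:
        # Assumed model: with fewer nodes each one is busier, so latency grows.
        return self.base_latency_ms * 3 / self.nodes


@dataclass
class Experiment:
    title: str
    probe: Callable[[], float]     # measures the system
    method: Callable[[], None]     # injects the failure
    rollback: Callable[[], None]   # reverts the failure
    tolerance: float = 0.2         # allowed relative deviation (assumed 20%)

    def run(self) -> dict:
        baseline = self.probe()        # (1) define a baseline
        try:
            self.method()              # (2) impact the system
            observed = self.probe()    # (3) measure again
        finally:
            self.rollback()            # always restore the system
        deviation = abs(observed - baseline) / baseline
        return {
            "title": self.title,
            "baseline": baseline,
            "observed": observed,
            "deviated": deviation > self.tolerance,
        }


service = FakeService()


def lose_node() -> None:
    service.nodes -= 1


def restore_node() -> None:
    service.nodes = 3


what_if_we_lose_a_node = Experiment(
    title="What if we lose a node?",
    probe=service.latency_ms,
    method=lose_node,
    rollback=restore_node,
)

if __name__ == "__main__":
    print(what_if_we_lose_a_node.run())
```

In this toy model, losing one of three nodes pushes latency from 100 ms to 150 ms, which is beyond the tolerance, so the experiment reports a deviation. That is exactly the kind of finding a game day is supposed to surface before a real outage does.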
That is enough for today (and I have covered only 3 talks out of 18). Stay tuned!