Yesterday I attended another talk at Codenode, this time with Dan North, who spoke about DevOps and Developers. Unfortunately I am short on time, so I couldn’t write this post yesterday, and today I have to be quick as well.
The talk started with an example: imagine a company that treats its customers by offering no consistency across its products, no clue when something breaks, no warning that things are about to break, and blame for the customer when things go wrong. He said that all of this has a name: “Provided without warranty”.
This is basically how Dev treats Ops. DevOps starts with Dev pushing on Ops. And Ops has to resist, because it is being asked to compromise on its governance, assurance, audits, compliance, and control processes and structures.
The point is that Ops is responsible for runtime operations, recovery from failures, restoration of a working state, SLAs, diagnosis and Business Continuity. The very first part of the talk was mostly centred on making us aware of the difficulties of the Ops role. For example, when we speak about “autonomy” we think that devs should be able to use the right tool to accomplish a task. But there are dozens of tools of every type (think about how many version control systems exist, or how many container tools), and someone has to install, monitor and fix all of them and all their installations. So we arrive at the big question, “how to resolve local autonomy and global consistency”, or the Spotify problem.
He then spoke about Contextual Consistency. Let’s say that we try to address the problem by having one person attend all the design meetings. He cannot be in more than one place at the same time, so this can create big delays for all the teams. Now let’s say that, instead of being in every meeting, this person just reviews afterwards the decisions taken by the people who attended. In the second case people will start to understand how things should be done, and at some point they will grow a local consistency and then a global consistency, because they can anticipate how things would probably be done.
Then he spoke about incident management and the questions we have to answer when an incident occurs: what happened? Who is impacted? How do we fix it? But the real question is: how could we reduce the impact of this?
Another important part of DevOps is keeping a good “Captain’s Log”, meaning that our logs should follow some rules: they should be minimal when everything is fine and verbose when something goes wrong, and they should “let others figure out what is happening”, not only the developer.
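To make the “minimal when fine, verbose when something goes wrong” rule a bit more concrete, here is a small sketch of my own (it was not shown in the talk). It assumes Python’s standard `logging` module; the names `FlushOnErrorHandler` and `ListHandler` are made up for the example. DEBUG records are kept in a small in-memory buffer and are only written out when an ERROR arrives.

```python
import collections
import logging


class ListHandler(logging.Handler):
    """Stand-in for a real destination (file, log shipper): keeps lines in a list."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def emit(self, record):
        self.lines.append(self.format(record))


class FlushOnErrorHandler(logging.Handler):
    """Hypothetical handler: quiet in normal operation, verbose on failure.

    DEBUG records are held in a bounded buffer, INFO and WARNING records are
    forwarded immediately, and an ERROR (or worse) first replays the buffered
    DEBUG context and then forwards the error itself.
    """

    def __init__(self, target, capacity=100):
        super().__init__()
        self.target = target
        # Bounded buffer: the oldest debug records are dropped when it is full.
        self.buffer = collections.deque(maxlen=capacity)

    def emit(self, record):
        if record.levelno < logging.INFO:
            self.buffer.append(record)
            return
        if record.levelno >= logging.ERROR:
            # Something went wrong: replay the recent context first.
            while self.buffer:
                self.target.handle(self.buffer.popleft())
        self.target.handle(record)
```

Note that replayed debug lines show up after INFO lines that were logged later, so each line still needs its own timestamp for the ordering to be recoverable.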
A log message should contain: a timestamp for humans and machines (you should be able to understand when something happened even if you are in another country, in another time zone), a unique distributed correlation id to track which components processed the request, the cause, the whole cause and nothing more than the cause, and an answer to the three questions we have to answer when an incident occurs.
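As an illustration of such a message, here is a minimal sketch in Python. The field names, the JSON layout and the function names are my own assumptions, not something taken from the slides. The timestamp is written twice: as ISO 8601 in UTC for humans in any time zone, and as epoch milliseconds for machines.

```python
import json
import time
import uuid
from datetime import datetime, timezone


def new_correlation_id():
    """Create an id once, at the edge, and pass it along to every component."""
    return uuid.uuid4().hex


def build_log_message(correlation_id, component, cause,
                      what_happened, who_is_impacted, how_to_fix, now=None):
    """Return one log line as JSON. All field names here are illustrative."""
    epoch = time.time() if now is None else now
    return json.dumps({
        # Human-readable and unambiguous across time zones.
        "timestamp": datetime.fromtimestamp(epoch, tz=timezone.utc).isoformat(),
        # Machine-friendly: easy to sort and compare.
        "epoch_ms": int(epoch * 1000),
        "correlation_id": correlation_id,
        "component": component,
        "cause": cause,
        # The three incident questions.
        "what_happened": what_happened,
        "who_is_impacted": who_is_impacted,
        "how_to_fix": how_to_fix,
    }, sort_keys=True)
```

Every component that touches the same request would log with the same `correlation_id`, so the lines can be joined later across services.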
He then spent a bit of time on the benefits of having automated deployment and of having automated steps in the deployment process (the two things are slightly different, because the process can involve steps other than deploying the code, for example generating docs such as the latest changelist, or getting human approvals).
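The difference between “deploying the code” and “the deployment process” can be sketched like this. It is a toy example of mine with hypothetical step names, where each step is a function that returns `True` on success and the process stops at the first step that does not pass.

```python
def generate_changelog(ctx):
    """Hypothetical docs step: write down what is going out in this release."""
    ctx["changelog"] = "\n".join(f"- {change}" for change in ctx["changes"])
    return True


def require_approval(ctx):
    """Hypothetical human gate: passes only once somebody has signed off."""
    return ctx.get("approved_by") is not None


def deploy_code(ctx):
    """Stand-in for the actual automated deployment."""
    ctx["deployed_revision"] = ctx["revision"]
    return True


# The process is more than the deploy step itself.
DEPLOYMENT_PROCESS = [
    ("changelog", generate_changelog),
    ("approval", require_approval),
    ("deploy", deploy_code),
]


def run_process(steps, ctx):
    """Run steps in order; return (completed step names, name of failed step or None)."""
    completed = []
    for name, step in steps:
        if not step(ctx):
            return completed, name
        completed.append(name)
    return completed, None
```

With `approved_by` missing, `run_process` stops at `"approval"` and nothing is deployed; once it is set, all three steps run.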
He also spoke about two important aspects: active monitoring and alarms, and what User Experience is. On monitoring he showed a pattern for health checks that do not involve ELBs, for example using UDP messages as heartbeats. A single packet can carry about 1500 bytes, which is a lot of information, and he displayed an example of a good packet. It had a name, the app it belongs to and the apps that require it, the address, the heartbeat interval and the interval after which the instance is considered unhealthy, some configuration such as the revision, who deployed it and when, the status of the instance, and references such as the URLs to check the status and the config.
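Here is roughly how I imagine such a heartbeat could be built and sent. This is my own sketch, not the packet from the slides: the field names, the JSON encoding and the function names are assumptions. I also assume a 1500-byte Ethernet MTU over IPv4, which leaves 1472 bytes for the UDP payload once the IP and UDP headers are subtracted.

```python
import json
import socket

# Assumption: 1500-byte MTU minus 20 bytes (IPv4 header) and 8 bytes (UDP header).
MAX_PAYLOAD_BYTES = 1472


def build_heartbeat(*, name, app, required_by, address, interval_s,
                    unhealthy_after_s, revision, deployed_by, deployed_at,
                    status, status_url, config_url):
    """Encode one heartbeat as compact JSON; refuse anything that would not fit."""
    beat = {
        "name": name,
        "app": app,
        "required_by": required_by,        # apps that depend on this one
        "address": address,
        "interval_s": interval_s,          # how often a beat is sent
        "unhealthy_after_s": unhealthy_after_s,  # silence tolerated before alarm
        "config": {
            "revision": revision,
            "deployed_by": deployed_by,
            "deployed_at": deployed_at,
        },
        "status": status,
        "refs": {"status": status_url, "config": config_url},
    }
    payload = json.dumps(beat, separators=(",", ":")).encode("utf-8")
    if len(payload) > MAX_PAYLOAD_BYTES:
        raise ValueError(f"heartbeat is {len(payload)} bytes, limit is {MAX_PAYLOAD_BYTES}")
    return payload


def send_heartbeat(payload, host, port):
    """Fire-and-forget: UDP gives no delivery guarantee, a missed beat is the signal."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
```

A monitor would listen on the chosen port and raise an alarm when no beat has arrived from an instance for longer than its `unhealthy_after_s`.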
He then finished with a discussion on the roles of devs and ops and what they should achieve, meaning that ops should also train devs to be a bit more devops in the first place.