Previously on my blog: in the previous post we spoke about circuit breakers, communication, point-to-point vs brokers, robustness and antifragility, stressors, and consumer-driven contracts.
In the first post we spoke about what microservices are, the pros and cons of having a network in the architecture (fallacies of distributed computing, CAP theorem…) and the problem of changes, but most of the discussion was related to Domain Driven Design.
The third and last day. We started by discussing the problem of service discovery: if your services talk to each other, service A must know how to reach service B. The usual strategies are: use a load balancer (you know where the load balancer is, you send traffic to it, and it knows where the service instances are), use a service discovery proxy, use a broker, or connect directly. Load balancers have a strategy for forwarding your requests (usually round robin, or a smarter one that takes load or health checks into account), but it is still a mechanical one, meaning that you cannot ask a load balancer, for example, to “send it to the fastest node” based on data from previous computations.
You can achieve that behaviour by using a Service Discovery Registry that keeps the list of available services (based on type or tags) and by implementing the load-balancing strategy in the service itself. Service A periodically queries the registry for the list of services and, for each request, chooses by itself the instance to which it wants to send the request. There are several tools in this field: Consul.io is a service registry that also exposes a DNS interface (I think we used this client for our lab), Eureka is a service registry, etcd is a distributed key-value store, and ZooKeeper is a distributed store for configurations.
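To make the idea concrete, here is a minimal sketch of client-side load balancing against a registry. The `ServiceRegistry` class and the addresses are invented for illustration; real tools like Consul or Eureka expose the same lookup over HTTP or DNS instead of an in-memory dictionary.

```python
import random

# Hypothetical in-memory registry; real registries (Consul, Eureka, etcd)
# expose register/lookup over HTTP or DNS.
class ServiceRegistry:
    def __init__(self):
        self._services = {}  # service name -> list of instance addresses

    def register(self, name, address):
        self._services.setdefault(name, []).append(address)

    def lookup(self, name):
        return list(self._services.get(name, []))

# Client-side load balancing: service A fetches the instance list and
# picks a target itself, instead of sending traffic through a load balancer.
def choose_instance(registry, name, strategy=random.choice):
    instances = registry.lookup(name)
    if not instances:
        raise LookupError(f"no instances registered for {name!r}")
    return strategy(instances)

registry = ServiceRegistry()
registry.register("service-b", "10.0.0.1:8080")
registry.register("service-b", "10.0.0.2:8080")
target = choose_instance(registry, "service-b")
```

Note that the `strategy` parameter is where a smarter choice than round robin could live: the service could pass a function that picks, say, the instance with the best recent latency.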
We also spent a bit of time on Vault (webpage – github), a tool similar to a configuration server but only for credentials, keys and secrets, and on API Gateways, like Zuul or Apigee (sold as SaaS), which give you a single entry point for your services (over HTTP, usually) so that you can hide the rest of your architecture in the backend. Apigee (I am not sure about Zuul) also handles API keys, limiting access to the API up to a defined limit for a specific key (let’s say a user can do only 5000 requests…).
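The per-key limiting can be sketched in a few lines. This is not Apigee’s actual implementation, just an illustration of the idea: a counter per API key, rejected once the quota is exhausted.

```python
# Minimal sketch of per-API-key quota enforcement, similar in spirit to
# what a gateway like Apigee does (class and method names are made up).
class QuotaLimiter:
    def __init__(self, limit):
        self.limit = limit   # max requests allowed per key
        self.counts = {}     # api key -> requests seen so far

    def allow(self, api_key):
        used = self.counts.get(api_key, 0)
        if used >= self.limit:
            return False     # quota exhausted: the gateway would reply e.g. HTTP 429
        self.counts[api_key] = used + 1
        return True

limiter = QuotaLimiter(limit=5000)
limiter.allow("user-123")    # True until the 5000th request
```

A real gateway would also reset the counters per time window (per day, per month…), which is omitted here.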
Another technique related to gateways is the Backend For Frontend: let’s say that your API gateway is generic, but you have specific types of clients (browsers, mobiles, devices…) that prefer to make requests in different ways (for retrieving data from a mobile, say, you want to do 5 small requests, while for the browser one big request including all the data…). A Backend For Frontend is a proxy specific to one client type: it accepts that client’s requests, produces the response by querying the backend, and splits the response into the shape that specific client requires. You can read more about this topic here, for example.
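A sketch of the mobile case described above, with an invented order service: the BFF fetches the full backend response once and exposes it as the small, focused responses the mobile client prefers.

```python
# Hypothetical backend call; in practice this would be an HTTP request
# to an internal service hidden behind the gateway.
def fetch_order_data(order_id):
    return {"id": order_id, "items": ["book", "pen"], "total": 12.5,
            "shipping": {"status": "shipped"}}

# A Backend For Frontend tailored to mobile clients: each method returns
# one of the small responses the mobile app asks for, carved out of the
# generic backend payload.
class MobileBFF:
    def order_summary(self, order_id):
        data = fetch_order_data(order_id)
        return {"id": data["id"], "total": data["total"]}

    def order_shipping(self, order_id):
        data = fetch_order_data(order_id)
        return data["shipping"]
```

A browser-oriented BFF for the same backend would instead return the whole payload in one response.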
Another interesting pattern for microservices is the Pipeline Coordinator: if you have a web of services you may have A talking to B, which talks to C and D, D talking to E and so on. We already discussed that this is fragile, chaotic and, in one word, BAD. It may be better to have a single component retrieve the information from all the other services: if D needs E’s output to compute, the (pipeline) coordinator may invoke E, retrieve the data, pass it to D, retrieve D’s data, and so on. If D is just aggregating data from E, the coordinator may call them in whatever order it wants. In this way the coordinator basically becomes a State Machine. I think an example could be Oozie, even if I am not really sure.
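The D-needs-E case can be sketched as follows; the two service functions are stand-ins for remote calls, and the point is that every step reads from and writes to the coordinator’s state, never calling another service directly.

```python
# Stand-ins for remote calls to services E and D.
def call_service_e():
    return [1, 2, 3]

def call_service_d(e_data):
    return sum(e_data)

# The pipeline coordinator as a small state machine: it invokes E,
# feeds the result to D, and accumulates everything in its own state,
# so D never needs to know how to reach E.
def coordinator():
    state = {}
    state["e"] = call_service_e()
    state["d"] = call_service_d(state["e"])
    return state

result = coordinator()
```

Adding a new step in the pipeline means changing only the coordinator, not the services themselves.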
Time to talk about messaging patterns. There are two big groups. In SEDA (Staged Event-Driven Architecture), every service has a channel/queue into which it asynchronously puts a message, without knowing what happens to that message next. Opposed to this there is the Communicating Sequential Processes (CSP) model, in which a process/service A must know the receiver B with which it wants to communicate. It sends a message to B’s queue/mailbox to coordinate the message exchange. When B finds this message, it replies that it is ready, and a channel based on mailboxes (queues) is created, for which writing blocks until the reader reads, and reading synchronises the two processes. It is similar to an Actor system, but with some small differences in the synchronisation.
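The CSP-style synchronisation can be approximated with Python threads and a bounded queue; this is only a sketch of the rendezvous idea (the canonical implementations are Go channels or occam), with A blocking until B has actually consumed the message.

```python
import queue
import threading

# CSP-style exchange sketched with threads: A writes to B's mailbox and
# then blocks until B has consumed the message, so the send synchronises
# the two processes.
mailbox_b = queue.Queue(maxsize=1)
received = []

def process_b():
    msg = mailbox_b.get()      # B finds the message in its mailbox
    received.append(msg)
    mailbox_b.task_done()      # signals A that the read happened

def process_a():
    mailbox_b.put("hello from A")
    mailbox_b.join()           # A blocks here until B has processed the message

b = threading.Thread(target=process_b)
a = threading.Thread(target=process_a)
b.start(); a.start()
a.join(); b.join()
```

In a SEDA system, by contrast, A would just `put` and move on without the `join`: no synchronisation with the consumer at all.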
Event systems are different. An Event means that something has happened in the system, which is why event types are often actions in the past tense (“order-placed“). They usually have specific attributes: type, time, id, payload… The idea is that events are broadcast, components who are interested can subscribe, and ALL of them will receive a copy of ALL the events when they are broadcast.
Compared to RPC, in which the message has a more imperative meaning (A says to B something like “do this for me!“), an Event exchange is more declarative (“this thing just happened“). A component fires an event and then doesn’t care about consequences and future computations (fire-and-forget). Receivers, too, may ignore some events or messages. The message, in this pattern, is only the envelope for the event. A logical consequence of this pattern is that EVENTS CAN BE LOST!
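Both properties, the event attributes listed above and the fire-and-forget broadcast, fit in a short in-process sketch. The `EventBus` class is invented for illustration; a real system would use a broker.

```python
import time
import uuid

# Sketch of an in-process event bus: events carry type, time, id and
# payload; every subscriber to a type gets its own copy; the publisher
# fires and forgets.
class EventBus:
    def __init__(self):
        self.subscribers = {}  # event type -> list of handler callables

    def subscribe(self, event_type, handler):
        self.subscribers.setdefault(event_type, []).append(handler)

    def publish(self, event_type, payload):
        event = {"type": event_type, "time": time.time(),
                 "id": str(uuid.uuid4()), "payload": payload}
        # Broadcast: ALL subscribers receive the event. If nobody is
        # subscribed, the event is simply lost -- exactly the property
        # noted above.
        for handler in self.subscribers.get(event_type, []):
            handler(event)

bus = EventBus()
seen = []
bus.subscribe("order-placed", lambda e: seen.append(e["payload"]))
bus.publish("order-placed", {"order_id": 42})
```

Note that `publish` returns nothing to the caller: the publisher has no way to know who reacted, or whether anyone did.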
Tools related to this pattern are AWS Kinesis, Kafka and Photon. These tools store all the events, and they also offer the ability to add a new consumer and have it receive ALL the events from the very beginning of the publishing, so that new consumers can build a fresh state based on ALL the available events. This is a slightly different pattern, the Eventually Consistent System (more details here). It leads to the Event Sourcing pattern (here for example), in which you publish a new event for every change of data, so that you can easily rebuild your state by rerunning all the state changes. There are lots of documents and talks by Greg Young about CQRS & Event Sourcing around the internet, like this document.
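A tiny event-sourcing sketch, with an invented bank-account domain: the event log is the source of truth, and the current balance is never stored, only rebuilt by replaying every change from the beginning, just as a new consumer replaying a topic from offset zero would.

```python
# The event log is the only stored data: one event per change.
event_log = [
    {"type": "account-opened",  "amount": 0},
    {"type": "money-deposited", "amount": 100},
    {"type": "money-withdrawn", "amount": 30},
]

# Apply a single event to the current state.
def apply(balance, event):
    if event["type"] == "money-deposited":
        return balance + event["amount"]
    if event["type"] == "money-withdrawn":
        return balance - event["amount"]
    return balance  # events like "account-opened" don't change the balance

# Rebuild the state by replaying ALL events from the very beginning.
def rebuild_state(events):
    balance = 0
    for event in events:
        balance = apply(balance, event)
    return balance

balance = rebuild_state(event_log)  # 0 + 100 - 30 = 70
```

Appending a corrective event (rather than editing history) is how such a system fixes mistakes, which is why the log can double as an audit trail.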
The last topic of the course was Containers. Containers are useful at runtime (less expensive than a virtual machine), in the development process (to replicate an environment) and for testing, but also for having exact images for deployment. A container is an archive with a structure that lets you define exactly what you need to run and its environment. The container creates a virtual environment for the applications and runs them inside it. So it is similar to a virtual machine; the difference is that a virtual machine virtualises everything, operating system included, while a container uses the same kernel as the physical machine and reserves only a bit of space for itself, resulting in an optimised usage of resources (in a virtual machine, all the resources you reserve are booked for its computation, meaning that what you don’t use is wasted; in a container this is not true).
What happens inside a container is not visible to the outside world, but you can still map container ports to physical ports for communication. Docker, one of these tools, has a very nice way to define the content of an image, and it comes with a functionality called docker compose that is useful to aggregate different images with different functionalities. Moreover, there is an open registry here in which you can find dockerized images of lots of applications, ready to be used on your machine (in our labs we used Kafka, RabbitMQ and Consul…). There are other tools around Docker with specific features: Kubernetes for orchestration and scheduling (it can easily scale images), Weave, Rancher, Kube, Cloud Foundry, Mesos and Marathon.
This ended the course. Stay tuned!