I just finished my third day of training. This post closes out the content of the course. Some things have been skipped; I also have two related books in PDF that contain the slides, the texts of the labs and everything else that could be useful. After this course we could try the related certification, but I am not sure I will, because I would have to pay for it myself. So, let’s start.
You can use Route 53 to switch between blue/green environments.
CloudFormation: the template format is not easy to master, but it lets you deploy building blocks once and reuse them instead of redeploying every block for each stack. Resources in the template can be described in any order: AWS figures out how to initialize them and in which order. It is still possible to put constraints on the startup order by adding the “DependsOn” attribute. This is especially useful for services that require, for example, internet access during boot.
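To make the ordering idea concrete, here is a minimal Python sketch. It builds a tiny template as a dict, collects dependencies from `Ref` references and from the explicit "depends on" constraint (the `DependsOn` attribute in CloudFormation templates), and derives one valid creation order with a topological sort. The logical names (`Vpc`, `Subnet`, `Gateway`, `GatewayAttachment`, `WebServer`) are made up for the example, and this only approximates the idea: it is not how CloudFormation is actually implemented.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical template fragment; logical names are illustrative only.
template = {
    "Resources": {
        "WebServer": {
            "Type": "AWS::EC2::Instance",
            # Explicit constraint: the instance needs internet during boot.
            "DependsOn": "GatewayAttachment",
            "Properties": {"SubnetId": {"Ref": "Subnet"}},
        },
        "GatewayAttachment": {
            "Type": "AWS::EC2::VPCGatewayAttachment",
            "Properties": {
                "VpcId": {"Ref": "Vpc"},
                "InternetGatewayId": {"Ref": "Gateway"},
            },
        },
        "Subnet": {"Type": "AWS::EC2::Subnet", "Properties": {"VpcId": {"Ref": "Vpc"}}},
        "Gateway": {"Type": "AWS::EC2::InternetGateway"},
        "Vpc": {"Type": "AWS::EC2::VPC", "Properties": {"CidrBlock": "10.0.0.0/16"}},
    }
}


def find_refs(node):
    """Collect every logical name referenced through {"Ref": name}."""
    if isinstance(node, dict):
        refs = set()
        for key, value in node.items():
            if key == "Ref" and isinstance(value, str):
                refs.add(value)
            else:
                refs |= find_refs(value)
        return refs
    if isinstance(node, list):
        return set().union(*(find_refs(item) for item in node))
    return set()


def creation_order(tpl):
    """Return one valid creation order for the resources in the template."""
    resources = tpl["Resources"]
    graph = {}
    for name, body in resources.items():
        deps = find_refs(body.get("Properties", {}))
        explicit = body.get("DependsOn", [])
        deps |= {explicit} if isinstance(explicit, str) else set(explicit)
        # Keep only dependencies that are resources of this template.
        graph[name] = deps & set(resources)
    return list(TopologicalSorter(graph).static_order())


order = creation_order(template)
print(order)
```

Whatever order the resources are written in, `WebServer` always comes after `GatewayAttachment`, which in turn comes after `Vpc` and `Gateway`.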
If you have an even more specific need, such as an application that must be up and running before another resource can start, you can have the main resource send a signal when it is up and make the dependent resource wait on a “WaitCondition”.
You can also add a “CreationPolicy” attribute to a resource in the template to define rules like “wait for 3 web servers to be up and working”.
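As a rough illustration, the “wait for 3 web servers” rule could be expressed with a `CreationPolicy` on an Auto Scaling group. The sketch below builds only that fragment as a Python dict and prints it as JSON. The logical name `WebServerGroup`, the group sizes and the 15-minute timeout are my own assumptions, and the launch configuration and subnets a real group needs are left out.

```python
import json

# Fragment of a template: only the parts relevant to the creation policy.
web_server_group = {
    "Type": "AWS::AutoScaling::AutoScalingGroup",
    "Properties": {"MinSize": "3", "MaxSize": "6", "DesiredCapacity": "3"},
    "CreationPolicy": {
        # Creation succeeds only after 3 success signals within 15 minutes.
        "ResourceSignal": {"Count": 3, "Timeout": "PT15M"}
    },
}

fragment = {"Resources": {"WebServerGroup": web_server_group}}
print(json.dumps(fragment, indent=2))
```

Each instance would typically report that it is ready with the `cfn-signal` helper script at the end of its boot sequence.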
You can parametrise the startup of a stack with “Parameters” elements: for example the AMI or the EC2 instance type. You will then be asked to input that information (text area, checkbox…) when you run the template in CloudFormation.
Other than user inputs (Parameters), you can specify some “Mappings” in the template. A mapping is looked up with the required keys to get the matching value (like a multi-key map in a normal programming language). You can also define “Conditions” (booleans computed by applying logical operators to Parameters) and “Outputs”, optionally exported (“Export”): these are values of a deployed stack that another stack created by CloudFormation can reference. Let’s say we deploy a Lambda with a template: we can output its URL and reference it from another stack when we create that one with CloudFormation.
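Here is a small Python sketch of how these sections fit together. The template is a plain dict, and `resolve` is a toy evaluator I wrote for the example: it understands only `Ref`, `Fn::FindInMap` and `Fn::Equals`. The AMI ids, the `EnvType` parameter and the export name are placeholders, not real values.

```python
# Hypothetical template; AMI ids and names are placeholders.
template = {
    "Parameters": {
        "EnvType": {"Type": "String", "AllowedValues": ["prod", "test"], "Default": "test"},
    },
    "Mappings": {
        "RegionMap": {
            "eu-west-1": {"prod": "ami-11111111", "test": "ami-22222222"},
            "us-east-1": {"prod": "ami-33333333", "test": "ami-44444444"},
        }
    },
    "Conditions": {"IsProd": {"Fn::Equals": [{"Ref": "EnvType"}, "prod"]}},
    "Outputs": {
        "ImageId": {
            "Value": {"Fn::FindInMap": ["RegionMap", {"Ref": "AWS::Region"}, {"Ref": "EnvType"}]},
            "Export": {"Name": "shared-image-id"},
        }
    },
}


def resolve(node, tpl, values):
    """Evaluate the tiny subset of intrinsic functions used above."""
    if isinstance(node, dict) and len(node) == 1:
        (key, arg), = node.items()
        if key == "Ref":
            return values[arg]
        if key == "Fn::FindInMap":
            # Two-level lookup: mapping name, top-level key, second-level key.
            map_name, top, second = (resolve(a, tpl, values) for a in arg)
            return tpl["Mappings"][map_name][top][second]
        if key == "Fn::Equals":
            left, right = (resolve(a, tpl, values) for a in arg)
            return left == right
    return node


values = {"AWS::Region": "eu-west-1", "EnvType": "prod"}
image_id = resolve(template["Outputs"]["ImageId"]["Value"], template, values)
is_prod = resolve(template["Conditions"]["IsProd"], template, values)
print(image_id, is_prod)  # ami-11111111 True
```

The mapping lookup really is just a two-key dictionary access, which is why I compared it to a multi-key map.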
AWS Elastic Beanstalk is a service that lets developers deploy applications without knowing how the AWS architecture works. This means an environment must be set up on AWS, and the developer just deploys the application package into it. Beanstalk also lets you switch the DNS (Route 53) from one environment to another, so you can deploy to a new stack, verify that it works and then make it the live environment.
AWS OpsWorks is a utility service for defining and deploying layers. It also gives you back the CloudFormation template. OpsWorks has several built-in templates, but you can of course define custom ones.
Last but not least, Run Command lets you run shell commands on a set of tagged EC2 instances. This way you can, for example, update the OS of multiple EC2 instances all together. To fix or update a machine you can also use tools like Chef, Puppet, Ansible…
WEB SCALE STORAGE suggestions:
- make sure that you use S3 for static files (a single S3 object can be up to 5 TB; a single PUT upload is limited to 5 GB)
- make sure your S3 buckets are in a region close to your clients
- Amazon decides which physical storage to use based on the prefix of your object key. This means it is not a good idea to have a group of objects whose keys start with the same prefix: they end up on the same physical storage, which reduces parallelisation and increases latency. Better to put a hash at the beginning of the key
- Use cache wherever possible (CloudFront). The best stack is Route 53 pointing to CloudFront pointing to ELB
- DynamoDB does NOT support transactions but it has a feature that executes updates only with specific conditions (like “update price IF it still has this value”), and it fails otherwise. You can then retry in case of failure.
- You can set up multiple indexes on your DynamoDB table and, even though joins are not available, you can still query data by the value of an index instead of running a scan.
- Amazon RDS offers different flavours of DB. MySQL normally has a frontend and a storage backend: Amazon’s own implementation of MySQL doesn’t use InnoDB as the backend; instead it has its own solution, called Aurora.
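The “update price IF it still has this value” idea from the DynamoDB point above can be sketched in a few lines of Python. `TinyTable` is an in-memory stand-in I made up for the example (it is not the DynamoDB API); it only shows the pattern: read, compute, write conditionally, and retry if someone else changed the value in the meantime.

```python
import threading


class ConditionFailed(Exception):
    """Raised when the expected value no longer matches the stored one."""


class TinyTable:
    """In-memory stand-in for a table; NOT the DynamoDB API."""

    def __init__(self):
        self._items = {}
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            return self._items.get(key)

    def put_if(self, key, new_value, expected):
        """Write new_value only if the current value equals expected."""
        with self._lock:
            if self._items.get(key) != expected:
                raise ConditionFailed(key)
            self._items[key] = new_value


def update_with_retry(table, key, change, max_attempts=5):
    """Read, compute, conditionally write; re-read and retry on conflict."""
    for _ in range(max_attempts):
        current = table.get(key)
        try:
            table.put_if(key, change(current), expected=current)
            return True
        except ConditionFailed:
            continue  # somebody else updated the item: start over
    return False


table = TinyTable()
table.put_if("price", 100, expected=None)  # create only if absent
update_with_retry(table, "price", lambda price: price + 10)
print(table.get("price"))  # 110
```

A write based on a stale value (for example `put_if("price", 1, expected=100)` at this point) fails instead of silently overwriting the newer price.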
The AWS Well-Architected Framework defines the 5 pillars of a good architecture (initially 4: the fifth was introduced recently): Security, Reliability, Performance, Cost Optimization and the recent one, Operational Excellence. It is a good idea to consider them when you design your architecture: they help you not to forget important details.
SECURITY: comes with isolation, encryption, security groups, ACLs… Security must be applied at all levels (IP tables, routing tables…). Traceability and the shared responsibility model are also important. Amazon provides certificates for free (through AWS Certificate Manager) if you use them with ELB or CloudFront.
AWS Shield is a service that Amazon offers to protect your own on-premise hardware from DDoS attacks.
AWS WAF (Web Application Firewall) can detect SQL injections and prevent unwanted access.
Amazon Inspector analyses your stack/account and gives you a report of vulnerabilities. You may find that most of them are related to an old version of the OS, and then use Run Command to do routine things like updating the OS.
You can set the static files in your bucket to be private (not accessible to external users) and then generate signed URLs to access them: a signed URL grants access to an object on S3 for a limited amount of time. You still need one call for every signed URL you generate, and each URL is basically an access grant (read or write) to a specific S3 bucket + key.
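To show why a signed URL can be time-limited and tamper-proof, here is a simplified Python sketch. It is NOT the signing scheme S3 actually uses: the secret, the `example.com` host and the `expires`/`signature` query parameters are assumptions for the demo. The idea is only that the URL carries an expiry time plus an HMAC over bucket, key and expiry, so the server can verify it without storing anything.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-secret"  # hypothetical signing key, for illustration only


def sign_url(bucket, key, expires_in, now=None):
    """Return a URL that carries an expiry time and an HMAC over it."""
    expires = int(now if now is not None else time.time()) + expires_in
    message = f"{bucket}/{key}:{expires}".encode()
    signature = hmac.new(SECRET, message, hashlib.sha256).hexdigest()
    query = urlencode({"expires": expires, "signature": signature})
    return f"https://{bucket}.example.com/{key}?{query}"


def is_valid(url, now=None):
    """Check the signature and that the URL has not expired yet."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError):
        return False
    bucket = parsed.netloc.split(".")[0]
    key = parsed.path.lstrip("/")
    message = f"{bucket}/{key}:{expires}".encode()
    expected = hmac.new(SECRET, message, hashlib.sha256).hexdigest()
    current = now if now is not None else time.time()
    return hmac.compare_digest(signature, expected) and current < expires


url = sign_url("private-bucket", "reports/q1.pdf", expires_in=60, now=1000)
print(is_valid(url, now=1030))  # True: still within the 60 seconds
print(is_valid(url, now=1061))  # False: expired
print(is_valid(url.replace("q1", "q2"), now=1030))  # False: key was changed
```

Changing the object key or the expiry in the URL invalidates the signature, which is what makes it safe to hand the URL to an external user.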
Another way to access private S3 buckets (available only for CloudFront) is an Origin Access Identity (OAI).
Regarding encryption, the basic behaviour in AWS is to use symmetric keys. If you need to encrypt a file to be stored in S3, you encrypt it yourself with your own key, Amazon encrypts that key for you with a master key (the user never learns the value of this master key), and you put both the encrypted file and the encrypted key in the S3 bucket. When you need the file back, Amazon decrypts the key and you are then able to decrypt your file.
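The flow above can be sketched in Python. To stay within the standard library I use a toy XOR cipher built on SHA-256 as a stand-in for a real symmetric cipher: it is NOT secure and is there only to make the structure visible. `MASTER_KEY`, `toy_cipher` and the envelope layout are all my own assumptions; in the real flow the master key never leaves the provider.

```python
import hashlib
import secrets


def toy_cipher(key, nonce, data):
    """XOR data with a SHA-256 based keystream. Toy stand-in, NOT secure."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    # XOR is its own inverse, so the same function encrypts and decrypts.
    return bytes(a ^ b for a, b in zip(data, stream))


MASTER_KEY = secrets.token_bytes(32)  # held by the provider in the real flow


def encrypt_envelope(plaintext):
    """Encrypt the file with a fresh data key, then encrypt that key."""
    data_key = secrets.token_bytes(32)
    file_nonce = secrets.token_bytes(16)
    key_nonce = secrets.token_bytes(16)
    return {
        "file_nonce": file_nonce,
        "key_nonce": key_nonce,
        "encrypted_file": toy_cipher(data_key, file_nonce, plaintext),
        "encrypted_key": toy_cipher(MASTER_KEY, key_nonce, data_key),
    }


def decrypt_envelope(envelope):
    """Decrypt the data key with the master key, then decrypt the file."""
    data_key = toy_cipher(MASTER_KEY, envelope["key_nonce"], envelope["encrypted_key"])
    return toy_cipher(data_key, envelope["file_nonce"], envelope["encrypted_file"])


envelope = encrypt_envelope(b"quarterly report")
print(decrypt_envelope(envelope))  # b'quarterly report'
```

What gets stored in the bucket is the whole envelope: the encrypted file next to its encrypted key, so the plain data key never needs to be kept anywhere.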
AWS Key Management Service (KMS) can generate keys for you and store them for later reuse. If for any reason you need a dedicated hardware module for key storage, you can buy the AWS CloudHSM service.
S3 can encrypt your files for you, but you can also decide to encrypt them yourself.
You can use encryption on databases too, and you can configure it on RDS, but it is still your responsibility to open the right ports for encrypted connections in your Security Groups, and those usually differ from DB to DB.
Another way to give access to services is AWS Security Token Service (STS). Access is granted by associating a token with the entity we want to allow in. This is what happens in the background when you assume roles: for example, when an EC2 instance assumes a role, AWS installs a token on the instance (and renews it every hour).
RELIABILITY is achieved by:
- recovering from problems (suggestion: horizontally split big resources into smaller ones)
- backup and restore (you can use Amazon Snowball to transfer up to some terabytes of data from your on-premise storage to AWS, or Snowmobile to… ahhhhh, look here and you can understand). You can have a cloud replica or low-latency solutions as a backup stack.
By the way, there is also a service on AWS that lets you have your laptop, with your choice of OS, running in the cloud.
PERFORMANCE: AWS has democratized advanced technologies. You can also have a scalable data warehouse (Redshift).
A note: ALB (Application Load Balancer) differs from the classic ELB mainly in that it can inspect the request URL to decide which machine to invoke.
Trusted Advisor is another AWS service that checks the 4 original pillars of the Well-Architected Framework for you. It is a tool that gives more details to business accounts. Please remember that the best interface between the internet and your computational power is to put CloudFront at the entrance of your VPC and make it point to a Load Balancer, which is the only thing allowed to address your servers.
For costs you can use online resources like the AWS Simple Monthly Calculator.
To achieve a Large Scale Design Pattern you must follow these rules:
- use a floating IP address (Elastic IP) that can be moved from one instance to another when needed
- share the state of your application in a DB (or in ElastiCache)
- use scheduled scale-out when you know that your service has peaks
- optionally use the Job Observer pattern (scale your instances based on how full your queues are)
- bootstrap your instances
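The queue-based scaling rule from the list above boils down to a small calculation. This Python sketch is only my reading of the pattern: the function name, the `jobs_per_instance` figure and the bounds are assumptions, and a real setup would feed the result to an Auto Scaling policy rather than print it.

```python
import math


def desired_instances(queue_length, jobs_per_instance, min_instances=1, max_instances=10):
    """Size the worker fleet from the queue backlog, within fixed bounds."""
    needed = math.ceil(queue_length / jobs_per_instance)
    return max(min_instances, min(max_instances, needed))


for backlog in (0, 45, 1000):
    print(backlog, desired_instances(backlog, jobs_per_instance=20))
# 0 1
# 45 3
# 1000 10
```

The lower bound keeps at least one worker alive when the queue is empty, and the upper bound caps the cost when the backlog explodes.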
The last service worth mentioning is Kinesis. Kinesis is a streaming service very similar to a queue. The main differences are that its messages expire after a certain amount of time (usually 24 hours, although a retention period can also be set in SQS) and that a message is not removed after the first consumer reads it, so every new consumer can start from the beginning of the stream.
If you need a visual representation of your AWS architecture to discuss it with other architects, you may want to use an external service like CloudCraft. And this basically ended the course. After the final lab I still had time to ask the teacher for a couple of comparisons, other than the one between Kinesis and SQS.
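The difference in semantics is easy to see with two tiny in-memory classes. This Python sketch uses made-up names (`TinyQueue`, `TinyStream`) and ignores everything else the real services do (visibility timeouts, shards and so on); it only models "removed on read" versus "kept until retention expires, with a position per consumer".

```python
from collections import deque


class TinyQueue:
    """Queue semantics: a message is gone once a consumer has taken it."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)

    def receive(self):
        return self._messages.popleft() if self._messages else None


class TinyStream:
    """Stream semantics: records stay until retention expires and each
    consumer tracks its own read position."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self._records = []  # (sequence_number, timestamp, data)
        self._next_seq = 0
        self._positions = {}  # consumer name -> next sequence number to read

    def put(self, data, now):
        self._records.append((self._next_seq, now, data))
        self._next_seq += 1

    def read(self, consumer, now):
        """Return all unexpired records this consumer has not seen yet."""
        cutoff = now - self.retention_seconds
        self._records = [r for r in self._records if r[1] > cutoff]
        start = self._positions.get(consumer, 0)
        batch = [data for seq, _, data in self._records if seq >= start]
        self._positions[consumer] = self._next_seq
        return batch


queue = TinyQueue()
queue.send("a")
print(queue.receive(), queue.receive())  # a None

stream = TinyStream(retention_seconds=24 * 3600)
stream.put("a", now=0)
stream.put("b", now=10)
print(stream.read("first", now=20))   # ['a', 'b']
print(stream.read("second", now=30))  # ['a', 'b'] again: nothing was removed
print(stream.read("first", now=40))   # []
print(stream.read("third", now=24 * 3600 + 5))  # ['b']: 'a' has expired
```

A late consumer still sees everything that has not expired, which is the behaviour you cannot get from a plain queue.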
Comparison between Route 53 and Load Balancers: ELB is invoked for every single request and routes every single packet, while Route 53 is invoked only at the beginning of a session to resolve which IP a domain name maps to; the following requests are sent directly to that IP.
Comparison between NAT box (NAT Gateway) and Internet Gateway: NAT boxes are used only to reach the internet from your network; they don’t allow incoming connections initiated from outside. NAT boxes are therefore useful to let your instances contact the internet (for downloading OS updates, for example, or even better to access AWS services that are available only through the internet). Internet Gateways expose a whole (public) subnet to the internet for both incoming and outgoing traffic. In any case it is a best practice to expose only ELBs, also because they are managed and they have powerful capabilities like handling the SSL handshake.
At the end the teacher also mentioned CodeStar, a service that takes care of the development pipeline.
That was all. Stay tuned!