Hi all,
These days I am attending the “Architecting on AWS” course, so I will write about it and keep track of what I learned in as much detail as I can remember. This is the first of three days, and so far I am really happy with the contents of this course.

The morning started with the definition of Cloud Computing, using an analogy with Starbucks: just as Starbucks sells you coffee on demand, AWS sells you resources on demand over the internet with pay-as-you-go contracts, meaning that you pay for a resource only as much as you use it. The related concepts are Infrastructure as a Service (IaaS), the smallest building blocks of an infrastructure (computing services, networks, subnets…) sold as a service over the internet; Platform as a Service (PaaS), slightly higher-level components like Kinesis or queues; and Software as a Service (SaaS).
AWS supports basically two Cloud Deployment models: All-In, meaning that everything is deployed on AWS, or Hybrid, meaning that part of your architecture runs on AWS and part on premises.

AWS is composed of over 50 services in a secure cloud service platform, 7 or 8 of which are related just to security. Using this cloud solution brings six big advantages:

  1. investing on demand instead of in long-lived architecture (variable expense instead of capital investment, meaning that you don’t need to buy powerful machines sized for tomorrow’s needs: you can pay a small amount today and scale tomorrow)
  2. massive economies of scale
  3. eliminate guessing about capacity: you can increase or decrease power exactly when needed
  4. increase speed and agility (you can easily set up containers, try new possibilities and destroy them; moreover, AWS comes with lots of technologies available to everyone)
  5. stop spending money on running and maintaining data centers
  6. go global in minutes

After this first introduction we started speaking about Regions, basically where our information is deployed or stored. In AWS each Region can have a different subset of available services, different costs, different compliance rules… Regions are made of isolated Availability Zones (AZs); an Availability Zone is basically one or more data centers. Things affecting a specific Availability Zone are not (by default) replicated into other Availability Zones, so if you deploy something in a specific zone and that zone suffers a major problem, you risk losing your data if you don’t replicate it. Some services run in a specific zone (containers, volumes…), while others are tied to a region (queues, databases…). Moreover, deployment across multiple AZs avoids downtime without paying a network latency penalty: subnets can see other subnets in different AZs as local. There are also Edge Locations, which work like a cache for accessing specific regions.

AWS handles two types of services: Managed Services, the ones under AWS control (for deployment, security, updates, availability…), and Unmanaged Services, the ones you decide to run on your own instances. Let’s say you need a database: you can install it on one of your machines, but if there is a failure you have to investigate it yourself, or you can buy it as a service from Amazon. This brings us to the definition of responsibility: Amazon takes care of the infrastructure and of managed services, but it is up to the user to take care of what happens in their own software, from both the availability and the security point of view.

To get the most out of AWS, you then have to follow some best practices:

  • make sure that your architecture can handle architectural changes
  • remove manual deployment processes and replace them with dynamic deployment to support multiple regions/AZs
  • think of servers as temporary resources
  • reduce the coupling of components and their interdependencies so that failures hit just one component/instance
  • managed services and serverless architectures can provide reliability
  • avoid single points of failure
  • optimize for costs by sizing resources appropriately and by checking pricing choices
  • use caching (CloudFront)
  • secure the infrastructure

After this part we started speaking about core services of AWS:
Amazon VPC is the Virtual Private Cloud. Each VPC is limited to a Region and spans its Availability Zones: subnets inside the VPC are created in a single AZ, but they can be configured to see subnets in other AZs as local. You can set up multiple VPCs in the same account.
Amazon EC2 is basically a virtual machine (Amazon provides different Amazon Machine Images (AMIs) with different characteristics, including the operating system). You define the computational power and the memory (but you still need a volume, a disk from which the machine reads the operating system). There are three methods of paying for EC2 instances: on demand, meaning that you use machines when you need them; reserved instances, meaning that you reserve a number of machines even when you don’t need them; or spot instances, meaning that you bid on Amazon’s unused resources, on which Amazon sets a market value. The winning bidder pays for the resources in slots of one hour whenever they are available, and pays only the market price UP TO the value of their bid. The problem with this payment method is that execution is not guaranteed: you can have a machine assigned today, but just for 20 minutes (if you run less than one hour you are not billed), and your machines can be killed at any moment.
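The spot billing rule described above can be sketched in a few lines; this is a simplified model of what the course described (hourly slots, partial hours free only when AWS interrupts the instance), not AWS’s actual billing code:

```python
import math

def spot_bill(market_price_cents_per_hour, hours_run, interrupted_by_aws):
    """Simplified spot-instance bill: partial hours are free only when
    AWS reclaims the capacity; if the user terminates, the started hour
    is billed in full."""
    if interrupted_by_aws:
        billable_hours = math.floor(hours_run)  # last partial hour is free
    else:
        billable_hours = math.ceil(hours_run)   # started hours are billed
    return billable_hours * market_price_cents_per_hour

# Killed by AWS after 20 minutes: nothing billed.
print(spot_bill(10, 0.33, True))   # 0
# User terminates after 2.5 hours: 3 hours billed.
print(spot_bill(10, 2.5, False))   # 30
```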

Amazon EBS is a drive/hard disk attached to an EC2 machine. EBS and EC2 are resources tied to a specific AZ, so EBS should not be used as the only copy of critical data, because replication is not performed outside the AZ. Every EC2 instance can have up to 16 volumes attached, but it is not possible to attach the same volume to two different EC2 instances.
Amazon S3 is an object store. Basically it is a cloud storage working over HTTP (which makes it possible to use it as a web server for static files). It is a regional service, meaning that it is not tied to any AZ. Objects are stored in buckets whose names are unique globally (not per region, even though access is by default based on region and bucket name). Objects cannot be modified, only uploaded and deleted. A bucket can have a lifecycle, meaning that you can set a bucket to move its content to another storage class some days after the creation of its objects (maybe to an infrequent-access pricing tier that makes it cheaper to store data but more expensive to download it, or to Amazon Glacier).
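As a sketch of what such a lifecycle looks like, here is the configuration structure boto3 accepts; the bucket name and the day counts are made-up examples, not values from the course:

```python
# Hypothetical S3 lifecycle: move objects to the infrequent-access tier
# after 30 days, then archive them to Glacier after 90 days.
lifecycle_config = {
    "Rules": [
        {
            "ID": "archive-old-objects",
            "Status": "Enabled",
            "Filter": {"Prefix": ""},  # apply to every object in the bucket
            "Transitions": [
                # Cheaper storage, more expensive retrieval:
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                # Long-term archive:
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
        }
    ]
}

# With boto3 installed and credentials configured, this would be applied with:
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="my-example-bucket", LifecycleConfiguration=lifecycle_config)
```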
Amazon Glacier is like S3 but it works as an archive, meaning that if you want to retrieve something you may have to wait even 3 or 4 hours for it to become available for download.

This ended the morning of training. In the afternoon we continued speaking about AWS services.
Amazon RDS is basically an EC2 setup with any flavour of DB managed by Amazon.
Amazon DynamoDB is a NoSQL database, basically a key-value store, where values have a limited size (you cannot put an image in a column, for example), and the lack of relational properties makes it hard to use for complex joins. Its best qualities are security, scalability and availability. It comes with the capability of reading in a strongly consistent (more expensive) or eventually consistent (cheaper) way.
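The two read modes are selected per request. Here is a hedged sketch of the parameters a DynamoDB GetItem call would take (the table and key names are made up for illustration):

```python
# Strongly consistent read: always reflects the latest write, but consumes
# twice the read capacity of an eventually consistent read.
strong_read = {
    "TableName": "example-table",
    "Key": {"id": {"S": "42"}},
    "ConsistentRead": True,
}

# Same request, eventually consistent (the cheaper default).
eventual_read = {**strong_read, "ConsistentRead": False}

# With boto3 installed and credentials configured:
# boto3.client("dynamodb").get_item(**strong_read)
```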
Amazon Identity & Access Management (IAM) is a crucial service that lets you define users and roles, which determine the actions an entity can perform on another entity. Policies take the form of WHO is allowed/denied to perform a specific action on a specific resource. A policy can be attached to users, roles, services, instances or whatever. Permissions are denied either by default or explicitly, and an explicit deny has priority over an explicit allow: two or more roles or policies applied to an entity may each define a permission for an action on a resource, but if just one of them is a deny, that entity won’t be able to perform that action on that resource at all.
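The evaluation rule above (default deny, explicit deny wins) can be sketched in a few lines. This is a deliberately simplified model; real IAM policies also support wildcards, conditions and much more:

```python
def is_allowed(statements, action, resource):
    """statements: list of (effect, action, resource) tuples.
    Default deny; an explicit Deny beats any number of Allows."""
    allowed = False
    for effect, a, r in statements:
        if a == action and r == resource:
            if effect == "Deny":
                return False          # an explicit deny always wins
            if effect == "Allow":
                allowed = True
    return allowed                    # no matching statement: default deny

policies = [
    ("Allow", "s3:GetObject", "my-bucket/*"),
    ("Deny",  "s3:GetObject", "my-bucket/*"),  # this one wins
]
print(is_allowed(policies, "s3:GetObject", "my-bucket/*"))  # False
```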

To sign a specific request (that is, to tie a specific request to a specific user) I can use different approaches: the user and password of the admin account, an IAM user and password, a public access key and secret key (AKSK), hardware controls like a token generator, or key pairs (the last one only for OS protection on EC2).
Roles are really useful: let’s say that we want our EC2 instance to access its volume; it couldn’t do that without a specific access policy, because every request to every service needs access rights. We need to give our EC2 instance a role suited to this purpose, to avoid attaching an AKSK to every request.
Amazon CloudTrail logs every API call to every service, console calls included. It stores them into log files… which actually consume space on our disks, so you have to manage them in terms of EBS or S3 or whatever…

After that we spent an hour and a half on a nice exercise in which we were asked to set up a VPC and an EC2 instance, with a subnet and routes, in order to run a web service accessible from the internet. IAM management was involved, and afterwards we had a nice talk about different topics. Our teacher suggested that the Multiple Accounts pattern is better than the Multiple Users one, meaning that if you have to separate different environments for your application, it is better to create different accounts, each with a complete stack, instead of using different users. We also discussed different kinds of architecture: the one with an internet gateway exposing a subnet, making it a public subnet, and the use of a load balancer instead of directly exposing a web server on the internet. In that case it is the load balancer that needs the public IP address.

Last but not least, a consideration about VPCs. Services like DynamoDB or SQS are not included in our custom VPC; they don’t require a specific IP address, but being external they require the traffic from a subnet to pass through an internet gateway, go out to the internet and reach the service. The only exception is S3: for S3 you can define an S3 endpoint in a private subnet that talks to the S3 service directly.
Subnets are grouped by masks that define (sub)subnets. For example, 10.0.0.* (that is, 10.0.0.0 with a /24 mask) is one subnet, while 10.0.1.* is another subnet. Each subnet has at least one super-rule that defines the local network, which means you can consider a group of subnets to be the same network if their super-rule routes the traffic of each subnet in the same way. For example, if the super-rule defines as local anything inside 10.0.*.*, the two subnets 10.0.1.* and 10.0.2.* will consider each other local. This explains why different subnets have no network latency costs (as we said before).
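The relationships above can be checked with Python’s standard ipaddress module, using the same addresses as in the text:

```python
import ipaddress

vpc = ipaddress.ip_network("10.0.0.0/16")       # the "super-rule": 10.0.*.*
subnet_a = ipaddress.ip_network("10.0.1.0/24")  # 10.0.1.*
subnet_b = ipaddress.ip_network("10.0.2.0/24")  # 10.0.2.*

# Both /24 subnets fall inside the /16, so a route table that marks
# 10.0.0.0/16 as "local" makes them see each other as local.
print(subnet_a.subnet_of(vpc))      # True
print(subnet_b.subnet_of(vpc))      # True
print(subnet_a.overlaps(subnet_b))  # False: still distinct address ranges
```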

That closed the first day of training. Tomorrow we will start with security groups.
Stay tuned!