Cluster - Group of container instances that act as a single computing resource.

Container Instance - It is an Amazon EC2 instance that has been registered to be a part of a specific cluster.

Container Agent - It is an open source tool that allows an Amazon EC2 instance to register with a cluster.

Task Definitions - It defines how your application's Docker images should be run. It is a JSON file.

Scheduler - It determines where a service or a task will run in the cluster by figuring out the most suitable instance for it to run on.

Services - It is a long-running task, such as a web application. It is based on a task definition. We can specify how many instances of a service should run, and Amazon ECS will ensure that many instances keep running.

Task - It is the end result of running a task definition.

Amazon ECR - It is a fully managed Docker registry that hosts private repositories.

Cluster

Group of container instances that act as a single computing resource.
A task gets scheduled to run on a cluster.
You can mix and match EC2 instance types in a cluster, and the instances can span multiple AZs.
However, a given EC2 instance can belong to only 1 cluster at a time. It cannot join multiple clusters.
Use cases for multiple clusters:
- 1 cluster per environment. E.g. Dev, QA, Prod etc.
- 1 cluster per client, with each cluster in its own VPC.
If you use the Fargate launch type with tasks within your cluster, Amazon ECS manages your cluster resources.
If you use the EC2 launch type, then your clusters will be a group of container instances you manage.
Amazon ECS downloads your container images from a registry that you specify, and runs those images within your cluster.

Create a cluster

aws ecs create-cluster --cluster-name deep-dive-cluster

{
    "cluster": {
        "status": "ACTIVE",
        "clusterName": "deep-dive-cluster",
        "registeredContainerInstancesCount": 0,
        "pendingTasksCount": 0,
        "runningTasksCount": 0,
        "activeServicesCount": 0,
        "clusterArn": "arn:aws:ecs:us-east-1:560386595561:cluster/deep-dive-cluster"
    }
}

The ACTIVE status means that the cluster is ready to have container instances join it.

List and describe clusters

aws ecs list-clusters

aws ecs describe-clusters --clusters deep-dive-cluster

Container Agent

Allows container instances to join a cluster. A container instance is nothing but an EC2 instance.
The container agent tool is open source and is available on GitHub.
The container agent runs on each infrastructure resource within an Amazon ECS cluster. It sends information about the resource's current running tasks and resource utilization to Amazon ECS, and starts and stops tasks whenever it receives a request from Amazon ECS. Thus, container agent helps in status reporting.
The Amazon ECS container agent is included in the Amazon ECS-optimized AMI, but you can also install it on any EC2 instance that supports the Amazon ECS specification. The Amazon ECS container agent is only supported on EC2 instances.
The Amazon ECS Container Agent may also be run in a Docker container on an EC2 instance with a recent Docker version installed.
The container agent can be configured, and its configuration file (ecs.config) can be placed in an S3 bucket. When an EC2 instance spins up, it can fetch the configuration for its container agent from this S3 bucket via a startup script.

#!/bin/bash

yum install -y aws-cli
aws s3 cp s3://mojoknight-aws-test/ecs.config /etc/ecs/ecs.config
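
The notes do not show what ecs.config contains. As a rough sketch, the Python snippet below renders a minimal file, assuming the only setting needed is ECS_CLUSTER (the agent setting that names the cluster to join). The helper name render_ecs_config and its extra parameter are hypothetical and exist only for illustration.

```python
def render_ecs_config(cluster_name, extra=None):
    """Return the text of a minimal ecs.config file (one KEY=value per line)."""
    # ECS_CLUSTER tells the container agent which cluster to register with.
    settings = {"ECS_CLUSTER": cluster_name}
    # Any further settings are optional and supplied by the caller.
    settings.update(extra or {})
    return "".join(f"{key}={value}\n" for key, value in settings.items())


if __name__ == "__main__":
    # Print the file content that would be uploaded to the S3 bucket.
    print(render_ecs_config("deep-dive-cluster"), end="")
```

The rendered text is what the startup script above would copy to /etc/ecs/ecs.config.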

Before the Amazon ECS agent can register a container instance into a cluster, the agent must know which account credentials to use. You can create an IAM role that allows the agent to know which account it should register the container instance with. When you launch an instance with the Amazon ECS-optimized AMI provided by Amazon using this role, the agent automatically registers the container instance into your default cluster.
The Amazon ECS container agent also makes calls to the Amazon EC2 and Elastic Load Balancing APIs on your behalf, so container instances can be registered and deregistered with load balancers. Before you can attach a load balancer to an Amazon ECS service, you must create an IAM role for your services to use before you start them. This requirement applies to any Amazon ECS service that you plan to use with a load balancer.

Typically, the following role and policies are required for the container agent. The role must be attached when the EC2 instance is launched:

ECS-Instance-Role
(Sample role name) with the access below:

AmazonS3ReadOnlyAccess (To read the configuration file from S3)

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:Get*",
                "s3:List*"
            ],
            "Resource": "*"
        }
    ]
}

AmazonEC2ContainerServiceforEC2Role

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ecs:CreateCluster",
                "ecs:DeregisterContainerInstance",
                "ecs:DiscoverPollEndpoint",
                "ecs:Poll",
                "ecs:RegisterContainerInstance",
                "ecs:StartTelemetrySession",
                "ecs:UpdateContainerInstancesState",
                "ecs:Submit*",
                "ecr:GetAuthorizationToken",
                "ecr:BatchCheckLayerAvailability",
                "ecr:GetDownloadUrlForLayer",
                "ecr:BatchGetImage",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "*"
        }
    ]
}

Container Instance

A container instance is an EC2 instance that is registered with a cluster. It connects to a cluster using a container agent.
Because the Amazon ECS container agent makes calls to Amazon ECS on your behalf, you must launch container instances with an IAM role that authenticates to your account and provides the required resource permissions.
Container instance lifecycle:
- ACTIVE and connected - This happens when the container agent registers the container instance into the cluster and its connection status is true. Once this happens, the container instance is ready to run tasks.
- ACTIVE and disconnected - This happens when the connection status is false, for example when the container instance is stopped. In that case, the tasks that were running on it stop too.
- INACTIVE - This happens when the container instance is terminated and deregistered from the cluster. Such an instance is no longer shown as part of the cluster.
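
The three states above can be sketched as a small lookup in Python. This is only an illustration: it assumes the two inputs correspond to the status and agentConnected fields returned by "aws ecs describe-container-instances", the function name is hypothetical, and any status other than ACTIVE or INACTIVE is deliberately rejected.

```python
def describe_instance_state(status, agent_connected):
    """Map a container instance's status and agent connection flag to a summary."""
    if status == "INACTIVE":
        return "deregistered: no longer part of the cluster"
    if status == "ACTIVE" and agent_connected:
        return "registered and ready to run tasks"
    if status == "ACTIVE":
        return "registered, but the agent is disconnected (for example, the instance is stopped)"
    # Other statuses are outside the scope of these notes.
    raise ValueError(f"status not covered by this sketch: {status!r}")


if __name__ == "__main__":
    print(describe_instance_state("ACTIVE", True))
    print(describe_instance_state("ACTIVE", False))
    print(describe_instance_state("INACTIVE", False))
```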
To launch an EC2 container instance, it is recommended to use the Amazon ECS-optimized AMI.
Although you can create your own container instance AMI that meets the basic specifications outlined in Container Instance AMIs, the Amazon ECS-optimized AMI is preconfigured and tested on Amazon ECS by AWS engineers. It is the simplest AMI for you to get started and to get your containers running on AWS quickly.

The current Amazon ECS-optimized AMI at the time of writing (amzn-ami-2017.09.g-amazon-ecs-optimized) consists of:
- The latest minimal version of the Amazon Linux AMI
- The latest version of the Amazon ECS container agent (1.16.2)
- The recommended version of Docker for the latest Amazon ECS container agent (17.09.1-ce)
- The latest version of the ecs-init package to run and monitor the Amazon ECS agent (1.16.2-1)
Creating an EC2 instance with the Amazon ECS-optimized AMI (ami-28456852 in the command below is the ID of the Amazon ECS-optimized AMI in the us-east-1 region):

$ aws ec2 run-instances --image-id ami-28456852 --count 1 --instance-type t2.micro \
                      --iam-instance-profile Name=mojoknight-ECS-Instance-Role --key-name EC2-Key-Pair \
                      --security-group-ids sg-4aa2153d --user-data file://copy-ecs-to-s3

How to check the Docker version on the newly created EC2 instance:

$ docker version

How to check the ECS container agent on the newly created EC2 instance:

$ curl -s 127.0.0.1:51678/v1/metadata | python -mjson.tool

 {
    "Cluster": "deep-dive-cluster",
    "ContainerInstanceArn": "arn:aws:ecs:us-east-1:560386595561:container-instance/381b5a35-652c-4b3a-8b15-e6dfadf490f8",
    "Version": "Amazon ECS Agent - v1.16.2 (998c9b5)"
}

The above command outputs the name of the cluster that the current EC2 instance is a member of, along with the container instance ARN. It also prints the container agent version.
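
As a small illustration, the agent's metadata response can be picked apart in Python. The sample below is copied from the output shown above. The function name and the keys of the returned dictionary are hypothetical, and the ARN handling assumes the colon and slash layout seen in that sample.

```python
import json

# Sample response copied from the agent metadata call shown in these notes.
SAMPLE = """
{
    "Cluster": "deep-dive-cluster",
    "ContainerInstanceArn": "arn:aws:ecs:us-east-1:560386595561:container-instance/381b5a35-652c-4b3a-8b15-e6dfadf490f8",
    "Version": "Amazon ECS Agent - v1.16.2 (998c9b5)"
}
"""


def summarize_agent_metadata(raw_json):
    """Extract the cluster, region, instance id and agent version."""
    meta = json.loads(raw_json)
    arn = meta["ContainerInstanceArn"]
    return {
        "cluster": meta["Cluster"],
        # ARN fields are colon separated; the fourth field is the region.
        "region": arn.split(":")[3],
        # The container instance id follows the last slash.
        "instance_id": arn.rsplit("/", 1)[-1],
        "agent_version": meta["Version"],
    }


if __name__ == "__main__":
    print(summarize_agent_metadata(SAMPLE))
```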

Note - The EC2 instance running the container agent will automatically connect to the ECS cluster while bootstrapping. The ECS cluster name is picked up from the ecs.config file, which it reads from the S3 bucket.

Note - If an EC2 instance is stopped AFTER it has been registered with the cluster, its status will still be shown as ACTIVE in the ECS console.

Task Definitions

Describes how your application's Docker images should be run.
To prepare your application to run on Amazon ECS, you create a task definition.
The task definition is a text file, in JSON format, that describes one or more containers, up to a maximum of ten, that form your application. It can be thought of as a blueprint for your application.
Task definitions specify various parameters for your application. Examples of task definition parameters are:
- Which containers to use and the repositories in which they are located
- Which ports should be opened on the container instance for your application
- What data volumes should be used with the containers in the task.
So, an application can have one or more task definitions, depending on its size and scope.
Containers that are grouped together in the same task definition are scheduled to run on the same container instance. E.g. there can be a container for memcached and another container for the Tomcat server that runs the actual application. These 2 containers can be grouped together in the same task definition to run on the same container instance.
A service is created from a task definition.
There are 3 components of a task definition:
- Family - This is the name of the task definition and has no correlation to the Docker image. The first task definition registered to a family gets revision number 1. Each time a new version of the task definition is registered, the revision number is incremented.
- Container Definitions - These include configuration details like CPU, memory, port mappings, mount points etc.
- Volumes - These are a way to share data between containers or to persist a container's data even after the container is stopped.
A bare minimum task definition:

{
  "containerDefinitions": [
    {
      "name": "tomcat",
      "image": "tomcat",
      "portMappings": [
        {
          "containerPort": 8080,
          "hostPort": 8080
        }
      ],
      "memory": 50,
      "cpu": 102
    }
  ],
  "family": "web"
}

$ aws ecs register-task-definition --cli-input-json file://web-task-definition.json

{
    "taskDefinition": {
        "status": "ACTIVE",
        "family": "web",
        "placementConstraints": [],
        "volumes": [],
        "taskDefinitionArn": "arn:aws:ecs:us-east-1:560386595561:task-definition/web:1",
        "containerDefinitions": [
            {
                "environment": [],
                "name": "tomcat",
                "mountPoints": [],
                "image": "tomcat",
                "cpu": 102,
                "portMappings": [
                    {
                        "protocol": "tcp",
                        "containerPort": 8080,
                        "hostPort": 8080
                    }
                ],
                "memory": 50,
                "essential": true,
                "volumesFrom": []
            }
        ],
        "revision": 1
    }
}

"revision" attribute indicates the version number of the task with "family: web"

If we run the above command again, the revision will be incremented by 1.
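
The revision numbering can be modelled with a few lines of Python. This is a toy in-memory sketch, not how ECS stores task definitions; the class name is hypothetical, and deregistration is ignored.

```python
class TaskDefinitionRegistry:
    """Toy model of how revisions are numbered per task definition family."""

    def __init__(self):
        # family name -> highest revision issued so far
        self._latest = {}

    def register(self, definition):
        """Record a new revision and return it as 'family:revision'."""
        family = definition["family"]
        revision = self._latest.get(family, 0) + 1
        self._latest[family] = revision
        return f"{family}:{revision}"


if __name__ == "__main__":
    registry = TaskDefinitionRegistry()
    print(registry.register({"family": "web"}))
    print(registry.register({"family": "web"}))
```

Each family keeps its own counter, so registering a definition under a different family starts again at revision 1.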

List / Describe Commands

List Families: $ aws ecs list-task-definition-families

List Task Definitions: $ aws ecs list-task-definitions

Describe Task Definition (by providing family and version): $ aws ecs describe-task-definition --task-definition web:1

Running the "register-task-definition" command again will bump the version number by 1.

$ aws ecs register-task-definition --cli-input-json file://web-task-definition.json

{
    "taskDefinition": {
        "status": "ACTIVE",
        "family": "web",
        "placementConstraints": [],
        "volumes": [],
        "taskDefinitionArn": "arn:aws:ecs:us-east-1:560386595561:task-definition/web:2",
        "containerDefinitions": [
            {
                "environment": [],
                "name": "tomcat",
                "mountPoints": [],
                "image": "tomcat",
                "cpu": 102,
                "portMappings": [
                    {
                        "protocol": "tcp",
                        "containerPort": 8080,
                        "hostPort": 8080
                    }
                ],
                "memory": 50,
                "essential": true,
                "volumesFrom": []
            }
        ],
        "revision": 2
    }
}

Deregistering a task definition - We need to provide the family name and version number.

$ aws ecs deregister-task-definition --task-definition web:2

Scheduler

The scheduler makes sure you utilize your cluster's computing resources in the best way possible with the least amount of effort on your part.
We don't need to worry about which specific instance is running a specific service or task. If we do want that control, we have the flexibility of plugging in a third-party scheduler like Mesos. So, scheduling is flexible.

Scheduling Services

Services are long-lived, stateless processes.
E.g. a webapp. It should contain no state and is always expected to be running.
When defining a service, we specify how many instances of it we want. This number is the service's desired count, given when the service is created (not in the task definition JSON). If the scheduler detects that fewer instances are running because one of them has gone down, it will start a new copy of the service automatically.
We can also hook up a service to an ELB.
3 steps for placing a service into a cluster:
- Compare the task definition's attributes to the state of the cluster - The scheduler looks at the task definition's CPU, memory, ports and other parameters. Then it looks at the state of the cluster to find the container instances that are capable of running it.
- Check how many service instances are running in an AZ - At this point, the scheduler has a list of candidate container instances, and it looks at how many instances of the service are running in each AZ. It then places the service instance into the AZ that has the fewest tasks for the service it is trying to schedule.
- Check how many service instances are running on each container instance - The scheduler picks the container instance that has the fewest instances of the service running on it. If there are 3 service instances to run and the scheduler has a list of 3 container instances, it will run 1 instance on each container instance, even though a single container instance is capable of running all 3.
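
The three placement steps above can be sketched in Python. This is a simplified model under stated assumptions, not the actual ECS scheduler: instances and tasks are plain dictionaries with hypothetical keys, ties are broken alphabetically, and resources are reserved as soon as a task is placed.

```python
from collections import Counter


def fits(instance, task):
    """Step 1: does the instance have the CPU, memory and free host ports?"""
    return (
        instance["cpu"] >= task["cpu"]
        and instance["memory"] >= task["memory"]
        and not set(task["ports"]) & set(instance["used_ports"])
    )


def place_one(instances, task, running):
    """Pick an instance id for one more service task, or None if nothing fits.

    running maps instance id -> number of this service's tasks already there.
    """
    candidates = [i for i in instances if fits(i, task)]
    if not candidates:
        return None
    # Step 2: prefer the AZ that currently runs the fewest tasks of the service.
    per_az = Counter()
    for inst in instances:
        per_az[inst["az"]] += running.get(inst["id"], 0)
    best_az = min({i["az"] for i in candidates}, key=lambda az: (per_az[az], az))
    # Step 3: within that AZ, prefer the instance with the fewest service tasks.
    in_az = [i for i in candidates if i["az"] == best_az]
    return min(in_az, key=lambda i: (running.get(i["id"], 0), i["id"]))["id"]


def place_service(instances, task, desired_count):
    """Place desired_count tasks one by one, reserving resources as we go."""
    running, placed = {}, []
    for _ in range(desired_count):
        chosen = place_one(instances, task, running)
        if chosen is None:
            break  # the cluster has no room for another task
        inst = next(i for i in instances if i["id"] == chosen)
        inst["cpu"] -= task["cpu"]
        inst["memory"] -= task["memory"]
        inst["used_ports"] = inst["used_ports"] + task["ports"]
        running[chosen] = running.get(chosen, 0) + 1
        placed.append(chosen)
    return placed


def make_cluster():
    """Three identical hypothetical instances, one per AZ."""
    return [
        {"id": f"i-{az}", "az": f"us-east-1{az}", "cpu": 1024, "memory": 1024, "used_ports": []}
        for az in "abc"
    ]


if __name__ == "__main__":
    task = {"cpu": 102, "memory": 256, "ports": []}
    print(place_service(make_cluster(), task, 3))
```

With these sample numbers, any single instance could hold all 3 tasks, yet the sketch spreads them one per instance, which matches the behaviour described in the third step.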

Scheduling Tasks

The concept behind these types of tasks is that they run for a certain amount of time and then exit when done. E.g. encode a video, run a DB migration.
This is in contrast to a webapp, which is long running.
RunTask randomly distributes tasks across your cluster, while trying to keep any specific instance from getting overloaded.
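
A minimal Python sketch of this idea, assuming "not overloaded" simply means the instance still has enough free CPU and memory for the task. The function and the dictionary keys are hypothetical and only model the behaviour; they do not call ECS.

```python
import random


def run_task(instances, task, rng=random):
    """Pick a random instance that still has room for the task, or None."""
    eligible = [
        i for i in instances
        if i["free_cpu"] >= task["cpu"] and i["free_memory"] >= task["memory"]
    ]
    if not eligible:
        return None
    chosen = rng.choice(eligible)
    # Reserve the resources so later calls see the reduced capacity.
    chosen["free_cpu"] -= task["cpu"]
    chosen["free_memory"] -= task["memory"]
    return chosen["id"]


if __name__ == "__main__":
    cluster = [
        {"id": "i-1", "free_cpu": 1024, "free_memory": 1024},
        {"id": "i-2", "free_cpu": 1024, "free_memory": 1024},
    ]
    print(run_task(cluster, {"cpu": 256, "memory": 256}))
```

Passing a seeded random.Random instance as rng makes the choice repeatable, which is handy when experimenting.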

Scheduler and Starting Tasks

Running a task (RunTask) and starting a task (StartTask) are different things.
StartTask lets you pick where you want a task to run. E.g. CPU-intensive tasks should go to a high-CPU container instance.
It also allows us to build our own custom scheduler.
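
The decision part of such a custom scheduler could look like the Python sketch below. It only chooses an instance; the StartTask call itself is not shown. The function name and the free_cpu key are hypothetical.

```python
def pick_instance_for_cpu_heavy_task(instances, required_cpu):
    """Return the id of the eligible instance with the most free CPU units."""
    eligible = [i for i in instances if i["free_cpu"] >= required_cpu]
    if not eligible:
        return None  # nothing in the cluster can take the task
    return max(eligible, key=lambda i: i["free_cpu"])["id"]


if __name__ == "__main__":
    cluster = [
        {"id": "general-1", "free_cpu": 512},
        {"id": "highcpu-1", "free_cpu": 4096},
    ]
    # The chosen id would then be handed to StartTask.
    print(pick_instance_for_cpu_heavy_task(cluster, 1024))
```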

Lifecycle of Service and Tasks

PENDING
RUNNING
STOPPED

The container agent is responsible for state tracking.
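
The lifecycle can be written down as a tiny state machine in Python. The allowed transitions are an assumption made for this sketch (in particular, that a PENDING task which never starts goes straight to STOPPED); the notes only list the three states.

```python
# Assumed transitions between the three states listed above.
ALLOWED = {
    "PENDING": {"RUNNING", "STOPPED"},
    "RUNNING": {"STOPPED"},
    "STOPPED": set(),
}


def advance(current, new):
    """Return the new state, or raise if the transition is not allowed."""
    if new not in ALLOWED[current]:
        raise ValueError(f"cannot go from {current} to {new}")
    return new


if __name__ == "__main__":
    state = "PENDING"
    for target in ("RUNNING", "STOPPED"):
        state = advance(state, target)
        print(state)
```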

Creating a Service. We need to provide:
- Cluster Name
- Task definition family name
- Desired name of service
- Desired instance count for service

$ aws ecs create-service --cluster deep-dive-cluster --service-name web --task-definition web --desired-count 1

{
    "service": {
        "status": "ACTIVE",
        "taskDefinition": "arn:aws:ecs:us-east-1:560386595561:task-definition/web:1",
        "pendingCount": 0,
        "loadBalancers": [],
        "placementConstraints": [],
        "createdAt": 1517730923.303,
        "desiredCount": 1,
        "serviceName": "web",
        "clusterArn": "arn:aws:ecs:us-east-1:560386595561:cluster/deep-dive-cluster",
        "serviceArn": "arn:aws:ecs:us-east-1:560386595561:service/web",
        "deploymentConfiguration": {
            "maximumPercent": 200,
            "minimumHealthyPercent": 100
        },
        "deployments": [
            {
                "status": "PRIMARY",
                "pendingCount": 0,
                "createdAt": 1517730923.303,
                "desiredCount": 1,
                "taskDefinition": "arn:aws:ecs:us-east-1:560386595561:task-definition/web:1",
                "updatedAt": 1517730923.303,
                "id": "ecs-svc/9223370519123852504",
                "runningCount": 0
            }
        ],
        "events": [],
        "runningCount": 0,
        "placementStrategy": []
    }
}

List services on cluster

$ aws ecs list-services --cluster deep-dive-cluster

Describe a service on a cluster

$ aws ecs describe-services --cluster deep-dive-cluster --services web

How ECS works behind the scenes:

We upload a task definition to ECS. A task definition has a family name and a version. It also contains the name of the Docker image to pull from the Docker repository, and the maximum memory and CPU the container is allowed to use.
We then create a service using a task definition. While creating a service, we specify the task definition family name and, optionally, the version number (the latest active version is used when it is omitted). We also need to specify the desired instance count for this service.
The "Services" tab shows the status of currently running services.
The "Tasks" tab shows the tasks spawned by ECS for a specific service.
Once a service is created, ECS looks at the number of instances required for this service. If 2 instances of this service are required, then:
1. ECS will select 2 container instances, and 1 instance of the service will run on each of them.
2. On each container instance, ECS will download the Docker image from the Docker repository.
To delete a service, we first have to update the service to "Desired Count = 0". This causes ECS to stop running any instances of this service. This is because the service cannot be deleted while even 1 instance of it is running.
Once the "Desired Count" of a service is set to 0, there will be no tasks spawned by ECS for this service, and then we can delete the service.
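
The delete sequence can be modelled with a small Python class. This is a toy sketch that assumes the running count follows the desired count immediately; the class and its method names are hypothetical and do not call ECS.

```python
class Service:
    """Toy model of a service that must be scaled to 0 before deletion."""

    def __init__(self, name, desired_count):
        self.name = name
        self.desired_count = desired_count
        self.running_count = desired_count
        self.deleted = False

    def update(self, desired_count):
        """Change the desired count; assume the running count converges at once."""
        self.desired_count = desired_count
        self.running_count = desired_count

    def delete(self):
        """Delete the service, refusing while any instance is still wanted or running."""
        if self.desired_count > 0 or self.running_count > 0:
            raise RuntimeError("scale the service down to 0 before deleting it")
        self.deleted = True


if __name__ == "__main__":
    web = Service("web", desired_count=1)
    web.update(desired_count=0)  # step 1: scale down
    web.delete()                 # step 2: delete
    print(web.deleted)
```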

Notes from Experimenting:

I created a task definition for a Tomcat Docker image with maximum memory set to 50 MB and CPU to 102.
I created a service using this task definition with a desired count of 1.
When the service was created, ECS spawned 1 task for it and assigned it to an EC2 instance. Interesting part - since Tomcat always requires more than 50 MB to run, the service instance kept getting shut down by ECS. But since the desired count was set to 1, ECS immediately tried to schedule another instance of this service. Hence, there were many failures in the events.
I then updated the task definition by increasing the memory and CPU values. This created a new version of the task definition, and then I updated the service to refer to this new version.
Once the service instance has started successfully on an EC2 instance, we can access the service via the EC2 instance's public IP. E.g. in this case, the 'web' service's instance was launched successfully on an EC2 instance. This means that the EC2 instance has the Tomcat container running on it, and I was able to access Tomcat via the EC2 instance's public IP.
If we have the SSH key pairs of the EC2 instances registered with the cluster, we can log in to an EC2 instance and check Docker images, running Docker containers etc.
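
The restart loop seen in this experiment can be reproduced with a short Python simulation. It is a simplified model: the 300 MiB that Tomcat is assumed to need is a made-up figure for illustration, and the function name and event strings are hypothetical.

```python
def reconcile(desired_count, memory_limit_mib, memory_needed_mib, max_attempts):
    """Simulate the scheduler retrying until desired_count tasks are running.

    Returns a tuple (running_count, events).
    """
    running, events = 0, []
    for attempt in range(1, max_attempts + 1):
        if running >= desired_count:
            break
        if memory_needed_mib > memory_limit_mib:
            # The task exceeds its memory limit and is stopped again.
            events.append(
                f"attempt {attempt}: task stopped, needs {memory_needed_mib} MiB "
                f"but limit is {memory_limit_mib} MiB"
            )
        else:
            running += 1
            events.append(f"attempt {attempt}: task running")
    return running, events


if __name__ == "__main__":
    # Limit of 50 as in the notes; every attempt fails.
    print(reconcile(1, 50, 300, 3))
    # After raising the limit, the first attempt succeeds.
    print(reconcile(1, 512, 300, 3))
```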
