
Running scheduled tasks in AWS



How many times have you been tasked with building a solution that performs the same job on a regular schedule? I guess quite a few. And what do you do with an environment that sits idle between runs? Let it just sit there until the next one? If not, how do you scale it down automatically? If you have asked yourself the same questions, this post is for you.

I suggest we first look at the most common approaches to this kind of task:

  • Running an EC2 instance, perhaps even a scheduled one, so it isn't running (and costing money) all the time. But how do you guarantee that the task has completed and the instance can be scaled down? You could have the task write a completion flag to a database or an SSM parameter, poll it regularly, and act accordingly. However, that already feels like reinventing the wheel when there is a better tool for the job.

  • Running a Lambda function. One problem here: it has a hard execution time limit (currently 15 minutes). If your task takes longer than that, it won't finish properly. Again, this limitation can be worked around, for example by checkpointing how far the Lambda has progressed and resuming from that point on the next invocation. However, this is not what Lambdas are designed for.

So here we are: our scheduled tasks are waiting to be run, and we don't know how. Or do we? There are certainly other good options for achieving the same goal, but today we are going to play with AWS Elastic Container Service (ECS).

First of all, once you decide to go with ECS, think about which base image to use; this is determined by the task you want to run. Is it a Java, Python, or PHP application, or something else? In practice it is best to pick the base image that requires the least adjustment, and ideally one supported by a reputable maintainer. There is a third option as well: build your own.
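For example, a minimal Dockerfile sketch for a hypothetical Python task might look like this (app.py and requirements.txt are placeholders standing in for your actual workload):

# A minimal sketch, assuming a Python task; swap the base image,
# dependencies, and entrypoint for whatever your workload needs.
FROM python:3.12-slim

WORKDIR /app

# Install dependencies first so this layer is cached between rebuilds.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the task code itself.
COPY app.py .

# What the container runs when the scheduled task starts it.
ENTRYPOINT ["python", "app.py"]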

Once you've decided on the image, we need to push it to Amazon Elastic Container Registry (ECR), so our tasks can pull it when their environments are scaled up.

Amazon ECR and Docker:

  1. From the folder containing your Dockerfile, build the image: docker build -t {organization}/{project-name}:{tag} .

  2. Then tag it like: docker tag {organization}/{project-name}:{tag} {aws-account-number}.dkr.ecr.{aws-region}.amazonaws.com/{project-name}:{tag}

  3. Push it to ECR: docker push {aws-account-number}.dkr.ecr.{aws-region}.amazonaws.com/{project-name}:{tag}

If the push fails, it most likely means your Docker CLI is not logged in to ECR. Log in with: aws ecr get-login-password --region {aws-region} | docker login --username AWS --password-stdin {aws-account-number}.dkr.ecr.{aws-region}.amazonaws.com

You can use AWS Vault to run the command above under different AWS profiles, which is useful when you work with multiple accounts: aws-vault exec {aws-profile-name} -- aws ecr get-login-password --region {aws-region} | docker login --username AWS --password-stdin {aws-account-number}.dkr.ecr.{aws-region}.amazonaws.com
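Putting it all together, the whole flow might look like this (a sketch with hypothetical values: the "acme" organization, the "data-sync" project, account 123456789012, region eu-west-1, and the "dev" profile are all placeholders):

# Build the image from the Dockerfile in the current directory.
docker build -t acme/data-sync:latest .

# Log the Docker CLI in to ECR, here via aws-vault and a "dev" profile
# (assumes the data-sync repository already exists in ECR).
aws-vault exec dev -- aws ecr get-login-password --region eu-west-1 \
  | docker login --username AWS --password-stdin 123456789012.dkr.ecr.eu-west-1.amazonaws.com

# Tag and push the image.
docker tag acme/data-sync:latest 123456789012.dkr.ecr.eu-west-1.amazonaws.com/data-sync:latest
docker push 123456789012.dkr.ecr.eu-west-1.amazonaws.com/data-sync:latest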

Now that our image is in ECR, let's build the infrastructure that will use it.

AWS CloudFormation for ECS

Let's walk through the most important parts of the template, written in YAML. First of all, we need to define the parameters that will be passed to the template when it's pushed to the CloudFormation API. These parameters define the environment in which the tasks will run: for example, how much CPU and memory the container gets, which VPC it runs in, and so on.

Parameters:
  # Later in the template I'm using references to other parameters,
  # like Account or Vpc; these should be passed to the template
  # when it's pushed to the CloudFormation API, or you can use AWS SSM
  # to refer to them dynamically. I'm omitting them in this section,
  # but don't forget to add them according to your use case.
  ContainerCpu:
    Type: Number
    Default: 1024
    Description: How much CPU power is available to the container. 1024 is 1 vCPU.
  ContainerMemory:
    Type: Number
    Default: 2048
    Description: How much memory in MB is available to the container.

Next, we need to define the resources that will be created in AWS: a security group, an IAM role, a log group, an ECS cluster, a task definition, and a scheduled task. Let's start with the security group, IAM role, and log group. The security group defines which traffic is allowed to the container, and the IAM role defines which permissions the container has; in this case it can write logs to CloudWatch, run our specific task, and tag resources in ECS. The log group stores the container's logs so we can inspect them later if needed.

Resources:
  SecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: permit VPC connection
      VpcId: {your-vpc-id}
      Tags:
        - Key: some tag
          Value: tags improve visibility
  ECSTaskExecutionRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: {role-name}
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal:
              Service: ecs-tasks.amazonaws.com
            Action: sts:AssumeRole
          - Effect: Allow
            Principal:
              Service: events.amazonaws.com
            Action: sts:AssumeRole
      Path: /
      Policies:
        - PolicyName: ecs-task-execution-policy
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              - Effect: Allow
                Action:
                  - logs:CreateLogStream
                  - logs:PutLogEvents
                Resource: !GetAtt DataSyncLogGroup.Arn
              - Effect: Allow
                Action: ecs:RunTask
                Resource: !Sub arn:aws:ecs:${AWS::Region}:${AWS::AccountId}:task-definition/${your-stack-name}-*
              - Effect: Allow
                Action: iam:PassRole
                Resource: !Sub arn:aws:iam::${AWS::AccountId}:role/${your-stack-name}
                Condition:
                  StringLike:
                    iam:PassedToService: ecs-tasks.amazonaws.com
              - Effect: Allow
                Action: ecs:TagResource
                Resource: "*"
                Condition:
                  StringEquals:
                    ecs:CreateAction:
                      - RunTask
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy
  DataSyncLogGroup:
    Type: AWS::Logs::LogGroup
    Properties:
      LogGroupName: !Sub /ecs/${your-stack-name}/${your-environment}
      RetentionInDays: 7

Now, let's create the ECS cluster, task definition, and scheduled task. The cluster is a logical grouping of tasks or services. The task definition is a blueprint for the tasks that will run in the cluster. The scheduled task is an EventBridge rule that triggers the task at the interval we set.

  DataSyncCluster:
    Type: AWS::ECS::Cluster
    Properties:
      # Cluster names may contain only letters, numbers, hyphens, and underscores.
      ClusterName: !Sub ${your-stack-name}-${your-environment}
      ClusterSettings:
        - Name: containerInsights
          Value: enabled
  DataSyncTaskDefinition:
    Type: AWS::ECS::TaskDefinition
    Properties:
      Family: !Sub ${your-stack-name}
      NetworkMode: awsvpc
      RequiresCompatibilities:
        - FARGATE
      Cpu: !Ref ContainerCpu
      Memory: !Ref ContainerMemory
      ExecutionRoleArn: !GetAtt ECSTaskExecutionRole.Arn
      ContainerDefinitions:
        - Name: !Sub ${your-stack-name}
          Cpu: !Ref ContainerCpu
          Memory: !Ref ContainerMemory
          Image: !Sub ${AWS::AccountId}.dkr.ecr.${AWS::Region}.amazonaws.com/${project-name}:${tag}
          Essential: true
          LogConfiguration:
            LogDriver: awslogs
            Options:
              mode: non-blocking
              awslogs-group: !Ref DataSyncLogGroup
              awslogs-region: !Sub ${AWS::Region}
              awslogs-create-group: "true"
              awslogs-stream-prefix: {your-stream-prefix}
          HealthCheck:
            Command:
              - "CMD-SHELL"
              - "exit 0"
            Interval: 30
            Timeout: 5
            Retries: 3
            StartPeriod: 60
          Environment:
            - Name: some
              Value: variable
          Command:
            - "--do"
            - "something"
  DataSyncScheduledTask:
    Type: AWS::Events::Rule
    Properties:
      Name: !Sub ${your-stack-name}-${your-environment}
      Description: Scheduled task
      ScheduleExpression: rate(20 minutes)
      State: ENABLED
      Targets:
        - Arn: !GetAtt DataSyncCluster.Arn
          Id: DataSyncScheduledTask
          EcsParameters:
            TaskDefinitionArn: !Ref DataSyncTaskDefinition
            TaskCount: 1
            LaunchType: FARGATE
            NetworkConfiguration:
              AwsVpcConfiguration:
                AssignPublicIp: DISABLED
                Subnets:
                  - !Sub ${your-subnet}
                SecurityGroups:
                  - !Ref SecurityGroup
          RoleArn: !GetAtt ECSTaskExecutionRole.Arn

As you can see, we are using the FARGATE launch type, which means the tasks run in a serverless environment. We pass the CPU and memory parameters into the task definition so we can adjust them when needed. The scheduled task uses the cluster, task definition, and execution role defined earlier, and since it runs in a VPC, you need to pass a subnet and a security group in its network configuration. The health check is how ECS verifies the container is running properly; the exit 0 command above is just a placeholder that always reports healthy, so replace it with a real check for your workload. Note that a standalone scheduled task that turns unhealthy is not restarted in place; the next scheduled run simply starts a fresh task. The image is the one we pushed to ECR earlier.
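For a real workload, the health check could probe the application instead. A sketch, assuming your image ships curl and the task serves a status endpoint on localhost:8080 (both of which are assumptions about your container):

          HealthCheck:
            # Assumes curl is installed in the image and the app exposes
            # a /health endpoint on port 8080; adjust to your workload.
            Command:
              - "CMD-SHELL"
              - "curl -f http://localhost:8080/health || exit 1"
            Interval: 30
            Timeout: 5
            Retries: 3
            StartPeriod: 60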

Environment variables can be passed to the task so the application can read them. The Command is what runs when the task starts. I recommend reading up on CMD and ENTRYPOINT in the Docker documentation, as the distinction can be confusing at first. AssignPublicIp is set to DISABLED because this example task doesn't need to be reachable from the outside world; just keep in mind that without a public IP, the subnet needs a NAT gateway or VPC endpoints for the task to pull its image from ECR. Lastly, the task runs every 20 minutes, but the schedule expression can be adjusted to whatever interval you need.
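To illustrate the CMD/ENTRYPOINT interplay: ENTRYPOINT is the fixed executable, while CMD supplies default arguments, and the Command in the container definition above replaces CMD at run time. A sketch with a hypothetical sync.py script:

# Hypothetical image: ENTRYPOINT stays fixed, CMD holds default arguments.
FROM python:3.12-slim
COPY sync.py /app/sync.py
ENTRYPOINT ["python", "/app/sync.py"]
CMD ["--help"]
# With Command: ["--do", "something"] in the task definition, the container
# effectively runs: python /app/sync.py --do something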

Conclusion:

That's it. Now you can push this template to the CloudFormation API and it will create all the resources needed to run your tasks. Feel free to add more task definitions and scheduled rules to the template, so you can run multiple tasks in the same environment. This way you can run scheduled tasks in AWS cost-effectively, without worrying about scaling environments up or down. The tasks run in a serverless environment, meaning you can focus on the tasks themselves. I hope this post was helpful, and that you can use it to run your own tasks in AWS.
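Deploying with the AWS CLI could look like this (template.yaml, the stack name, and the parameter values are hypothetical; --capabilities CAPABILITY_NAMED_IAM is required because the template sets an explicit RoleName):

# A sketch: push the template to CloudFormation with the AWS CLI.
aws cloudformation deploy \
  --template-file template.yaml \
  --stack-name data-sync-prod \
  --parameter-overrides ContainerCpu=1024 ContainerMemory=2048 \
  --capabilities CAPABILITY_NAMED_IAM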

Thank you for reading, and have a great day!
