Mastering Resource Power Management in AWS Using Step Functions
Many companies provide online services to their customers. These websites, applications, and systems are critical assets that act as online storefronts, available 24/7.
Digital sales, a new challenge
Just as a merchant carefully arranges products in a physical storefront, a website demands similar attention to detail, high availability, and a seamless browsing experience. No one likes a slow website cluttered with ads. However, managing a website is often far more complex than handling a physical store. Online customers may encounter various bugs, which can negatively impact conversion rates and profits.
This complexity has led companies to increase the number of tests conducted before making any change (visual or functional) available to their customers. Many environments are now provisioned to meet this need, offering new opportunities and flexibility during the test phases: an environment can be dedicated to a team or to testing a specific change.
However, running more and more environments is not without effect on the bill. As a result, DevOps teams have begun implementing various cost control mechanisms. A common strategy involves scaling down resources in development or quality acceptance test environments. For example, a production environment with a database instance featuring 1TB of storage and 32GB of memory might be scaled down in a development setting to 250GB of storage and 8GB of memory.
Power management
One effective strategy to manage costs in non-production environments is to turn them off outside of working hours. The ideal solution for this would be easy to install, simple to maintain, and flexible enough to adapt to various constraints.
To better understand how this works in practice, let’s consider a hypothetical scenario with a company named Acme. Acme operates a Drupal website hosted on AWS. Their environment architecture is structured to maximize efficiency and minimize costs.
During typical working hours, Acme’s development and testing environments are fully operational, providing the necessary resources for their team's activities. However, outside of these hours, these environments are not actively used. By implementing a system that automatically turns these environments off during off-hours, Acme could significantly reduce its operational costs. This system needs to be not only user-friendly in terms of installation and maintenance but also versatile enough to accommodate Acme’s specific operational needs and constraints.
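To get a rough sense of the potential savings, the short sketch below computes the share of the week an environment stays powered on. The schedule of 12 hours a day, Monday to Friday, is an assumption chosen for illustration, not Acme's actual working hours.

```python
# Hypothetical schedule: environments run 12 hours a day, Monday to Friday.
HOURS_PER_WEEK = 7 * 24  # 168 hours


def running_fraction(hours_per_day: float, days_per_week: int) -> float:
    """Return the fraction of the week during which the environment is on."""
    return (hours_per_day * days_per_week) / HOURS_PER_WEEK


on = running_fraction(hours_per_day=12, days_per_week=5)
print(f"Powered on {on:.0%} of the week, powered off {1 - on:.0%}")
```

Under this assumption the environment runs for about 36% of the week. The figure only covers instance hours: storage attached to stopped EC2 and RDS instances is still billed.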
A brief examination of this architecture diagram highlights two key components that can be deactivated during non-working hours: the EC2 instance and the MySQL RDS instance. To manage the power of these environments efficiently and cost-effectively, we will use AWS Step Functions.
Let's begin by launching the Step Functions Workflow Studio. Here's how we proceed:
This tool enables users to visually create a Step Function workflow without the need for coding. The workflow can then be exported in either yml or json format, allowing it to be seamlessly integrated with Terraform and your SCM solution.
In my workflow, the initial step is to shut down the Drupal server on EC2. Stopping it first prevents the errors that could occur once the RDS instance is turned off, and avoids the extra log noise those errors would generate:
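As a text equivalent of this step, here is a minimal sketch of what the state could look like in the exported definition, built as a Python dictionary and printed as JSON. It assumes the step uses the Step Functions AWS SDK integration for `ec2:stopInstances`. The instance ID and the name of the following state are hypothetical placeholders.

```python
import json

# Hypothetical placeholder: replace with the ID of the real Drupal instance.
DRUPAL_INSTANCE_ID = "i-0123456789abcdef0"

# A Task state that calls the EC2 StopInstances API through the
# Step Functions AWS SDK integration.
stop_drupal_state = {
    "Type": "Task",
    "Resource": "arn:aws:states:::aws-sdk:ec2:stopInstances",
    "Parameters": {"InstanceIds": [DRUPAL_INSTANCE_ID]},
    # Hypothetical name for the state that follows this one.
    "Next": "Wait for EC2 shutdown",
}

print(json.dumps({"Stop Drupal server": stop_drupal_state}, indent=2))
```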
I've updated the "state name" in the workflow to more accurately reflect its purpose and included the payload for the API parameter fields. In this example, the ID is hardcoded for simplicity. However, for a real-world application, I recommend starting the workflow with a step that executes an ec2:describeInstances command. This can retrieve the instance ID using a suitable filter, like tag-key, and then pass this ID to the subsequent step responsible for turning off the instance.
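The lookup described above could be sketched as follows. This is one possible shape, not the workflow from the screenshots: the tag key `PowerManaged` and the state name "Find Drupal server" are hypothetical, and the JSONPath expression assumes the standard `Reservations` / `Instances` layout of the DescribeInstances response.

```python
import json

# Hypothetical tag key used to mark instances managed by this workflow.
TAG_KEY = "PowerManaged"

lookup_and_stop = {
    "StartAt": "Find Drupal server",
    "States": {
        "Find Drupal server": {
            "Type": "Task",
            "Resource": "arn:aws:states:::aws-sdk:ec2:describeInstances",
            "Parameters": {
                # "tag-key" matches instances carrying the tag, whatever its value.
                "Filters": [{"Name": "tag-key", "Values": [TAG_KEY]}]
            },
            # Keep only the matching instance IDs, flattened into one array.
            "ResultSelector": {
                "InstanceIds.$": "$.Reservations[*].Instances[*].InstanceId"
            },
            "Next": "Stop Drupal server",
        },
        "Stop Drupal server": {
            "Type": "Task",
            "Resource": "arn:aws:states:::aws-sdk:ec2:stopInstances",
            # Read the IDs produced by the previous state instead of hardcoding them.
            "Parameters": {"InstanceIds.$": "$.InstanceIds"},
            "End": True,
        },
    },
}

print(json.dumps(lookup_and_stop, indent=2))
```

Because the filter can match several instances, this version stops every instance that carries the tag. That is convenient for a whole environment, but it means the tag must be applied carefully.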
Upon completing the 'Stop Drupal server' step, the EC2 instance should be turned off. This allows us to proceed with shutting down the RDS instance. Based on my experience, turning off an EC2 instance can take a few seconds. Since this workflow is designed to prevent any communication issues between Drupal and its database, I will incorporate an additional safeguard by introducing a delay before initiating the shutdown of the RDS instance:
Using the "Flow" panel, I've introduced a 60-second pause between the shutdown step of the Drupal server and the following step. Now, let's proceed to turn off the database:
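Put together, the three steps could look like the sketch below: the EC2 stop, the 60-second Wait state, and the RDS stop. The resource identifiers and the names of the last two states are hypothetical. The `DbInstanceIdentifier` casing follows the convention I would expect from the AWS SDK integration, so check it against the definition exported by Workflow Studio. The small helper only verifies that every transition points to a state that exists.

```python
import json

# Hypothetical identifiers: replace with the real resources.
DRUPAL_INSTANCE_ID = "i-0123456789abcdef0"
DATABASE_IDENTIFIER = "acme-dev-mysql"

workflow = {
    "Comment": "Power off a non-production environment",
    "StartAt": "Stop Drupal server",
    "States": {
        "Stop Drupal server": {
            "Type": "Task",
            "Resource": "arn:aws:states:::aws-sdk:ec2:stopInstances",
            "Parameters": {"InstanceIds": [DRUPAL_INSTANCE_ID]},
            "Next": "Wait 60 seconds",
        },
        # Give the EC2 instance time to stop before touching the database.
        "Wait 60 seconds": {
            "Type": "Wait",
            "Seconds": 60,
            "Next": "Stop database",
        },
        "Stop database": {
            "Type": "Task",
            "Resource": "arn:aws:states:::aws-sdk:rds:stopDBInstance",
            # Assumed parameter casing for the SDK integration.
            "Parameters": {"DbInstanceIdentifier": DATABASE_IDENTIFIER},
            "End": True,
        },
    },
}


def dangling_transitions(definition: dict) -> list[str]:
    """Return the names of 'Next' targets that are not defined as states."""
    states = definition["States"]
    targets = [s["Next"] for s in states.values() if "Next" in s]
    if definition["StartAt"] not in states:
        targets.append(definition["StartAt"])
    return [name for name in targets if name not in states]


print(json.dumps(workflow, indent=2))
print("Dangling transitions:", dangling_transitions(workflow))
```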
The workflow creation is now complete. This demonstrates how Step Functions can be quickly designed and deployed to manage various kinds of resources. To automate this process, an EventBridge schedule can be set up to trigger this Step Function at a specific time, such as 6 p.m.
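As an illustration of the scheduling part, the sketch below prepares a request for the EventBridge Scheduler `create_schedule` API using boto3. The schedule name, the UTC time zone, the weekday-only cron expression, and both ARNs are assumptions made for this example. The IAM role must allow EventBridge Scheduler to start executions of the state machine.

```python
import json


def build_schedule_request(state_machine_arn: str, role_arn: str,
                           name: str = "stop-dev-environment") -> dict:
    """Build the keyword arguments for the scheduler create_schedule call."""
    return {
        "Name": name,  # hypothetical schedule name
        # Every weekday at 18:00; fields are minute, hour, day of month,
        # month, day of week, year.
        "ScheduleExpression": "cron(0 18 ? * MON-FRI *)",
        "ScheduleExpressionTimezone": "UTC",  # assumption: adjust to your office
        "FlexibleTimeWindow": {"Mode": "OFF"},
        "Target": {
            "Arn": state_machine_arn,
            "RoleArn": role_arn,
            "Input": json.dumps({}),  # the workflow needs no input here
        },
    }


def create_schedule(request: dict) -> dict:
    """Send the request to AWS. Requires boto3 and valid credentials."""
    import boto3  # imported lazily so the builder can be used without it

    return boto3.client("scheduler").create_schedule(**request)


if __name__ == "__main__":
    # Hypothetical ARNs, shown for illustration only.
    request = build_schedule_request(
        "arn:aws:states:eu-west-1:123456789012:stateMachine:stop-dev",
        "arn:aws:iam::123456789012:role/scheduler-start-execution",
    )
    print(json.dumps(request, indent=2))
```

Keeping the request builder separate from the AWS call makes the schedule easy to review and test before anything is created in the account.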