sqlxpert/step-stay-stopped-aws-rds-aurora

Step-Stay Stopped, RDS and Aurora!

Reliably keep AWS databases stopped when not needed

Purpose

AWS automatically starts RDS and Aurora databases after they've been stopped for 7 days. This Step Function re-stops them automatically. It uses the same reliable process as my original, Lambda-based solution, whereas most alternatives have race conditions that can leave databases running with no warning.

You do not have to opt-out or opt-in by tagging databases. Running databases keep running. Only databases stopped for 7 days trigger this tool, via RDS-EVENT-0154 (RDS database instance) or RDS-EVENT-0153 (Aurora database cluster).

🔒 Software supply chain security is on everyone's mind. This tool contains no traditional executable code and has no dependencies. You can read the Assign, Output, Resource, and Arguments lines (< 25 lines) in the Step Function definition to check how events generated by AWS are transformed into AWS API calls. The Step Function role and the error queue policy are least-privilege. Extensive sections on security and least-privilege installation have always been part of this ReadMe. I've made GitHub releases immutable as of v2.4.0.

Jump to: Get Started • Multi-Account, Multi-Region • Security

Use Cases

  • development/test database not used on a regular daily basis
  • developer vacation or leave of absence beyond 1 week
  • old database kept just in case

Start the database when you need it, then stop it when you are finished. Step-Stay-Stopped lets you leave it stopped indefinitely.

If you also install Lights Off, just start the database when you need it. Lights Off will stop it at the end of your work day, and Step-Stay-Stopped will keep it stopped. The recommended database tag for use with Lights Off is sched-stop : d=_ H:M=03:30. This example is for the USA West Coast: 03:30 UTC = 19:30 Pacific Standard Time = 20:30 Pacific Daylight Time.
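To adapt the 03:30 UTC example to another schedule, the conversion can be checked with a few lines of Python. This is only a sketch: it uses fixed UTC-8 and UTC-7 offsets for Pacific Standard and Daylight Time, and the function name `local_stop_time` is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets, so no time zone database is needed
PST = timezone(timedelta(hours=-8), "PST")
PDT = timezone(timedelta(hours=-7), "PDT")


def local_stop_time(utc_hhmm: str, tz: timezone) -> str:
    """Convert an HH:MM time in UTC to HH:MM in a fixed-offset zone."""
    # The date is arbitrary because the offsets above never change
    utc_dt = datetime.strptime(utc_hhmm, "%H:%M").replace(
        year=2025, month=1, day=15, tzinfo=timezone.utc
    )
    return utc_dt.astimezone(tz).strftime("%H:%M")


print(local_stop_time("03:30", PST))  # 19:30 (on the previous calendar day)
print(local_stop_time("03:30", PDT))  # 20:30 (on the previous calendar day)
```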

Cost Savings

AWS does not charge for database instance hours while an RDS database instance is stopped or an Aurora database cluster is stopped. Other charges, such as for storage and snapshots, continue.

Step-Stay-Stopped resolves two Cloud Efficiency Hub reports:

Diagram

Simplified flowchart:

After waiting 9 minutes, call to stop the Relational Database Service or Aurora database. Case 1: If the stop request succeeds, retry. Case 2: If the Aurora cluster is in an invalid state, parse the error message to get the status. Case 3: If the RDS instance is in an invalid state, get the status by calling to describe the RDS instance. Exit if the database status from Case 2 or 3 is 'stopped' or another final status. Otherwise, retry every 9 minutes, for 24 hours.
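The retry logic described in the flowchart can be sketched as a plain loop. This is not the Step Function definition, only an illustration under assumptions: `keep_stopped` and `try_stop` are hypothetical names, `try_stop` returns None when the stop request is accepted (Case 1) or the current database status otherwise (Cases 2 and 3), and only 'stopped' is treated as final because the other final statuses are not listed here.

```python
from typing import Callable, Optional, Tuple


def keep_stopped(
    try_stop: Callable[[], Optional[str]],
    wait_seconds: int = 540,       # 9 minutes between attempts
    timeout_seconds: int = 86400,  # give up after 24 hours
    final_statuses: frozenset = frozenset({"stopped"}),  # assumption
    sleep: Callable[[int], None] = lambda seconds: None,  # no real waiting
) -> Tuple[str, int]:
    """Wait, try to stop, and repeat until a final status or the timeout."""
    elapsed = 0
    attempts = 0
    while elapsed + wait_seconds <= timeout_seconds:
        sleep(wait_seconds)
        elapsed += wait_seconds
        attempts += 1
        status = try_stop()
        # None: the stop request succeeded, so retry to confirm the outcome.
        # Any non-final status (for example 'starting'): retry as well.
        if status is not None and status in final_statuses:
            return status, attempts
    return "timed-out", attempts


# Simulated database: still starting, then stop accepted, then stopped
statuses = iter(["starting", None, "stopping", "stopped"])
print(keep_stopped(lambda: next(statuses)))
```

Retrying even after a successful stop request is what distinguishes this process from a single "stop and forget" call: the loop ends only when the database is observed in a final status, or when the timeout is reached.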

Get Started

  1. Log in to the AWS Console as an administrator. Choose an AWS account and a region where you have an RDS or Aurora database that is normally stopped, or a database that you won't need for 8 days (stop the database now).

  2. If you used Stay-Stopped, the original, AWS Lambda-based tool, delete any StayStoppedRdsAurora CloudFormation stacks, or delete the StayStoppedRdsAurora CloudFormation StackSet.

  3. Install Step-Stay-Stopped using CloudFormation or Terraform.

  4. Wait 8 days, then check that your RDS or Aurora database is still stopped. After clicking the RDS database instance name or the Aurora database cluster name, open the "Logs & events" tab and scroll to "Recent events". At the right, click to change "Last 1 day" to "Last 2 weeks". The "System notes" column should include the following entries, listed here from newest to oldest. There might be other entries in between.

    | RDS | Aurora |
    | --- | --- |
    | DB instance stopped | DB cluster stopped |
    | DB instance started | DB cluster started |
    | DB instance is being started due to it exceeding the maximum allowed time being stopped. | DB cluster is being started due to it exceeding the maximum allowed time being stopped. |

    If you don't want to wait 8 days, see Testing, below.

Multi-Account, Multi-Region

For reliability, Step-Stay-Stopped works independently in each region, in each AWS account. To deploy in multiple regions and/or multiple AWS accounts,

  1. Delete any standalone StepStayStoppedRdsAurora CloudFormation stacks in your target regions and/or AWS accounts (including any instances of the basic //terraform module; you will be installing one instance of the //terraform-multi module).

    • If you used Stay-Stopped, the original, AWS Lambda-based tool, delete any StayStoppedRdsAurora CloudFormation stacks, or delete the StayStoppedRdsAurora CloudFormation StackSet.
  2. Complete the prerequisites for creating a StackSet with service-managed permissions.

  3. Install Step-Stay-Stopped as a CloudFormation StackSet, using CloudFormation or Terraform. You must use your AWS organization's management account, or a delegated administrator AWS account.

    • CloudFormation
      Easy

      Create a CloudFormation StackSet.

      Select "Upload a template file", then select "Choose file" and upload a locally-saved copy of cloudformation/step_stay_stopped_aws_rds_aurora.yaml [right-click to save as...].

      On the next page, set:

      • StackSet name: StepStayStoppedRdsAurora

      On the "Set deployment options" page, under "Accounts", select "Deploy stacks in organizational units". Enter the ou- ID(s). Step-Stay-Stopped will be deployed to all AWS accounts within the organizational unit(s). Next, "Specify Regions".

    • Terraform

      Your module block will now resemble:

      module "stay_stopped_rds_stackset" {
        source = "git::https://github.com/sqlxpert/step-stay-stopped-aws-rds-aurora.git//terraform-multi?ref=v2.4.1"
        # Reference a specific version from github.com/sqlxpert/step-stay-stopped-aws-rds-aurora/releases
        # Check that the release is immutable!
      
        stay_stopped_rds_stackset_regions = ["us-east-1", "us-west-2",]
        stay_stopped_rds_stackset_organizational_unit_ids = [
          "ou-0123-abcdefg",
        ]
      }

      Test mode is always disabled in this configuration. This is a safeguard against unintended use in production.

Installation with Terraform

Get Started Step 3 includes the option to install Step-Stay-Stopped as a Terraform module in one region in one AWS account. This is the basic //terraform module.

The enhanced region support added in v6.0.0 of the Terraform AWS provider makes it possible to deploy resources in multiple regions in one AWS account without configuring a separate provider for each region. Step-Stay-Stopped is compatible because the Terraform module was written for AWS provider v6, the original CloudFormation templates always let CloudFormation assign unique physical names to account-wide, non-regional resources like IAM roles, and the CloudFormation parameters were already region-independent. Your module block will now resemble:

module "stay_stopped_rds" {
  source = "git::https://github.com/sqlxpert/step-stay-stopped-aws-rds-aurora.git//terraform?ref=v2.4.1"
  # Reference a specific version from github.com/sqlxpert/step-stay-stopped-aws-rds-aurora/releases
  # Check that the release is immutable!

  for_each                = toset(["us-east-1", "us-west-2",])
  stay_stopped_rds_region = each.key
}

For installation in multiple AWS accounts (regardless of the number of regions), wrapping a CloudFormation StackSet in HashiCorp Configuration Language remains much easier than configuring Terraform to deploy identical resources in multiple AWS accounts. The Multi-Account, Multi-Region installation instructions include the option to do this using a Terraform module, at Step 3. This is the //terraform-multi module.

Least-Privilege Installation

CloudFormation Stack Least-Privilege

You can use a CloudFormation service role to delegate only the privileges needed to create the StepStayStoppedRdsAurora stack. (This is done for you if you use Terraform at Step 3 of Get Started.)

First, create the StepStayStoppedRdsAuroraPrereq stack from cloudformation/step_stay_stopped_aws_rds_aurora_prereq.yaml.

Under "Additional settings" → "Stack policy - optional", you can "Upload a file" and select a locally-saved copy of cloudformation/step_stay_stopped_aws_rds_aurora_prereq_policy.json. The stack policy prevents inadvertent replacement or deletion of the deployment role during stack updates, but it cannot prevent deletion of the entire StepStayStoppedRdsAuroraPrereq stack.

Next, when you create the StepStayStoppedRdsAurora stack from cloudformation/step_stay_stopped_aws_rds_aurora.yaml, set "Permissions - optional" → "IAM role - optional" to StepStayStoppedRdsAuroraPrereq-DeploymentRole. If your own privileges are limited, you might need permission to pass the deployment role to CloudFormation. See the StepStayStoppedRdsAuroraPrereq-SampleDeploymentRolePassRolePol IAM policy for an example.

CloudFormation StackSet Least-Privilege

For a CloudFormation StackSet, you can use self-managed permissions by copying the inline IAM policy of StepStayStoppedRdsAuroraPrereq-DeploymentRole to a customer-managed IAM policy, attaching your policy to AWSCloudFormationStackSetExecutionRole and propagating the policy and the role policy attachment to all target AWS accounts.

Terraform Least-Privilege

If you do not give Terraform full AWS administrative permissions, you must give it permission to:

  • List, describe, get tags for, create, tag, update, untag, and delete IAM roles, update the "assume role" (role trust or "resource-based") policy, and put and delete inline policies

  • Attach managed IAM policies to, and detach them from, roles (if you set AttachLocalPolicy)

  • List, describe, create, tag, update, untag, and delete CloudFormation stacks

  • Set and get CloudFormation stack policies

  • Pass StepStayStoppedRdsAuroraPrereq-DeploymentRole-* to CloudFormation

  • List, describe, and get tags for, all data sources. To see the data sources, run:

    grep 'data "' terraform*/*.tf | cut --delimiter=' ' --fields='1,2' | sort | uniq
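The command above relies on GNU long options for cut, which might not be available on every system (BSD and macOS versions, for example). As a rough alternative, the same list of data source types can be extracted in Python. This is a sketch: `data_source_types` is a hypothetical helper, and the regular expression assumes that each data block starts on its own line.

```python
import re
from pathlib import Path

# Matches the resource type in lines such as: data "aws_region" "current" {
DATA_BLOCK = re.compile(r'^\s*data\s+"([^"]+)"', re.MULTILINE)


def data_source_types(hcl_text: str) -> list:
    """Return the sorted, de-duplicated data source types in HCL text."""
    return sorted(set(DATA_BLOCK.findall(hcl_text)))


if __name__ == "__main__":
    # Same file pattern as the shell command: terraform*/*.tf
    combined = "\n".join(p.read_text() for p in Path(".").glob("terraform*/*.tf"))
    for type_name in data_source_types(combined):
        print(type_name)
```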

Open the AWS Service Authorization Reference, go through the list of services on the left, and consult the "Actions" table for each of:

  • AWS Identity and Access Management (IAM)
  • CloudFormation
  • AWS Security Token Service
  • AWS Key Management Service (if you encrypt the SQS queue or the CloudWatch log group, or Step Function data, with KMS keys)
  • AWS Organizations (if you create a CloudFormation StackSet with the //terraform-multi module)

In most cases, you can scope Terraform's permissions to one workload by regulating resource naming and tagging, and then by using:

Check Service and Resource Control Policies (SCPs and RCPs), as well as resource policies (such as KMS key policies).

The basic //terraform module creates the StepStayStoppedRdsAuroraPrereq stack, which defines the IAM role that gives CloudFormation the permissions it needs to create the StepStayStoppedRdsAurora stack. Terraform itself does not need the deployment role's permissions.

Security

In accordance with the software license, nothing in this document establishes indemnification, a warranty, assumption of liability, etc. Use this software entirely at your own risk. You are encouraged to review the source code.

Security Design Goals

  • A least-privilege role for the AWS Step Function.

  • A Step Function role that cannot be used by arbitrary functions. If the role is passed to an arbitrary Step Function, Task states will not gain access to the Aurora and RDS API.

  • A least-privilege queue policy. The error (dead letter) queue can only consume messages from EventBridge. Encryption in transit is required.

  • Optional encryption at rest with the AWS Key Management Service, for the error queue, Step Function state machine payloads, and the log. This can protect EventBridge events containing database identifiers and metadata, such as tags. KMS keys housed in a different AWS account, and multi-region keys, are supported.

  • A retry mechanism and a state machine timeout, to increase the likelihood that a database will be stopped as intended but prevent endless retries.

  • A 24-hour event date/time expiry check, to prevent processing of accumulated stale events, if any.

  • Readable Identity and Access Management policies, formatted as CloudFormation YAML rather than JSON (where possible), and broken down into discrete statements by service, resource or principal.

Your Security Steps

  • Prevent people from modifying components of this tool, most of which can be identified by StepStayStoppedRdsAurora in ARNs and in the automatic aws:cloudformation:stack-name tag.

  • Log infrastructure changes using CloudTrail, and set up alerts.

  • Prevent people from directly invoking the Step Function.

  • Separate production workloads. Although this tool only stops databases that AWS is starting after they've been stopped for 7 days, the Step Function could stop any database if invoked directly, with a contrived event as input. You might choose not to deploy this tool in AWS accounts used for production, or you might add a custom IAM policy to deny the function role permission to stop certain databases. See the AttachLocalPolicy parameter.

    • Tagging an RDS database instance or an Aurora database cluster with StayStopped-Exclude prevents the Step Function role from being misused to stop that database. Requiring an inclusion tag is also possible. See the ExcludeTagKey and IncludeTagKey parameters. ⚠ Do not rely on attribute-based access control unless you also prevent people and systems from adding, changing and deleting ABAC tags. A sample Service Control Policy is available.
  • Enable the test mode only in a non-critical AWS account and region, and turn the test mode off again as quickly as possible.

  • Monitor the error (dead letter) queue, and monitor the log.

  • Configure budget alerts and use cost anomaly detection.

  • Occasionally start a database before its maintenance window and leave it running, to catch up with RDS and Aurora security updates.

  • If you use Terraform, do not use it with an AWS access key and do not give it full AWS administrative privileges. Instead, follow AWS's Best practices for using the Terraform AWS Provider: Security best practices. Do the extra work of defining a least-privilege IAM role for deploying each workload. Configure Terraform to assume workload-specific roles. The CloudFormation service role is one element, but achieving least-privilege also requires limiting Terraform's privileges.

Service Control Policy

Step-Stay-Stopped works without database tags. The exclusion/inclusion tagging options are for security, not for function. If you decide to tag your databases, a sample service control policy is available to prevent tampering.

This SCP offers one-way protection: Non-exempt roles can reduce but not increase the range of databases that the Step Function role is allowed to stop. Specifically, non-exempt roles cannot remove the exclusion tag or add the inclusion tag.

In your AWS Organizations management account, in the region where you manage infrastructure-as-code templates for non-regional resources, create a CloudFormation stack from cloudformation/step_stay_stopped_aws_rds_aurora_scp.yaml.

Or, reference the equivalent Terraform module:

module "stay_stopped_rds_scp" {
  source = "git::https://github.com/sqlxpert/step-stay-stopped-aws-rds-aurora.git//terraform-scp?ref=v2.4.1"
  # Reference a specific version from github.com/sqlxpert/step-stay-stopped-aws-rds-aurora/releases
  # Check that the release is immutable!

  scp_target_ids = [
    "ou-0123-abcdefg",
  ]
}

In either case, specify the number of the account or the ou- ID of the organizational unit that you use for testing SCPs.

Test the SCP before applying it broadly, because it generally reduces existing RDS/Aurora permissions. Human users or automated processes might rely on those permissions. This is especially true of backup restoration, blue/green deployment, and Aurora cluster scaling workflows, which might copy tags to new databases.

You will need at least one SCP-exempt role in every AWS account, to manage the exclusion/inclusion tags. I recommend IAM Identity Center permission sets. You can customize ScpPrincipalCondition / scp_principal_condition to reference permission set roles.

The SCP works by denying certain rds:RemoveTagsFromResource and rds:AddTagsToResource requests. It cannot add permissions that have been denied by another SCP, or that were never allowed by a role's attached or inline policies.

SCPs do not affect roles or other IAM principals in the AWS Organizations management account.

Troubleshooting

Check the:

  1. StepStayStoppedRdsAurora-StepFn CloudWatch log group

    • Rds.InvalidDbInstanceStateException or Rds.InvalidDbClusterStateException errors, with no other proximate errors, are expected and can be ignored.
    • Log entries are JSON objects.
    • For more data, change the LogLevel parameter.
  2. "Executions" data for the StepStayStoppedRdsAurora-StepFn Step Function

    • The "State view" is useful for diagnosing errors.
    • Rows with "Caught error" in the "Status" column are expected and can be ignored if the "Reason" is Rds.InvalidDbInstanceStateException or Rds.InvalidDbClusterStateException.
  3. StepStayStoppedRdsAurora-ErrorQueue (dead letter) SQS queue

    • A message here means that the Step Function did not run; the request to stop the database was not made.
    • Usually the local security configuration is denying EventBridge necessary access to the Step Function.
  4. CloudTrail Event history

    • CloudTrail events with an "Error code" may indicate permissions problems, typically due to the local security configuration.
    • To see more events, change "Read-only" from false to true.
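Because log entries are JSON objects, the two expected errors can be filtered out so that only unexpected ones remain. The sketch below is illustrative: `unexpected_errors` is a hypothetical helper, and the location of the error name at details → error is an assumption about the log entry layout that you should check against your own log.

```python
import json

EXPECTED_ERRORS = {
    "Rds.InvalidDbInstanceStateException",
    "Rds.InvalidDbClusterStateException",
}


def unexpected_errors(log_lines):
    """Return parsed log entries whose error is not an expected one."""
    flagged = []
    for line in log_lines:
        entry = json.loads(line)
        # Assumption: the error name, if any, is at entry["details"]["error"]
        error = (entry.get("details") or {}).get("error")
        if error is not None and error not in EXPECTED_ERRORS:
            flagged.append(entry)
    return flagged
```

If this returns nothing for the period you are investigating, the expected errors had no other proximate errors and can be ignored.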

Testing

Recommended Test Database

An RDS database instance (db.t4g.micro, 20 GiB of gp3 storage, 0 days' worth of automated backups) is cheaper than a typical Aurora cluster, not to mention faster to create, stop, and start.

Test Mode

AWS starts RDS and Aurora databases that have been stopped for 7 days, but we need a faster mechanism for realistic, end-to-end testing. Temporarily change these parameters:

| Parameter | Normal | Test |
| --- | --- | --- |
| Test | false | true |
| StepFnWaitSeconds | 540 | 60 |
| Equivalent in minutes | 9 minutes | 1 minute |
| StepFnTimeoutSeconds | 86400 | 1800 |
| Equivalent in hours | 24 hours | ½ hour |
| LogLevel | ERROR | ALL |

⚠ Exit test mode as quickly as possible, given the operational and security risks explained below. If your test database is ready, several minutes should be sufficient. Test mode is always disabled in the //terraform-multi module.
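As a quick sanity check on the normal and test values, the wait and timeout settings together bound the number of stop attempts. The calculation below is an approximation that ignores the time taken by the API calls themselves; `max_attempts` is a hypothetical helper, not part of the tool.

```python
def max_attempts(wait_seconds: int, timeout_seconds: int) -> int:
    """Upper bound on wait-then-stop cycles that fit within the timeout."""
    return timeout_seconds // wait_seconds


NORMAL = {"StepFnWaitSeconds": 540, "StepFnTimeoutSeconds": 86400}
TEST = {"StepFnWaitSeconds": 60, "StepFnTimeoutSeconds": 1800}

for label, params in (("normal", NORMAL), ("test", TEST)):
    attempts = max_attempts(
        params["StepFnWaitSeconds"], params["StepFnTimeoutSeconds"]
    )
    print(label, attempts)  # normal 160, then test 30
```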

Test by Manually Starting a Database

In test mode, Step-Stay-Stopped also responds to user-initiated, non-forced database starts: RDS-EVENT-0088 (RDS database instance) and RDS-EVENT-0151 (Aurora database cluster). Although it won't stop databases that are already running and remain running, ⚠ while in test mode Step-Stay-Stopped will stop databases that you start manually. To test, manually start a stopped RDS or Aurora database.

In test mode, Step-Stay-Stopped also receives RDS-EVENT-0088 (Aurora database instance). Internally, the Step Function ignores it in favor of the cluster-level event.

Test by Invoking the Step Function

Depending on locally-determined permissions, you may also be able to invoke the StepStayStoppedRdsAurora-StepFn Step Function manually. Edit the database names and date/time strings (must be within the past StepFnTimeoutSeconds and end in Z for UTC) in these test inputs:

{
  "detail": {
    "SourceIdentifier": "Name-Of-Your-RDS-Database-Instance",
    "Date": "2025-06-06T04:30Z",
    "SourceType": "DB_INSTANCE",
    "EventID": "RDS-EVENT-0154"
  },
  "detail-type": "RDS DB Instance Event",
  "source": "aws.rds",
  "version": "0"
}

{
  "detail": {
    "SourceIdentifier": "Name-Of-Your-Aurora-Database-Cluster",
    "Date": "2025-06-06T04:30Z",
    "SourceType": "CLUSTER",
    "EventID": "RDS-EVENT-0153"
  },
  "detail-type": "RDS DB Cluster Event",
  "source": "aws.rds",
  "version": "0"
}
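Since the date/time string has to be recent, it can be convenient to generate the test input instead of editing it by hand. The following sketch builds an input in the same shape as the samples above and checks that it falls within the timeout. `make_test_event` and `is_fresh` are hypothetical helpers, and the freshness check only approximates the rule stated above; it is not the Step Function's own expiry logic.

```python
import json
from datetime import datetime, timedelta, timezone

DATE_FORMAT = "%Y-%m-%dT%H:%MZ"  # same shape as the sample inputs, UTC


def make_test_event(identifier: str, is_cluster: bool, now=None) -> dict:
    """Build a test input for an RDS instance or an Aurora cluster."""
    now = now or datetime.now(timezone.utc)
    return {
        "detail": {
            "SourceIdentifier": identifier,
            "Date": now.strftime(DATE_FORMAT),
            "SourceType": "CLUSTER" if is_cluster else "DB_INSTANCE",
            "EventID": "RDS-EVENT-0153" if is_cluster else "RDS-EVENT-0154",
        },
        "detail-type": (
            "RDS DB Cluster Event" if is_cluster else "RDS DB Instance Event"
        ),
        "source": "aws.rds",
        "version": "0",
    }


def is_fresh(event: dict, timeout_seconds: int = 86400, now=None) -> bool:
    """Check that the event date is within the past timeout_seconds."""
    now = now or datetime.now(timezone.utc)
    event_time = datetime.strptime(
        event["detail"]["Date"], DATE_FORMAT
    ).replace(tzinfo=timezone.utc)
    age = now - event_time
    return timedelta(0) <= age <= timedelta(seconds=timeout_seconds)


event = make_test_event("Name-Of-Your-RDS-Database-Instance", is_cluster=False)
print(json.dumps(event, indent=2))
```

Remember that in test mode StepFnTimeoutSeconds is 1800, so pass timeout_seconds=1800 to `is_fresh` when checking an input prepared earlier.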

Report Bugs

After following the troubleshooting steps and ruling out local issues such as permissions — especially hidden controls such as Service and Resource control policies (SCPs and RCPs) — please report bugs. Thank you!

Licenses

| Scope | Link | Included Copy |
| --- | --- | --- |
| Source code, and source code in documentation | GNU General Public License (GPL) 3.0 | LICENSE-CODE.md |
| Documentation, including this ReadMe file | GNU Free Documentation License (FDL) 1.3 | LICENSE-DOC.md |

Copyright Paul Marcelin

Contact: marcelin at cmu.edu (replace "at" with @)