Reliably keep AWS databases stopped when not needed
AWS automatically starts RDS and Aurora databases after they've been stopped for 7 days. This Step Function re-stops them automatically. It uses the same reliable process as my original, Lambda-based solution, whereas most alternatives have race conditions that can leave databases running with no warning.
You do not have to opt-out or opt-in by tagging databases. Running databases keep running. Only databases stopped for 7 days trigger this tool, via RDS-EVENT-0154 (RDS database instance) or RDS-EVENT-0153 (Aurora database cluster).
🔒 Software supply chain security is on everyone's mind. This tool contains no traditional executable code and has no dependencies. You can read the Assign, Output, Resource, and Arguments lines (< 25 lines) in the Step Function definition to check how events generated by AWS are transformed into AWS API calls. The Step Function role and the error queue policy are least-privilege. Extensive sections on security and least-privilege installation have always been part of this ReadMe. I've made GitHub releases immutable as of v2.4.0.
Jump to: Get Started • Multi-Account, Multi-Region • Security
- development/test database not used on a regular daily basis
- developer vacation or leave of absence beyond 1 week
- old database kept just in case
Start the database when you need it, then stop it when you are finished. Step-Stay-Stopped lets you leave it stopped indefinitely.
If you also install Lights Off, just start the database when you need it. Lights Off will stop it at the end of your work day, and Step-Stay-Stopped will keep it stopped. The recommended database tag for use with Lights Off is sched-stop : d=_ H:M=03:30. This example is for the USA West Coast: 03:30 UTC = 19:30 Pacific Standard Time = 20:30 Pacific Daylight Time (on the previous calendar day in both cases).
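To adapt the stop time to another schedule, the UTC arithmetic can be checked with a few lines of Python. This is a minimal sketch: the function name and the fixed UTC offsets are my own assumptions for illustration, and it does not look up daylight saving time rules.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets for the USA West Coast (assumption: hard-coded, no DST lookup).
PST = timezone(timedelta(hours=-8), "PST")
PDT = timezone(timedelta(hours=-7), "PDT")


def local_stop_time(utc_hhmm: str, tz: timezone) -> str:
    """Convert an HH:MM time in UTC to HH:MM in a fixed-offset time zone."""
    # The date is arbitrary; only the time of day matters here.
    utc_time = datetime.strptime(utc_hhmm, "%H:%M").replace(
        year=2025, month=1, day=15, tzinfo=timezone.utc
    )
    return utc_time.astimezone(tz).strftime("%H:%M")


print(local_stop_time("03:30", PST))  # 19:30
print(local_stop_time("03:30", PDT))  # 20:30
```

Both results fall on the evening before the UTC date, which is why a single 03:30 UTC tag value covers the end of a West Coast work day all year.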
AWS does not charge for database instance hours while an RDS database instance is stopped or an Aurora database cluster is stopped. Other charges, such as for storage and snapshots, continue.
Step-Stay-Stopped resolves two Cloud Efficiency Hub reports:
- CER-0293: Automatic Restart of Stopped Aurora Clusters Causing Unintended Compute Charges
- CER-0097: No Lifecycle Management for Temporarily Stopped RDS Instances
1. Log in to the AWS Console as an administrator. Choose an AWS account and a region where you have an RDS or Aurora database that is normally stopped, or a database that you won't need for 8 days (stop the database now).

2. If you used Stay-Stopped, the original, AWS Lambda-based tool, delete any StayStoppedRdsAurora CloudFormation stacks, or delete the StayStoppedRdsAurora CloudFormation StackSet.

3. Install Step-Stay-Stopped using CloudFormation or Terraform.

   - CloudFormation (Easy ✓)

     Create a CloudFormation stack.

     Select "Upload a template file", then select "Choose file" and navigate to a locally-saved copy of cloudformation/step_stay_stopped_aws_rds_aurora.yaml [right-click to save as...].

     On the next page, set:

     - Stack name: StepStayStoppedRdsAurora

   - Terraform

     Check that you have at least:

     Add the following child module to your existing root module:

         module "stay_stopped_rds" {
           source = "git::https://github.com/sqlxpert/step-stay-stopped-aws-rds-aurora.git//terraform?ref=v2.4.1"
           # Reference a specific version from github.com/sqlxpert/step-stay-stopped-aws-rds-aurora/releases
           # Check that the release is immutable!
         }

4. Wait 8 days, then check that your RDS or Aurora database is still stopped. After clicking the RDS database instance name or the Aurora database cluster name, open the "Logs & events" tab and scroll to "Recent events". At the right, click to change "Last 1 day" to "Last 2 weeks". The "System notes" column should include the following entries, listed here from newest to oldest. There might be other entries in between.

   | RDS | Aurora |
   |---|---|
   | DB instance stopped | DB cluster stopped |
   | DB instance started | DB cluster started |
   | DB instance is being started due to it exceeding the maximum allowed time being stopped. | DB cluster is being started due to it exceeding the maximum allowed time being stopped. |

   If you don't want to wait 8 days, see Testing, below.
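If you prefer to check the event history with a script instead of by eye, the ordering rule above (three entries, newest to oldest, possibly with other entries in between) is a subsequence test. The sketch below is only an illustration: the function names are hypothetical, and it assumes you have already collected the "System notes" messages, newest first, from the console or from the RDS DescribeEvents API. Fetching them is not shown.

```python
# Expected "System notes" for an RDS database instance, newest to oldest,
# copied from the entries listed above. For an Aurora cluster, substitute
# the "DB cluster ..." wording.
EXPECTED_NEWEST_FIRST = [
    "DB instance stopped",
    "DB instance started",
    "DB instance is being started due to it exceeding the maximum allowed "
    "time being stopped.",
]


def _normalize(message: str) -> str:
    """Ignore surrounding whitespace and a trailing period."""
    return message.strip().rstrip(".")


def restopped_after_auto_start(messages_newest_first,
                               expected=EXPECTED_NEWEST_FIRST) -> bool:
    """True if the expected messages appear in order, gaps allowed."""
    remaining = iter(_normalize(m) for m in messages_newest_first)
    # "x in iterator" consumes the iterator up to the match, so each
    # expected message must be found after the previous one.
    return all(_normalize(want) in remaining for want in expected)


sample = [
    "DB instance stopped",
    "(some other entry)",  # placeholder for unrelated events
    "DB instance started",
    "DB instance is being started due to it exceeding the maximum allowed "
    "time being stopped.",
]
print(restopped_after_auto_start(sample))  # True
```

A result of False means at least one of the three entries is missing or out of order, which is the cue to start troubleshooting.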
For reliability, Step-Stay-Stopped works independently in each region, in each AWS account. To deploy in multiple regions and/or multiple AWS accounts,

1. Delete any standalone StepStayStoppedRdsAurora CloudFormation stacks in your target regions and/or AWS accounts (including any instances of the basic //terraform module; you will be installing one instance of the //terraform-multi module).

   - If you used Stay-Stopped, the original, AWS Lambda-based tool, delete any StayStoppedRdsAurora CloudFormation stacks, or delete the StayStoppedRdsAurora CloudFormation StackSet.

2. Complete the prerequisites for creating a StackSet with service-managed permissions.

3. Install Step-Stay-Stopped as a CloudFormation StackSet, using CloudFormation or Terraform. You must use your AWS organization's management account, or a delegated administrator AWS account.

   - CloudFormation (Easy ✓)

     Create a CloudFormation StackSet.

     Select "Upload a template file", then select "Choose file" and upload a locally-saved copy of cloudformation/step_stay_stopped_aws_rds_aurora.yaml [right-click to save as...].

     On the next page, set:

     - StackSet name: StepStayStoppedRdsAurora

     On the "Set deployment options" page, under "Accounts", select "Deploy stacks in organizational units". Enter the ou- ID(s). Step-Stay-Stopped will be deployed to all AWS accounts within the organizational unit(s). Next, "Specify Regions".

   - Terraform

     Your module block will now resemble:

         module "stay_stopped_rds_stackset" {
           source = "git::https://github.com/sqlxpert/step-stay-stopped-aws-rds-aurora.git//terraform-multi?ref=v2.4.1"
           # Reference a specific version from github.com/sqlxpert/step-stay-stopped-aws-rds-aurora/releases
           # Check that the release is immutable!

           stay_stopped_rds_stackset_regions = ["us-east-1", "us-west-2",]
           stay_stopped_rds_stackset_organizational_unit_ids = [
             "ou-0123-abcdefg",
           ]
         }

     Test mode is always disabled in this configuration. This is a safeguard against unintended use in production.
Step 3 of Get Started includes the option to install Step-Stay-Stopped as a Terraform module in one region in one AWS account. This is the basic //terraform module.
The enhanced region support added in v6.0.0 of the Terraform AWS provider makes it possible to deploy resources in multiple regions in one AWS account without configuring a separate provider for each region. Step-Stay-Stopped is compatible because the Terraform module was written for AWS provider v6, the original CloudFormation templates always let CloudFormation assign unique physical names to account-wide, non-regional resources like IAM roles, and the CloudFormation parameters were already region-independent. Your module block will now resemble:
    module "stay_stopped_rds" {
      source = "git::https://github.com/sqlxpert/step-stay-stopped-aws-rds-aurora.git//terraform?ref=v2.4.1"
      # Reference a specific version from github.com/sqlxpert/step-stay-stopped-aws-rds-aurora/releases
      # Check that the release is immutable!

      for_each                = toset(["us-east-1", "us-west-2",])
      stay_stopped_rds_region = each.key
    }

For installation in multiple AWS accounts (regardless of the number of regions), wrapping a CloudFormation StackSet in HashiCorp Configuration Language remains much easier than configuring Terraform to deploy identical resources in multiple AWS accounts. The Multi-Account, Multi-Region installation instructions include the option to do this using a Terraform module, at Step 3. This is the //terraform-multi module.
Least-privilege installation details...
You can use a CloudFormation service role to delegate only the privileges needed to create the StepStayStoppedRdsAurora stack. (This is done for you if you use Terraform at Step 3 of Get Started.)

First, create the StepStayStoppedRdsAuroraPrereq stack from cloudformation/step_stay_stopped_aws_rds_aurora_prereq.yaml. Under "Additional settings" → "Stack policy - optional", you can "Upload a file" and select a locally-saved copy of cloudformation/step_stay_stopped_aws_rds_aurora_prereq_policy.json. The stack policy prevents inadvertent replacement or deletion of the deployment role during stack updates, but it cannot prevent deletion of the entire StepStayStoppedRdsAuroraPrereq stack.

Next, when you create the StepStayStoppedRdsAurora stack from cloudformation/step_stay_stopped_aws_rds_aurora.yaml, set "Permissions - optional" → "IAM role - optional" to StepStayStoppedRdsAuroraPrereq-DeploymentRole. If your own privileges are limited, you might need permission to pass the deployment role to CloudFormation. See the StepStayStoppedRdsAuroraPrereq-SampleDeploymentRolePassRolePol IAM policy for an example.

For a CloudFormation StackSet, you can use self-managed permissions by copying the inline IAM policy of StepStayStoppedRdsAuroraPrereq-DeploymentRole to a customer-managed IAM policy, attaching your policy to AWSCloudFormationStackSetExecutionRole and propagating the policy and the role policy attachment to all target AWS accounts.
If you do not give Terraform full AWS administrative permissions, you must give it permission to:

- List, describe, get tags for, create, tag, update, untag and delete IAM roles, update the "assume role" (role trust or "resource-based") policy, and put and delete in-line policies
- Attach managed IAM policies to, and detach them from, roles (if you set AttachLocalPolicy)
- List, describe, create, tag, update, untag, and delete CloudFormation stacks
- Set and get CloudFormation stack policies
- Pass StepStayStoppedRdsAuroraPrereq-DeploymentRole-* to CloudFormation
- List, describe, and get tags for, all data sources. To see the data sources, run:

      grep 'data "' terraform*/*.tf | cut --delimiter=' ' --fields='1,2' | sort | uniq
Open the AWS Service Authorization Reference, go through the list of services on the left, and consult the "Actions" table for each of:

- AWS Identity and Access Management (IAM)
- CloudFormation
- AWS Security Token Service
- AWS Key Management Service (if you encrypt the SQS queue or the CloudWatch log group, or Step Function data, with KMS keys)
- AWS Organizations (if you create a CloudFormation StackSet with the //terraform-multi module)
In most cases, you can scope Terraform's permissions to one workload by regulating resource naming and tagging, and then by using:

- ARN patterns in Resource lists
- ARN patterns in Condition entries
- Request tag and then resource tag Condition entries
Check Service and Resource Control Policies (SCPs and RCPs), as well as resource policies (such as KMS key policies).
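To make the scoping techniques concrete, here is a deliberately incomplete sketch of a Terraform deployment policy, built as a Python dictionary so it can be printed as JSON. Everything specific in it is an assumption for illustration: the account number is a placeholder, the workload tag key and value are hypothetical, and the statement list covers only a few of the permissions listed above. It has not been validated against a real deployment, so treat it as a starting point and consult the Service Authorization Reference for each action.

```python
import json

ACCOUNT_ID = "123456789012"        # placeholder account number (assumption)
TAG_KEY = "workload"               # hypothetical tag key
TAG_VALUE = "step-stay-stopped"    # hypothetical tag value


def terraform_policy_sketch(account_id: str) -> dict:
    """Return a partial IAM policy scoped by ARN pattern and by tags."""
    stack_arn = (
        f"arn:aws:cloudformation:*:{account_id}:stack/StepStayStoppedRdsAurora*/*"
    )
    role_arn = (
        f"arn:aws:iam::{account_id}:role/"
        "StepStayStoppedRdsAuroraPrereq-DeploymentRole-*"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # ARN pattern in Resource, plus a request tag condition
                "Sid": "CreateOnlyTaggedWorkloadStacks",
                "Effect": "Allow",
                "Action": "cloudformation:CreateStack",
                "Resource": stack_arn,
                "Condition": {
                    "StringEquals": {f"aws:RequestTag/{TAG_KEY}": TAG_VALUE}
                },
            },
            {   # ARN pattern in Resource, plus a resource tag condition
                "Sid": "ManageOnlyTaggedWorkloadStacks",
                "Effect": "Allow",
                "Action": [
                    "cloudformation:UpdateStack",
                    "cloudformation:DeleteStack",
                ],
                "Resource": stack_arn,
                "Condition": {
                    "StringEquals": {f"aws:ResourceTag/{TAG_KEY}": TAG_VALUE}
                },
            },
            {   # Pass the deployment role to CloudFormation only
                "Sid": "PassDeploymentRoleToCloudFormation",
                "Effect": "Allow",
                "Action": "iam:PassRole",
                "Resource": role_arn,
                "Condition": {
                    "StringEquals": {
                        "iam:PassedToService": "cloudformation.amazonaws.com"
                    }
                },
            },
        ],
    }


print(json.dumps(terraform_policy_sketch(ACCOUNT_ID), indent=2))
```

The request tag condition applies when a stack is created, and the resource tag condition applies afterward, which matches the "request tag and then resource tag" order above.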
The basic //terraform module creates the StepStayStoppedRdsAuroraPrereq
stack, which defines the IAM role that gives CloudFormation the permissions it
needs to create the StepStayStoppedRdsAurora stack. Terraform itself does not
need the deployment role's permissions.
In accordance with the software license, nothing in this document establishes indemnification, a warranty, assumption of liability, etc. Use this software entirely at your own risk. You are encouraged to review the source code.
Security goals...
- A least-privilege role for the AWS Step Function.
- A Step Function role that cannot be used by arbitrary functions. If the role is passed to an arbitrary Step Function, Task states will not gain access to the Aurora and RDS API.
- A least-privilege queue policy. The error (dead letter) queue can only consume messages from EventBridge. Encryption in transit is required.
- Optional encryption at rest with the AWS Key Management Service, for the error queue, Step Function state machine payloads, and the log. This can protect EventBridge events containing database identifiers and metadata, such as tags. KMS keys housed in a different AWS account, and multi-region keys, are supported.
- A retry mechanism and a state machine timeout, to increase the likelihood that a database will be stopped as intended but prevent endless retries.
- A 24-hour event date/time expiry check, to prevent processing of accumulated stale events, if any.
- Readable Identity and Access Management policies, formatted as CloudFormation YAML rather than JSON (where possible), and broken down into discrete statements by service, resource or principal.
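The idea behind the event expiry check can be sketched in Python. This is not the Step Function's own logic, which lives in the state machine definition. It is an illustration that assumes the event layout used in the test inputs in this ReadMe (a detail.Date string ending in Z) and the normal StepFnTimeoutSeconds value of 86400. The function names are hypothetical, and rejecting events dated in the future is a choice made for this sketch only.

```python
from datetime import datetime, timedelta, timezone

STEP_FN_TIMEOUT_SECONDS = 86400  # normal StepFnTimeoutSeconds (24 hours)


def parse_event_date(date_string: str) -> datetime:
    """Parse an event date/time string that ends in Z (UTC)."""
    if not date_string.endswith("Z"):
        raise ValueError("expected a UTC date/time string ending in Z")
    # Swap the Z suffix for an explicit offset, which fromisoformat()
    # accepts in older Python versions as well as newer ones.
    return datetime.fromisoformat(date_string[:-1] + "+00:00")


def event_is_fresh(event: dict, now: datetime,
                   max_age_seconds: int = STEP_FN_TIMEOUT_SECONDS) -> bool:
    """True if the event is no older than max_age_seconds."""
    age = now - parse_event_date(event["detail"]["Date"])
    # Sketch-only choice: events dated in the future are rejected too.
    return timedelta(0) <= age <= timedelta(seconds=max_age_seconds)


event = {
    "detail": {
        "SourceIdentifier": "Name-Of-Your-RDS-Database-Instance",
        "Date": "2025-06-06T04:30Z",
        "SourceType": "DB_INSTANCE",
        "EventID": "RDS-EVENT-0154",
    }
}
same_day = datetime(2025, 6, 6, 12, 0, tzinfo=timezone.utc)
two_days_later = datetime(2025, 6, 8, 4, 30, tzinfo=timezone.utc)
print(event_is_fresh(event, same_day))        # True (7.5 hours old)
print(event_is_fresh(event, two_days_later))  # False (48 hours old)
```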
Security actions...
- Prevent people from modifying components of this tool, most of which can be identified by StepStayStoppedRdsAurora in ARNs and in the automatic aws:cloudformation:stack-name tag.
- Log infrastructure changes using CloudTrail, and set up alerts.
- Prevent people from directly invoking the Step Function.
- Separate production workloads. Although this tool only stops databases that AWS is starting after they've been stopped for 7 days, the Step Function could stop any database if invoked directly, with a contrived event as input. You might choose not to deploy this tool in AWS accounts used for production, or you might add a custom IAM policy to deny the function role permission to stop certain databases. See the AttachLocalPolicy parameter.
  - Tagging an RDS database instance or an Aurora database cluster with StayStopped-Exclude prevents the Step Function role from being misused to stop that database. Requiring an inclusion tag is also possible. See the ExcludeTagKey and IncludeTagKey parameters. ⚠ Do not rely on attribute-based access control unless you also prevent people and systems from adding, changing and deleting ABAC tags. A sample Service Control Policy is available.
- Enable the test mode only in a non-critical AWS account and region, and turn the test mode off again as quickly as possible.
- Monitor the error (dead letter) queue, and monitor the log.
- Configure budget alerts and use cost anomaly detection.
- Occasionally start a database before its maintenance window and leave it running, to catch up with RDS and Aurora security updates.
- If you use Terraform, do not use it with an AWS access key and do not give it full AWS administrative privileges. Instead, follow AWS's Best practices for using the Terraform AWS Provider: Security best practices. Do the extra work of defining a least-privilege IAM role for deploying each workload. Configure Terraform to assume workload-specific roles. The CloudFormation service role is one element, but achieving least-privilege also requires limiting Terraform's privileges.
Protecting database tags...
Step-Stay-Stopped works without database tags. The exclusion/inclusion tagging options are for security, not for function. If you decide to tag your databases, a sample service control policy is available to prevent tampering.
This SCP offers one-way protection: Non-exempt roles can reduce but not increase the range of databases that the Step Function role is allowed to stop. Specifically, non-exempt roles cannot remove the exclusion tag or add the inclusion tag.
In your AWS Organizations management account, in the region where you manage infrastructure-as-code templates for non-regional resources, create a CloudFormation stack from cloudformation/step_stay_stopped_aws_rds_aurora.yaml.

Or, reference the equivalent Terraform module:

    module "stay_stopped_rds_scp" {
      source = "git::https://github.com/sqlxpert/step-stay-stopped-aws-rds-aurora.git//terraform-scp?ref=v2.4.1"
      # Reference a specific version from github.com/sqlxpert/step-stay-stopped-aws-rds-aurora/releases
      # Check that the release is immutable!

      scp_target_ids = [
        "ou-0123-abcdefg",
      ]
    }

In either case, specify the number of the account or the ou- ID of the organizational unit that you use for testing SCPs.
Test the SCP before applying it broadly, because it generally reduces existing RDS/Aurora permissions. Human users or automated processes might rely on those permissions. This is especially true of backup restoration, blue/green deployment, and Aurora cluster scaling workflows, which might copy tags to new databases.
You will need at least one SCP-exempt role in every AWS account, to manage the
exclusion/inclusion tags. I recommend
IAM Identity Center permission sets.
You can customize ScpPrincipalCondition / scp_principal_condition to
reference permission set roles.
The SCP works by denying certain rds:RemoveTagsFromResource and
rds:AddTagsToResource requests. It cannot add permissions that have been
denied by another SCP, or that were never allowed by a role's attached or
inline policies.
SCPs do not affect roles or other IAM principals in the AWS Organizations management account.
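For readers who want to see the shape of such a policy before opening the template, the following Python sketch builds a simplified SCP with the two one-way denials described above. It is not the sample SCP shipped with this tool. The inclusion tag key and the exempt role pattern are hypothetical placeholders, and real policies usually need more conditions (for example, to handle tag value changes). Read the actual template for the authoritative version.

```python
import json

EXCLUDE_TAG_KEY = "StayStopped-Exclude"   # exclusion tag named in this ReadMe
INCLUDE_TAG_KEY = "Hypothetical-Include"  # hypothetical; see IncludeTagKey
EXEMPT_PRINCIPAL = "arn:aws:iam::*:role/HypotheticalTagAdminRole"  # hypothetical


def scp_sketch(exclude_key: str, include_key: str, exempt_arn: str) -> dict:
    """Return a simplified SCP with one-way tag protection."""

    def deny(sid: str, action: str, tag_key: str) -> dict:
        return {
            "Sid": sid,
            "Effect": "Deny",
            "Action": action,
            "Resource": "*",
            "Condition": {
                # Match requests that mention the protected tag key...
                "ForAnyValue:StringEquals": {"aws:TagKeys": [tag_key]},
                # ...unless the caller is an exempt role.
                "ArnNotLike": {"aws:PrincipalArn": [exempt_arn]},
            },
        }

    return {
        "Version": "2012-10-17",
        "Statement": [
            deny("DenyRemovingExclusionTag",
                 "rds:RemoveTagsFromResource", exclude_key),
            deny("DenyAddingInclusionTag",
                 "rds:AddTagsToResource", include_key),
        ],
    }


print(json.dumps(
    scp_sketch(EXCLUDE_TAG_KEY, INCLUDE_TAG_KEY, EXEMPT_PRINCIPAL), indent=2
))
```

Both statements only deny. Consistent with the explanation above, nothing in an SCP can grant a permission that a role does not already have.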
Check the:
- StepStayStoppedRdsAurora-StepFn CloudWatch log group
  - Rds.InvalidDbInstanceStateException or Rds.InvalidDbClusterStateException errors, with no other proximate errors, are expected and can be ignored.
  - Log entries are JSON objects.
  - For more data, change the LogLevel parameter.
- "Executions" data for the StepStayStoppedRdsAurora-StepFn Step Function
  - The "State view" is useful for diagnosing errors.
  - Rows with "Caught error" in the "Status" column are expected and can be ignored if the "Reason" is Rds.InvalidDbInstanceStateException or Rds.InvalidDbClusterStateException.
- StepStayStoppedRdsAurora-ErrorQueue (dead letter) SQS queue
  - A message here means that the Step Function did not run; the request to stop the database was not made.
  - Usually the local security configuration is denying EventBridge necessary access to the Step Function.
- CloudTrail events
  - CloudTrail events with an "Error code" may indicate permissions problems, typically due to the local security configuration.
  - To see more events, change "Read-only" from false to true.
Testing details...
An RDS database instance (db.t4g.micro, 20 GiB of gp3 storage, 0 days' worth of automated backups) is cheaper than a typical Aurora cluster, not to mention faster to create, stop, and start.
AWS starts RDS and Aurora databases that have been stopped for 7 days, but we need a faster mechanism for realistic, end-to-end testing. Temporarily change these parameters:
| Parameter | Normal | Test |
|---|---|---|
| Test | false | true |
| StepFnWaitSeconds | 540 | 60 |
| → Equivalent in minutes | 9 minutes | 1 minute |
| StepFnTimeoutSeconds | 86400 | 1800 |
| → Equivalent in hours | 24 hours | ½ hour |
| LogLevel | ERROR | ALL |
⚠ Exit test mode as quickly as possible, given the operational and
security risks explained below. If your test database is ready, several minutes
should be sufficient. Test mode is always disabled in the //terraform-multi
module.
In test mode, Step-Stay-Stopped also responds to user-initiated, non-forced database starts: RDS-EVENT-0088 (RDS database instance) and RDS-EVENT-0151 (Aurora database cluster). Although it won't stop databases that are already running and remain running, ⚠ while in test mode Step-Stay-Stopped will stop databases that you start manually. To test, manually start a stopped RDS or Aurora database.
In test mode, Step-Stay-Stopped also receives RDS-EVENT-0088 (Aurora database instance). Internally, the Step Function ignores it in favor of the cluster-level event.
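The difference between normal mode and test mode comes down to which event IDs lead to a stop request. The sketch below restates that rule in Python, using only the event IDs named in this ReadMe. The function name is hypothetical, and the sketch does not model the Aurora database instance case, where RDS-EVENT-0088 is received but ignored in favor of the cluster-level event.

```python
# Events sent by AWS after a database has been stopped for 7 days.
NORMAL_EVENT_IDS = {
    "RDS-EVENT-0154": "RDS database instance",
    "RDS-EVENT-0153": "Aurora database cluster",
}

# User-initiated, non-forced starts: acted on in test mode only.
TEST_ONLY_EVENT_IDS = {
    "RDS-EVENT-0088": "RDS database instance",
    "RDS-EVENT-0151": "Aurora database cluster",
}


def would_respond(event_id: str, test_mode: bool) -> bool:
    """True if an event with this ID would lead to a stop request."""
    if event_id in NORMAL_EVENT_IDS:
        return True
    return test_mode and event_id in TEST_ONLY_EVENT_IDS


print(would_respond("RDS-EVENT-0154", test_mode=False))  # True
print(would_respond("RDS-EVENT-0088", test_mode=False))  # False
print(would_respond("RDS-EVENT-0088", test_mode=True))   # True
```

The last line is the reason for the warning above: while test mode is on, a database that someone starts manually will be stopped again.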
Depending on locally-determined permissions, you may also be able to invoke the StepStayStoppedRdsAurora-StepFn Step Function manually. Edit the database names and date/time strings (must be within the past StepFnTimeoutSeconds and end in Z for UTC) in these test inputs:
    {
      "detail": {
        "SourceIdentifier": "Name-Of-Your-RDS-Database-Instance",
        "Date": "2025-06-06T04:30Z",
        "SourceType": "DB_INSTANCE",
        "EventID": "RDS-EVENT-0154"
      },
      "detail-type": "RDS DB Instance Event",
      "source": "aws.rds",
      "version": "0"
    }

    {
      "detail": {
        "SourceIdentifier": "Name-Of-Your-Aurora-Database-Cluster",
        "Date": "2025-06-06T04:30Z",
        "SourceType": "CLUSTER",
        "EventID": "RDS-EVENT-0153"
      },
      "detail-type": "RDS DB Cluster Event",
      "source": "aws.rds",
      "version": "0"
    }

After following the troubleshooting steps and ruling out local issues such as permissions (especially hidden controls such as Service and Resource control policies, or SCPs and RCPs), please report bugs. Thank you!
| Scope | Link | Included Copy |
|---|---|---|
| Source code, and source code in documentation | GNU General Public License (GPL) 3.0 | LICENSE-CODE.md |
| Documentation, including this ReadMe file | GNU Free Documentation License (FDL) 1.3 | LICENSE-DOC.md |
Copyright Paul Marcelin
Contact: marcelin at cmu.edu (replace "at" with @)
