How to Refactor Terraform Resources Without Downtime

While adding SNS publish permissions to checkout-api as part of a FIFO SNS/SQS fanout setup, I also cleaned up its Terraform resource definitions. The IAM role had been created conditionally using count. That was an old pattern that made sense when S3 access was optional. Now that the role is always needed, the condition no longer serves a purpose.

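Removing count is a one-line change in HCL. However, Terraform tracks resources by their state address, and removing count changes that address from aws_iam_role.checkout_api[0] to aws_iam_role.checkout_api. Unless you tell Terraform that both addresses refer to the same object, it plans to destroy the existing role and create a new one. The same happens to every other resource whose address changes. In production that means downtime: between the destroy and the create, the checkout-api pods have no role to assume and cannot obtain fresh AWS credentials.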

The fix is moved blocks.

The Original Code

The role was only created when the S3 bucket was configured:

```hcl
locals {
  checkout_api_s3_enabled = contains(keys(var.s3_origins), "checkout-api-files")
}

resource "aws_iam_role" "checkout_api" {
  count              = local.checkout_api_s3_enabled ? 1 : 0
  name               = "${var.env}-checkout-api"
  assume_role_policy = data.aws_iam_policy_document.checkout_api_assume.json
}

resource "aws_iam_role_policy" "checkout_api" {
  count  = local.checkout_api_s3_enabled ? 1 : 0
  name   = "${var.env}-checkout-api-s3-readwrite"
  role   = aws_iam_role.checkout_api[0].id
  policy = data.aws_iam_policy_document.checkout_api_s3[0].json
}

resource "aws_eks_pod_identity_association" "checkout_api" {
  count           = local.checkout_api_s3_enabled ? 1 : 0
  cluster_name    = module.eks.cluster_name
  namespace       = "backend"
  service_account = "checkout-api-sa"
  role_arn        = aws_iam_role.checkout_api[0].arn
}
```

The Refactored Code

Here is the code with count removed. The policy resource is also renamed from checkout_api to checkout_api_s3, which makes its scope explicit now that the role will carry more than one inline policy:

```hcl
resource "aws_iam_role" "checkout_api" {
  name               = "${var.env}-checkout-api"
  assume_role_policy = data.aws_iam_policy_document.checkout_api_assume.json
}

resource "aws_iam_role_policy" "checkout_api_s3" {
  name   = "${var.env}-checkout-api-s3-readwrite"
  role   = aws_iam_role.checkout_api.id
  policy = data.aws_iam_policy_document.checkout_api_s3.json
}

resource "aws_eks_pod_identity_association" "checkout_api" {
  cluster_name    = module.eks.cluster_name
  namespace       = "backend"
  service_account = "checkout-api-sa"
  role_arn        = aws_iam_role.checkout_api.arn
}
```
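The following sketch shows roughly what the second inline policy, the one that motivated the rename, could look like. The topic resource `aws_sns_topic.checkout_events`, the topic name and the policy name are hypothetical placeholders, not the actual fanout configuration. The only constraint taken from AWS itself is that FIFO topic names must end in `.fifo`.

```hcl
# Hypothetical FIFO topic; the real fanout topic is defined elsewhere.
resource "aws_sns_topic" "checkout_events" {
  name       = "${var.env}-checkout-events.fifo" # FIFO topic names must end in .fifo
  fifo_topic = true
}

# Allows publishing to that one topic and nothing else.
data "aws_iam_policy_document" "checkout_api_sns" {
  statement {
    actions   = ["sns:Publish"]
    resources = [aws_sns_topic.checkout_events.arn]
  }
}

# Second inline policy on the same, now unconditional, role.
resource "aws_iam_role_policy" "checkout_api_sns" {
  name   = "${var.env}-checkout-api-sns-publish"
  role   = aws_iam_role.checkout_api.id
  policy = data.aws_iam_policy_document.checkout_api_sns.json
}
```

Because `aws_iam_role_policy.checkout_api_sns` is a genuinely new address, it shows up in the plan as a create and needs no moved block.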

Three state addresses changed:

| Old | New |
| --- | --- |
| `aws_iam_role.checkout_api[0]` | `aws_iam_role.checkout_api` |
| `aws_eks_pod_identity_association.checkout_api[0]` | `aws_eks_pod_identity_association.checkout_api` |
| `aws_iam_role_policy.checkout_api[0]` | `aws_iam_role_policy.checkout_api_s3` |

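Without moved blocks, terraform plan shows three destroys and three creates. The checkout_api_s3 policy document data source also lost its count. Data sources hold no infrastructure and are re-read on every plan, so that address change needs no handling.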

The `moved` Blocks

```hcl
# Temporary: remove after this is applied up to prod.

moved {
  from = aws_iam_role.checkout_api[0]
  to   = aws_iam_role.checkout_api
}

moved {
  from = aws_eks_pod_identity_association.checkout_api[0]
  to   = aws_eks_pod_identity_association.checkout_api
}

moved {
  from = aws_iam_role_policy.checkout_api[0]
  to   = aws_iam_role_policy.checkout_api_s3
}
```

With these in place, terraform plan shows no destroys — just state moves. The third block is the most instructive: it handles both a count removal and a rename in a single step.

The Two-Commit Workflow

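moved blocks are temporary by nature. Once every environment that uses this configuration has applied them, they have done their job and should be removed. If you leave them in permanently, they cause confusion and can interfere with future refactors. Removing them too early is the opposite mistake: any environment that has not yet applied them will plan the destroy and create again.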

Commit 1: add the refactored code and the moved blocks together. The comment marks the blocks as temporary. Apply this commit to every environment in order (dev → staging → prod).

Commit 2: delete the moved blocks and change nothing else. Once every environment has applied commit 1, terraform plan on commit 2 should report no changes anywhere.

This keeps the git history clean: commit 1 shows the intent, commit 2 is an unambiguous cleanup with no noise.
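Here is one concrete way a leftover moved block gets in the way of a later refactor. Suppose count comes back on the role at some point. The two-role split below is a hypothetical example, not part of this setup. The stale block's from address, aws_iam_role.checkout_api[0], would then name an instance that the configuration declares again. As far as I understand Terraform's validation of moved blocks, it rejects that combination at plan time instead of moving state. The stale block would have to be deleted before the new refactor could go ahead.

```hcl
# Hypothetical future change: count is reintroduced on the role.
resource "aws_iam_role" "checkout_api" {
  count              = 2
  name               = "${var.env}-checkout-api-${count.index}"
  assume_role_policy = data.aws_iam_policy_document.checkout_api_assume.json
}

# Leftover from the earlier refactor. Its "from" address now refers to an
# instance the configuration declares again, so this move conflicts with
# the new count instead of being a harmless no-op.
moved {
  from = aws_iam_role.checkout_api[0]
  to   = aws_iam_role.checkout_api
}
```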

Why Not `terraform state mv`?

The alternative is running terraform state mv manually for each address change. It works, but has two problems: it mutates remote state directly without leaving a code review trail, and it must be run once per environment. moved blocks are declarative, reviewable, and applied automatically as part of the normal plan/apply cycle.
