Several rows in the README Troubleshooting table describe friction that is Terraform-specific — the same azd up run against the Bicep variant either does not hit the issue at all, or surfaces a much clearer error. Because we ship both variants side-by-side, the disparity is visible to anyone who tries the Terraform path.
Most of these are rooted in upstream provider gaps (and are tracked there — see hashicorp/terraform-provider-azurerm#31140 for the biggest one), so we cannot fully close them in this repo. But for each, there is a meaningful mitigation we can ship at this layer — either a precondition, a CI guard, a default-value alignment, or a clearer in-product error — that would bring the Terraform experience much closer to the Bicep one.
This issue collects all of them in one place so they can be triaged and chipped away at together rather than rediscovered piecemeal.
Concrete items
Each item below cites the troubleshooting row that motivates it, the Bicep behavior (baseline), the current Terraform behavior, and a proposed mitigation that lives in this repo (not upstream).
1. Opaque 400 715-123420 quota error from azapi_resource
Troubleshooting row: "Opaque 400 715-123420 ... on the Terraform deployment step" + "Quota looks full but you have no live deployments".
Bicep: ARM preflight translates the same condition into InsufficientQuota: This operation require N new capacity in quota Tokens Per Minute (thousands) - Claude <model>, which is bigger than the current available capacity X.
Terraform: azapi_resource bypasses ARM preflight, and the Cognitive Services RP returns the generic 715-123420 "An error occurred. Please reach out to support for additional assistance." with no hint that quota is the cause.
Already mitigated in this repo: the preprovision hook (scripts/preflight-claude.ps1) runs a quota check and exits 6 with a clear message before azd up ever calls the RP.
Gap: the preflight only runs under azd up. A user who runs terraform apply directly gets the raw opaque error.
Proposed:
Add a terraform_data resource with a precondition block that reads quota via the azapi_resource_action data source (or a local-exec shelling out to az cognitiveservices usage list) and fails the plan if currentValue + requestedCapacity > limit. The plan-time error message would name the variable to lower.
Alternatively, add a postcondition on the deployment resource that pattern-matches 715-123420 in the error string and re-raises it with the same remediation text we already have in README.
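The comparison behind either check can be sketched in a few lines. This is a minimal sketch, not the repo's preflight: it assumes the input is the parsed JSON list printed by az cognitiveservices usage list, that each entry carries currentValue and limit (the names used above) plus a nested name.value field, and the quota and variable names in the usage note are hypothetical.

```python
def check_quota(usages, quota_name, requested_capacity, variable_name):
    """Return (ok, message) for one quota entry.

    `usages` is assumed to be the parsed JSON list from
    `az cognitiveservices usage list`; the `name.value` lookup is an
    assumption about that payload's shape.
    """
    for usage in usages:
        if usage.get("name", {}).get("value") != quota_name:
            continue
        current, limit = usage["currentValue"], usage["limit"]
        available = limit - current
        # Same rule as the proposed precondition: fail when the request
        # would push usage past the limit.
        if current + requested_capacity > limit:
            return False, (
                f"Requested capacity {requested_capacity} exceeds available "
                f"{available} for {quota_name}; lower {variable_name}."
            )
        return True, f"{available} available for {quota_name}."
    return False, f"Quota {quota_name} not found in usage list."
```

For example, check_quota(usages, "hypothetical-quota", 25, "claude_capacity") against an entry with currentValue 30 and limit 50 returns a failure whose message names claude_capacity as the variable to lower. A wrapper script could turn that result into the same exit code 6 the preprovision hook already uses.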
2. Soft-deleted Cognitive accounts hold quota for 48 h, manifest as 715-123420
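Troubleshooting row: "Quota looks full but you have no live deployments".
Bicep: the error surfaces as InsufficientQuota (see Simplify message printing in samples #1), which at least points the user at quota.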
Proposed: preflight should list soft-deleted accounts in the target region and warn (not fail) when found, so a fresh user does not have to learn this only after the deployment fails. Pseudo-output:
Warning: 3 soft-deleted Cognitive accounts in eastus2 are holding ~50 TPM of
claude-sonnet-4-6 quota. They will auto-purge in <date>. To free immediately:
az cognitiveservices account purge ...
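A rough sketch of how the preflight could build that warning follows. It assumes the input is the parsed JSON list of soft-deleted accounts (for example from az cognitiveservices account list-deleted) and that each entry has name and location fields; both field names are assumptions, and the function only warns, it never fails.

```python
def soft_delete_warning(deleted_accounts, region):
    """Return a warning string, or None when nothing is soft-deleted in `region`.

    `deleted_accounts` is assumed to be a list of dicts with `name` and
    `location` keys (an assumed payload shape, not a verified one).
    """
    in_region = [
        account for account in deleted_accounts
        if account.get("location", "").lower() == region.lower()
    ]
    if not in_region:
        return None
    names = ", ".join(sorted(account["name"] for account in in_region))
    return (
        f"Warning: {len(in_region)} soft-deleted Cognitive accounts in {region} "
        f"may still be holding quota: {names}. To free immediately:\n"
        f"  az cognitiveservices account purge ..."
    )
```

Returning None for the empty case lets the caller print nothing and continue, which matches the warn-not-fail behavior proposed above.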
3. azurerm_cognitive_account / azurerm_cognitive_deployment cannot set allowProjectManagement + modelProviderData
Troubleshooting row: the "... modelProviderData matters" details block.
Bicep: resources at API version 2025-10-01-preview support both fields.
Terraform: the azurerm resources cannot set either field, so the variant needs azapi_resource with schema_validation_enabled = false. This is tracked upstream in hashicorp/terraform-provider-azurerm#31140.
Already mitigated: the Terraform variant uses azapi_resource everywhere and we set all three modelProviderData fields.
Proposed:
Add a CI guard that fails the Terraform validate job if anyone changes infra-terraform/infra/main.tf to drop schema_validation_enabled = false or the three required modelProviderData keys (organizationName, countryCode, industry). The constraint is invisible at terraform validate time and easy to break by accident.
Add a watcher (link / Dependabot-style note) so when the upstream PR lands and azurerm_cognitive_deployment supports modelProviderData natively, we can migrate from azapi_resource and shorten main.tf.
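The CI guard could be as small as the sketch below. It is an illustration only: it treats main.tf as plain text, skips lines that start with # or //, and checks for the setting and the three key names listed above. It does not parse HCL, so block comments and keys that appear in unrelated places are not handled.

```python
import re

REQUIRED_KEYS = ("organizationName", "countryCode", "industry")


def missing_constraints(main_tf_text):
    """Return the required settings that no longer appear in main.tf."""
    # Drop whole-line comments so a commented-out setting does not count.
    lines = [
        line for line in main_tf_text.splitlines()
        if not line.lstrip().startswith(("#", "//"))
    ]
    text = "\n".join(lines)
    missing = []
    if not re.search(r"schema_validation_enabled\s*=\s*false\b", text):
        missing.append("schema_validation_enabled = false")
    for key in REQUIRED_KEYS:
        if not re.search(rf"\b{key}\b", text):
            missing.append(key)
    return missing
```

A workflow step would read infra-terraform/infra/main.tf, call missing_constraints on its contents, and fail the job when the returned list is not empty.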
4. Built-in role assignments break silently when Azure renames roles
Troubleshooting row: not currently a row, but documented in PR #2. Azure renamed the roles Azure AI User → Foundry User and Azure AI Project Manager → Foundry Project Manager mid-flight.
Bicep: referenced the roles by GUID (53ca6127-..., eadc314b-...) → kept working through the rename.
Terraform: referenced the roles by name → failed with Role "Azure AI User" doesn't exist until we converted to GUIDs.
Proposed: add a one-line CI guard (grep) that fails if any literal Azure built-in role name appears in infra-terraform/infra/*.tf or infra-bicep/infra/*.bicep outside of comments/(formerly ...) parentheticals. Keeps us from regressing.
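If the grep turns out to be awkward to express in one line, the same rule can be sketched in Python. The role list below contains only the four names mentioned in this issue; a real guard would need a longer list. Comment handling is deliberately crude: everything after # or // on a line is ignored (which also cuts URLs), and block comments are not handled.

```python
import re

# Only the role names mentioned in this issue; extend as needed.
ROLE_NAMES = (
    "Azure AI Project Manager",
    "Azure AI User",
    "Foundry Project Manager",
    "Foundry User",
)


def find_role_name_literals(text):
    """Return (line_number, role_name) for each role name used outside comments."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        # Keep only the part of the line before a # or // comment.
        code = re.split(r"#|//", line, maxsplit=1)[0]
        # Allow "(formerly ...)" parentheticals, as the proposal describes.
        code = re.sub(r"\(formerly[^)]*\)", "", code)
        for role in ROLE_NAMES:
            if role in code:
                hits.append((lineno, role))
    return hits
```

Running this over every file matched by infra-terraform/infra/*.tf and infra-bicep/infra/*.bicep and failing on any hit gives the regression check described above.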
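5. ASSIGN_RBAC defaults differ between variants
Troubleshooting row: "... 403 Forbidden" + the granting data-plane roles after azd up one-liner.
Bicep (main.parameters.json): ASSIGN_RBAC=${ASSIGN_RBAC=false}.
Terraform (main.tfvars.json): ASSIGN_RBAC=${ASSIGN_RBAC=false}.
The defaults are equal today after our recent alignment. The Terraform variant still has a documented quirk: data-plane role propagation lag manifests differently because the role assignments and the deployment are sequenced differently. The README walkthrough timing notes that the Terraform deployment takes long enough for RBAC to settle within the same azd up run; the Bicep deployment is faster, so a fresh python src/hello_claude.py may hit the intermittent 401 more often.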
Proposed: add a brief variant-parity note in infra-terraform/azure.yaml and the README walkthrough that calls out why the perceived 401 frequency differs across variants, instead of leaving users to discover it.
6. variables.tf defaults silently overridden by main.tfvars.json ${VAR=N} literal
Troubleshooting row: ... main.parameters.json issue on the Bicep side.
Bicep: parameter default lives in main.bicep; main.parameters.json only injects env vars — no second source of truth.
Terraform: variables.tf default is shadowed by the ${VAR=N} literal in main.tfvars.json when the env var is unset. Lowering variables.tf to 25 while leaving main.tfvars.json at 50 produces a silent regression.
Proposed: CI guard that diffs the =N defaults in *.tfvars.json / *.parameters.json against their corresponding variables.tf / main.bicep defaults and fails on mismatch.
7. hashicorp/setup-terraform@v3 + TF ≤ 1.9 = expired provider PGP key
Troubleshooting row: not user-facing, but part of the CI section in repo memory and a maintenance hazard.
Bicep: no equivalent — bicep build is self-contained.
Terraform CI: older terraform_version (e.g. 1.6.0) ships with stale embedded PGP keys for the rotated HashiCorp signing key, so terraform init fails with openpgp: key expired on hashicorp/azurerm and hashicorp/random. We currently pin 1.10.0 to dodge this.
Proposed: add a .terraform-version file at repo root (or infra-terraform/.terraform-version) honored by tfenv / setup-terraform, so the same TF version is enforced locally and in CI. Avoids "works in CI, fails locally" / vice versa on the terraform fmt -check drift the README walkthrough already documents.
8. terraform fmt output drifts across TF versions
Troubleshooting row: not a row; an internal CI gotcha from our session memory.
Bicep: N/A.
Terraform: local TF 1.15.4 and CI 1.10.0 format local blocks differently, so each fails the other's fmt -check.
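Items 7 and 8 both come down to keeping one Terraform version everywhere. A small consistency check along these lines could back the proposed .terraform-version file. It assumes the workflow sets a terraform_version: input with a full x.y.z value, and it takes 1.10.0 as the floor because that is the version this issue says is currently pinned; both are assumptions about the eventual setup.

```python
import re


def parse_version(text):
    """Turn "1.10.0" into (1, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def check_pins(terraform_version_file, workflow_yaml, minimum="1.10.0"):
    """Return a list of problems; an empty list means the pins agree."""
    problems = []
    local_pin = terraform_version_file.strip()
    ci_pins = re.findall(
        r"terraform_version:\s*[\"']?(\d+\.\d+\.\d+)", workflow_yaml
    )
    if not ci_pins:
        problems.append("no terraform_version found in workflow")
    for pin in ci_pins:
        if pin != local_pin:
            problems.append(f"CI pins {pin} but .terraform-version says {local_pin}")
    if parse_version(local_pin) < parse_version(minimum):
        problems.append(f".terraform-version {local_pin} is older than {minimum}")
    return problems
```

Comparing parsed tuples rather than strings matters here: as plain strings, "1.6.0" sorts after "1.10.0".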
Out of scope (filed upstream, not solvable here)
The root cause of 715-123420 itself (the RP returning a generic code). Filed informally via the threads on hashicorp/terraform-provider-azurerm#31140 and the upstream MS service-team backlog item that @promisinganuj references.
Native modelProviderData support in azurerm_cognitive_deployment. Tracked upstream.
The undocumented modelProviderData REST property in the azure-rest-api-specs Cognitive Services swagger. Documentation gap is acknowledged by the service PM (per the same upstream thread).
Acceptance criteria for closing this issue
Items 1, 2, 3, 4, 6 have either a PR landed or an explicit "wontfix" rationale in a follow-up comment.
Items 5, 7, 8 are addressed by a docs / version-pin PR.
Troubleshooting table updated to reflect any rows that are no longer needed because the underlying friction was eliminated.
Where does this change land?
infra-terraform/: most items.
scripts/preflight-claude.*: items 1, 2.
.github/workflows/validate.yml: items 3, 4, 6, 7, 8.