Skip to content

(feat.) Initial Chat-With-Factory module#633

Open
benljbrooks wants to merge 28 commits into
mainfrom
bbrooks/chat-factory
Open

(feat.) Initial Chat-With-Factory module#633
benljbrooks wants to merge 28 commits into
mainfrom
bbrooks/chat-factory

Conversation

@benljbrooks

Copy link
Copy Markdown

Pull Request

Description

Adds a new application component, 513-chat-with-your-factory: a voice-enabled web chat application that lets operators ask natural-language questions about factory equipment and get answers grounded in live industrial data. It runs standalone today and is built to also run embedded in Microsoft Teams (Teams SSO + a tab manifest template are included but not yet deployed or validated in Teams — that is a planned follow-up).

The service is a Node.js/TypeScript (ESM) Express app (port 3978) with a React client, pluggable agent backends, an optional Teams SSO layer, a Microsoft Fabric–backed grounding tool, and Helm charts for Kubernetes deployment.

Highlights:

  • Pluggable agent backends via AGENT_BACKEND: Copilot Studio Agents SDK (default, OBO token exchange), Azure AI Foundry, and Copilot Studio Direct Line.
  • Factory ontology grounding tool (query_factory_ontology) that queries a Microsoft Fabric lakehouse over mssql using Entra token auth (scope https://database.windows.net/.default).
  • Teams-ready (not yet Teams-deployed): @microsoft/teams-js client integration, Teams SSO/JWT validation (jose) with on-behalf-of token exchange, and a Teams app manifest template.
  • Helm charts, provisioning scripts, and full setup/operations documentation.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Blueprint modification or addition
  • Component modification or addition
  • Documentation update
  • CI/CD pipeline change
  • Other (please describe):

Implementation Details

  • Component layout under src/500-application/513-chat-with-your-factory/: service code (services/chat-with-your-factory/), Helm charts (charts/), Teams app manifest template (appManifest/), provisioning scripts (scripts/), and docs (docs/).
  • Server (src/server/): Express handlers, agent-backend adapters, optional Teams JWT auth middleware (bypassed in standalone mode via AUTH_REQUIRED=false/SKIP_AUTH=true), and the query_factory_ontology factory tool with a Fabric lakehouse client (ontologyClient.ts) using Entra access tokens.
  • Client (src/client/): React UI with chat hooks/components, plus @microsoft/teams-js integration for theme/user/SSO when embedded in Teams.
  • Config is environment-driven; .env.template (at the component root) documents all variables. No secrets are committed (.env* is gitignored except the template).
  • One example added under src/000-cloud/033-fabric-ontology/definitions/examples/ for the ontology mapping consumed by the tool.
  • Root .gitignore updated to exclude local tooling/IDE artifacts.

Testing Performed

  • Terraform plan/apply

  • Blueprint deployment test

  • Unit tests

  • Integration tests

  • Bug fix includes regression test (see Test Policy)

  • Manual validation

  • Other:

  • Unit tests: vitest suite passing (6/6) in factoryTool.test.ts, covering the query_factory_ontology tool's input schema, query construction, and result shaping. Other modules (agent backends, auth middleware, client) are not yet covered by automated tests.

  • Manual validation (standalone UI): ran the server in standalone mode (AUTH_REQUIRED=false); the React landing page renders and is interactive — session controls, suggestion chips (Summarize open work orders / Recent alerts / Compare machine downtime / Draft a work order), and the mic + text-input bar. Screenshot below.

  • Not exercised in this run: agent-backend round-trips, Teams SSO, and live Fabric lakehouse queries were not invoked during standalone validation.

Validation Steps

  1. cd src/500-application/513-chat-with-your-factory/services/chat-with-your-factory && npm ci
  2. Run unit tests: npm test (expect 6/6 passing).
  3. At the component root (src/500-application/513-chat-with-your-factory/), copy .env.template to .env and populate per docs/setup-guide.md (agent backend, Fabric lakehouse). Set AUTH_REQUIRED=false for standalone validation.
  4. From the service dir, npm start (builds, then runs with --env-file ../../.env). The server logs Chat With Factory server running on http://localhost:3978.
  5. Open http://localhost:3978/; the landing page renders. (Agent/Fabric responses require a configured backend and Fabric lakehouse.)

Checklist

  • I have updated the documentation accordingly
  • I have added tests to cover my changes
  • All new and existing tests passed
  • I have run terraform fmt on all Terraform code — N/A (no Terraform changes)
  • I have run terraform validate on all Terraform code — N/A (no Terraform changes)
  • I have run az bicep format on all Bicep code — N/A (no Bicep changes)
  • I have run az bicep build to validate all Bicep code — N/A (no Bicep changes)
  • I have checked for any sensitive data/tokens that should not be committed
  • Lint checks pass (run applicable linters for changed file types)

Security Review

  • No credentials, secrets, or tokens are hardcoded or logged — config is env-driven; .env* gitignored (template only). Only well-known public Microsoft first-party IDs (Graph, User.Read scope, Teams web/desktop clients, Power Platform API) appear in docs, annotated as such.
  • RBAC and identity changes follow least-privilege principles — optional Teams SSO with OBO token exchange; Fabric access via scoped Entra tokens (https://database.windows.net/.default).
  • No new network exposure or public endpoints introduced without justification — service listens on 3978 behind optional auth; Helm-deployed.
  • Dependency additions or updates have been reviewed for known vulnerabilities
  • Container image changes use pinned digests or SHA references — N/A (no image digest changes in this PR)

Additional Notes

  • Scope: ~75 files within the new 513-chat-with-your-factory component (plus one fabric-ontology example and a .gitignore update).
  • Teams deployment is out of scope for this PR. Running embedded in Teams (filling the manifest template, hosting on a public/dev-tunnel URL, registering the Entra app for SSO, sideloading, and running with AUTH_REQUIRED=true), Kubernetes deployment, and live agent/Fabric validation are planned follow-ups.

Ben Brooks added 6 commits June 15, 2026 17:25
Add the query_factory_ontology Foundry tool to 513-chat-with-your-factory:
ontologyClient (mssql Fabric lakehouse reads), factoryTool dispatch, agent provisioning, Helm/deploy env wiring, and grounding/live-data docs. Validated end-to-end against a real Fabric lakehouse (T2) and through the Foundry agent (T3). Harden env gitignore to .env.* (keep .env.template), ignore megalinter-reports/ and *.lscache, and remove stray tracked C# cache files.
The .hve-core gitlink (mode 160000, no .gitmodules) was a local HVE-Core
tooling install swept into the migration commit. Untrack it and ignore
.hve-core/ so the dangling submodule reference does not reach main.
@github-actions

github-actions Bot commented Jun 24, 2026

Copy link
Copy Markdown

Dependency Review

The following issues were found:

  • ❌ 1 vulnerable package(s)
  • ✅ 0 package(s) with incompatible licenses
  • ✅ 0 package(s) with invalid SPDX license definitions
  • ✅ 0 package(s) with unknown licenses.
  • ⚠️ 55 packages with OpenSSF Scorecard issues.

View full job summary

@github-actions

Copy link
Copy Markdown

📚 Documentation Health Report

Generated on: 2026-06-24 15:26:45 UTC

📈 Documentation Statistics

Category File Count
Main Documentation 222
Infrastructure Components 205
Blueprints 39
GitHub Resources 26
AI Assistant Guides (Copilot) 17
Total 509

🏗️ Three-Tree Architecture Status

  • ✅ Bicep Documentation Tree: Auto-generated navigation
  • ✅ Terraform Documentation Tree: Auto-generated navigation
  • ✅ README Documentation Tree: Manual README organization

🔍 Quality Metrics

  • Frontmatter Validation:
    success
  • Link Validation: success

This report is automatically generated by the Documentation Automation workflow.

@github-actions

Copy link
Copy Markdown

📚 Documentation Health Report

Generated on: 2026-06-24 15:31:11 UTC

📈 Documentation Statistics

Category File Count
Main Documentation 222
Infrastructure Components 205
Blueprints 39
GitHub Resources 26
AI Assistant Guides (Copilot) 17
Total 509

🏗️ Three-Tree Architecture Status

  • ✅ Bicep Documentation Tree: Auto-generated navigation
  • ✅ Terraform Documentation Tree: Auto-generated navigation
  • ✅ README Documentation Tree: Manual README organization

🔍 Quality Metrics

  • Frontmatter Validation:
    success
  • Link Validation: success

This report is automatically generated by the Documentation Automation workflow.

- Remove duplicate H1 (MD025) and rename duplicate Copilot Studio headings (MD024)

- Format markdown tables to satisfy markdown-table-formatter

- Add product terms (Copilot Studio, Dev Tunnels, Direct Line, Bot Framework, Voice Live, Web Speech, sideload) to cspell words

- Replace example dev tunnel ID to avoid cspell false positive
@github-actions

Copy link
Copy Markdown

📚 Documentation Health Report

Generated on: 2026-06-24 16:37:28 UTC

📈 Documentation Statistics

Category File Count
Main Documentation 222
Infrastructure Components 205
Blueprints 39
GitHub Resources 26
AI Assistant Guides (Copilot) 17
Total 509

🏗️ Three-Tree Architecture Status

  • ✅ Bicep Documentation Tree: Auto-generated navigation
  • ✅ Terraform Documentation Tree: Auto-generated navigation
  • ✅ README Documentation Tree: Manual README organization

🔍 Quality Metrics

  • Frontmatter Validation:
    success
  • Link Validation: success

This report is automatically generated by the Documentation Automation workflow.

Ben Brooks added 2 commits June 24, 2026 18:16
… root eslint

The service has its own eslint.config.js importing typescript-eslint and
globals, which are only installed in the service's node_modules. Root CI runs
npx eslint . with only root deps, so traversing into the service crashed with
ERR_MODULE_NOT_FOUND. Ignore the service subtree at the repo root; it is linted
with its own config locally.
Transitive dependency of express@5 via router@2. The flagged 8.3.0 is
vulnerable to ReDoS via sequential optional groups (CVE-2026-4926, high).
Patched in 8.4.0; lockfile resolves to 8.4.2 under router's ^8.0.0 range.
@github-actions

Copy link
Copy Markdown

📚 Documentation Health Report

Generated on: 2026-06-24 18:33:14 UTC

📈 Documentation Statistics

Category File Count
Main Documentation 222
Infrastructure Components 205
Blueprints 39
GitHub Resources 26
AI Assistant Guides (Copilot) 17
Total 509

🏗️ Three-Tree Architecture Status

  • ✅ Bicep Documentation Tree: Auto-generated navigation
  • ✅ Terraform Documentation Tree: Auto-generated navigation
  • ✅ README Documentation Tree: Manual README organization

🔍 Quality Metrics

  • Frontmatter Validation:
    success
  • Link Validation: success

This report is automatically generated by the Documentation Automation workflow.

CodeQL js/missing-rate-limiting flagged the authorized /api middleware and the
SPA fallback file-serving handler. Add express-rate-limit and apply a global
limiter after the unauthenticated health probe so the authorized API routes and
static file fallback are throttled. Set trust proxy=1 so the limiter keys on the
real client IP behind the App Service reverse proxy.
@github-actions

Copy link
Copy Markdown

📚 Documentation Health Report

Generated on: 2026-06-24 19:07:08 UTC

📈 Documentation Statistics

Category File Count
Main Documentation 222
Infrastructure Components 205
Blueprints 39
GitHub Resources 26
AI Assistant Guides (Copilot) 17
Total 509

🏗️ Three-Tree Architecture Status

  • ✅ Bicep Documentation Tree: Auto-generated navigation
  • ✅ Terraform Documentation Tree: Auto-generated navigation
  • ✅ README Documentation Tree: Manual README organization

🔍 Quality Metrics

  • Frontmatter Validation:
    success
  • Link Validation: success

This report is automatically generated by the Documentation Automation workflow.

@github-actions

Copy link
Copy Markdown

📚 Documentation Health Report

Generated on: 2026-06-25 15:00:39 UTC

📈 Documentation Statistics

Category File Count
Main Documentation 222
Infrastructure Components 205
Blueprints 39
GitHub Resources 26
AI Assistant Guides (Copilot) 17
Total 509

🏗️ Three-Tree Architecture Status

  • ✅ Bicep Documentation Tree: Auto-generated navigation
  • ✅ Terraform Documentation Tree: Auto-generated navigation
  • ✅ README Documentation Tree: Manual README organization

🔍 Quality Metrics

  • Frontmatter Validation:
    success
  • Link Validation: success

This report is automatically generated by the Documentation Automation workflow.

@github-actions

Copy link
Copy Markdown

📚 Documentation Health Report

Generated on: 2026-06-25 16:33:54 UTC

📈 Documentation Statistics

Category File Count
Main Documentation 222
Infrastructure Components 206
Blueprints 37
GitHub Resources 26
AI Assistant Guides (Copilot) 17
Total 508

🏗️ Three-Tree Architecture Status

  • ✅ Bicep Documentation Tree: Auto-generated navigation
  • ✅ Terraform Documentation Tree: Auto-generated navigation
  • ✅ README Documentation Tree: Manual README organization

🔍 Quality Metrics

  • Frontmatter Validation:
    success
  • Link Validation: success

This report is automatically generated by the Documentation Automation workflow.

@benljbrooks benljbrooks marked this pull request as ready for review June 25, 2026 18:06
@benljbrooks benljbrooks requested a review from a team June 25, 2026 18:06
# (synthesizes a dev user — insecure, non-production). Otherwise the server
# requires TEAMS_APP_ID and TEAMS_AUTH_AUDIENCES and will fail to start without them.
AUTH_REQUIRED: "false"
# SKIP_AUTH="true" only bypasses auth for true loopback traffic, not in-cluster requests.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default AUTH_REQUIRED: "false" disables authentication for all requests regardless of network origin. Any operator who installs the chart without explicitly overriding this value deploys without auth in a cluster environment. Unlike SKIP_AUTH=true (which is restricted to loopback connections), AUTH_REQUIRED=false applies to all traffic reaching the pod.

Consider removing this line from the defaults so the chart fails closed, and leaving it as a commented-out example that operators must consciously opt into:

# Set to "false" for a Teams-less standalone/demo deployment ONLY.
# When unset, TEAMS_APP_ID and TEAMS_AUTH_AUDIENCES are required.
# AUTH_REQUIRED: "false"

@@ -0,0 +1,42 @@
FROM node:22-slim AS builder

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Every other Dockerfile in the src/500-application/ directory pins base images to a full digest (image:tag@sha256:...). Both stages here use the floating node:22-slim tag, which means the next build silently picks up whatever is current upstream with no review gate.

Please pin both stages to the current digest, for example:

FROM node:22-slim@sha256:<digest> AS builder

Run docker pull node:22-slim && docker inspect node:22-slim --format '{{index .RepoDigests 0}}' to resolve the digest at the time of merge.

path: /api/healthz
port: http
initialDelaySeconds: 5
periodSeconds: 10

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Dockerfile correctly switches to a non-root appuser, but the Kubernetes manifest does not enforce this at the pod level. Without a securityContext, a cluster that does not enforce Pod Security Admission Restricted profile will allow the container to run as root if the image is changed or overridden.

Adding explicit constraints makes the manifest self-enforcing regardless of cluster policy:

          securityContext:
            runAsNonRoot: true
            allowPrivilegeEscalation: false
            capabilities:
              drop: ["ALL"]

@@ -0,0 +1,54 @@
---
# Default values for chat-with-your-factory
replicaCount: 1

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sessionStore, sseRegistry, sseTicketStore, and wsTicketStore are all process-local Map instances. If replicaCount is scaled above 1, a session created on one pod is invisible to another, and SSE/WebSocket connections land on whichever pod the load balancer selects.

The default of 1 is correct, but nothing prevents an operator from bumping this value. A comment here would make the constraint self-documenting:

# Session state is in-memory and process-local. replicaCount MUST remain 1.
# Scaling beyond 1 replica requires an external session store (e.g. Redis)
# and is not supported in this release.
replicaCount: 1

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Foundry run-polling loop has no timeout or max-iteration guard. If a run stalls in queued or in_progress indefinitely (quota pause, upstream hang), the HTTP connection is held open indefinitely, blocking a server worker and accumulating sockets until resources are exhausted. Express 5 has no built-in request timeout.

Two improvements together:

  1. Move the status set to module scope as a ReadonlySet (prevents mutation, avoids recreating the array on every request).
  2. Add a deadline guard inside the loop.
// Module scope
const POLL_STATUSES: ReadonlySet<string> = new Set(['queued', 'in_progress', 'requires_action'])
const MAX_POLL_MS = 120_000

// Inside dispatchChat, Foundry branch
const pollDeadline = Date.now() + MAX_POLL_MS

while (POLL_STATUSES.has(run.status)) {
  if (Date.now() > pollDeadline) {
    throw new Error('Agent run timed out after 120 s')
  }
  await new Promise(r => setTimeout(r, 1000))
  run = await agentsClient.runs.get(threadId, run.id)
  ...
}


PORT=${PORT:-3978}
AUTH_REQUIRED=${AUTH_REQUIRED:-false}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AUTH_REQUIRED defaults to false here, so running this script without pre-setting the variable generates a .env with authentication disabled. Combined with the Helm values default, both the script-based and chart-based setup paths point toward auth-disabled as the path of least resistance.

Changing the default to true means operators must explicitly opt out:

AUTH_REQUIRED=${AUTH_REQUIRED:-true}

@katriendg katriendg left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this contribution — the Chat With Factory component is a genuinely useful addition to the repo. The pluggable agent backends, the factory ontology grounding tool, the Teams SSO path, and the graceful shutdown handling all reflect careful engineering. The Foundry polling loop and the SSE ticket design in particular are well thought through.

A few things need to be addressed before this can merge:

Blocking

  • The component folder must be renamed from 513-chat-with-your-factory to 516-chat-with-your-factory513 is already taken by 513-tiered-notification-service.

Required

  • All ms.date frontmatter fields across the 8 documentation files need updating to today's date — full file list in the reply there).
  • The docs/ subfolder structure is out of step with every other component in src/500-application/; please consolidate into the root README.md, keeping only factory-tool-grounding.md and factory-tool-live-data.md as standalone references (see inline on docs/development.md).
  • The README is missing explicit references to the edge-ai blueprints and components that operators need to provision first (see inline on README.md).
  • image.tag: "latest" with pullPolicy: Always in values.yaml is a supply-chain anti-pattern — please change to tag: "" and pullPolicy: IfNotPresent (see inline).
  • There are also several open threads from @rezatnoMsirhC (AUTH_REQUIRED defaults, unpinned Dockerfile base image, missing pod securityContext, replicaCount comment, Foundry polling timeout) that need responses.

Minor

  • generate-env-config.sh should use set -euo pipefail to be consistent with the other two scripts in the same directory (see inline).
  • namespace: default should be changed to a dedicated namespace (see inline).
  • The trust proxy comment references App Service but the primary deployment path is Kubernetes — a small wording update keeps it accurate for both (see inline).

Happy to discuss any of these. Looking forward to seeing this merged once the items above are addressed!

image:
repository: your-registry.azurecr.io/chat-with-your-factory
pullPolicy: Always
tag: "latest"

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image.tag: "latest" Is a Supply-Chain Anti-Pattern

# Current
image:
  repository: your-registry.azurecr.io/chat-with-your-factory
  pullPolicy: Always
  tag: "latest"

tag: "latest" combined with pullPolicy: Always means every pod restart silently pulls whatever image is currently tagged latest — including unreviewed changes — with no deployment audit trail. This is in direct tension with the open comment on this PR about pinning the Dockerfile base image to a digest: the application image itself needs equal rigour.

# Suggested
image:
  repository: your-registry.azurecr.io/chat-with-your-factory
  pullPolicy: IfNotPresent
  tag: ""  # Set to a specific semver tag or digest at deploy time

With tag: "", Helm fails at render time if the operator does not supply --set image.tag=<value>, which forces an explicit, auditable version at every deployment.

# Creates a .env file with all required environment variables.
# Uses current environment variables as values, with defaults for unset variables.

set -e

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing set -u and pipefail

# Current
set -e

# Suggested — consistent with the other two scripts in this directory
set -euo pipefail

deploy-chat-with-your-factory.sh and provision-foundry-agent.sh both already use set -euo pipefail. The -u flag catches unbound variable references early; pipefail propagates failures through pipelines. Please align all three scripts.

@katriendg katriendg Jun 29, 2026

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prerequisites: Please Add Edge-AI Blueprint References

This component depends on several other edge-ai components/blueprints that operators need to provision first. Please add an explicit prerequisites block referencing them, following the pattern in 507-ai-inference/README.md.

Suggested prerequisites block:

## Prerequisites

- Azure subscription with sufficient quota
- Edge cluster, ACR, and Azure AI Foundry deployed via the
  [full-multi-node-cluster](../../../blueprints/full-multi-node-cluster/) blueprint
- Microsoft Fabric workspace deployed via the
  [fabric](../../../blueprints/fabric/) blueprint
- CORA/CORAX ontology provisioned into that workspace via
  [033-fabric-ontology](../../000-cloud/033-fabric-ontology/README.md)
- `kubectl` and `helm` installed locally

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docs Structure Not Aligned With Repository Convention

Every other application component under src/500-application/ uses a single README.md at the component root — no docs/ subfolder:

Component Docs approach
503-media-capture-service README.md only
507-ai-inference README.md only
509-sse-connector README.md only
513-tiered-notification-service README.md only

This PR introduces 7 separate files under docs/. That creates navigation friction and breaks the established pattern.

Suggested consolidation:

Current file Disposition
docs/development.md Merge into README.md as ## Development
docs/setup-guide.md Merge into README.md as ## Setup
docs/architecture.md Merge into README.md as ## Architecture
docs/DOCKER_COMPOSE_README.md Merge into README.md as ## Local Development (Docker Compose)
docs/HELM_CHART_GUIDE.md Merge into README.md as ## Kubernetes Deployment (Helm)
docs/factory-tool-grounding.md ✅ Keep — tool-specific reference
docs/factory-tool-live-data.md ✅ Keep — future-path spec

The .env.template approach for configuration documentation is already well-aligned — no change needed there.

nameOverride: ""
fullnameOverride: ""

namespace: default

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

namespace: default — Use a Dedicated Namespace

Deploying application workloads into the default namespace makes it harder to scope RBAC policies, NetworkPolicies, and resource quotas to this workload.

# Suggested
namespace: chat-with-your-factory  # Operators may override via --set namespace=<value>

Please also add a one-liner to the Helm deployment guide noting that namespace creation is a prerequisite:

kubectl create namespace chat-with-your-factory

// App Service terminates TLS at a reverse proxy, so the real client IP arrives
// in X-Forwarded-For. Trust the first hop so the rate limiter keys on the
// caller rather than the proxy.
app.set('trust proxy', 1)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trust proxy Comment References App Service; Deployment Target Is Kubernetes

The comment says "App Service terminates TLS" but the documented deployment path is Kubernetes via the Helm chart. The value of 1 is likely correct for both environments (AKS + ingress-nginx also uses a single proxy hop), but the comment is misleading for operators deploying to Kubernetes without App Service.

// Suggested
// A reverse proxy (App Service or Kubernetes Ingress) terminates TLS and
// forwards the real client IP in X-Forwarded-For. Trust one hop so the rate
// limiter keys on the caller rather than the proxy address.
app.set('trust proxy', 1)

@katriendg

katriendg commented Jun 29, 2026

Copy link
Copy Markdown
Collaborator

Supplementing my ms.date comment: all 8 documentation files need updating to 2026-06-29:

File Current value
README.md 2026-04-29
docs/development.md 2026-04-15
docs/setup-guide.md 2026-04-15
docs/architecture.md 2026-04-15
docs/DOCKER_COMPOSE_README.md 2026-05-06
docs/HELM_CHART_GUIDE.md 2026-05-06
docs/factory-tool-grounding.md 2026-06-15
docs/factory-tool-live-data.md 2026-06-15

Ben Brooks added 3 commits June 29, 2026 18:03
…faults

Remove the active AUTH_REQUIRED="false" default from the chart so an operator
who installs without overrides deploys WITH auth required. Leave it as a
commented opt-in for Teams-less standalone/demo only, noting it disables auth for
all in-cluster traffic (not just loopback). Addresses PR #633 review feedback.
Both builder and runtime stages used the floating node:22-slim tag, so rebuilds
could silently pull a new upstream image. Pin both to the current digest, matching
the digest-pinning convention used by every other src/500-application Dockerfile.
Addresses PR #633 review feedback.
…ontext

The Dockerfile runs as appuser but the Deployment relied on cluster Pod Security
Admission to prevent running as root. Add an explicit container securityContext
(runAsNonRoot, allowPrivilegeEscalation=false, drop ALL capabilities) so the
manifest is self-enforcing regardless of cluster policy. Addresses PR #633 review
feedback.
Ben Brooks added 11 commits June 29, 2026 18:03
Session, SSE registry, and ticket stores are process-local in-memory Maps, so
running more than one replica breaks session affinity. Add a comment on
replicaCount making the constraint self-documenting. Addresses PR #633 review
feedback.
…dline

The run-polling loop had no timeout, so a stalled run (quota pause, upstream
hang) would hold the HTTP socket and worker open indefinitely. Add a 120 s
deadline guard and hoist the poll-status set to a module-scope ReadonlySet.
Addresses PR #633 review feedback.
… generator

The env-config generator defaulted AUTH_REQUIRED to false, so running it without
pre-setting the variable produced a .env with auth disabled. Flip the default to
true so operators must explicitly opt out. Addresses PR #633 review feedback.
…hart

Replace image.tag=latest + pullPolicy=Always (which silently pulls unreviewed
images on every restart) with an empty default and pullPolicy=IfNotPresent. Make
the image helper 'required' so rendering fails unless the operator supplies an
explicit, auditable tag/digest at deploy time. Addresses PR #633 review feedback.
…ipefail

Align generate-env-config.sh with the other two scripts in the directory: add
-u (catch unbound variables) and pipefail (propagate pipeline failures).
Addresses PR #633 review feedback.
…README

Add a top-level Prerequisites section referencing the upstream blueprints and
components operators must provision first (full-multi-node-cluster, fabric,
033-fabric-ontology), following the 507-ai-inference pattern. Rename the
tool-specific subsection to 'Tool Prerequisites' to avoid a duplicate heading.
Addresses PR #633 review feedback.
Merge the five general docs (setup-guide, architecture, development,
DOCKER_COMPOSE_README, HELM_CHART_GUIDE) into README.md as top-level sections,
aligning with the single-README convention used by every other
src/500-application component. Keep the two tool-specific references
(factory-tool-grounding, factory-tool-live-data). Demote merged headings to
keep them unique (MD024), unlink the pre-existing broken ADR references, and fix
the Helm example to use an explicit image tag. Addresses PR #633 review feedback.
Change the chart default namespace from 'default' to 'chat-with-your-factory' so
RBAC, NetworkPolicies, and quotas can be scoped to this workload; operators may
override via --set namespace. Update the README Helm commands to match and add a
kubectl create namespace prerequisite note. Addresses PR #633 review feedback.
…ernetes

The trust proxy comment referenced only App Service, but the documented
deployment path is Kubernetes via the Helm chart. Reword to cover both reverse
proxies (App Service or Kubernetes Ingress). Addresses PR #633 review feedback.
Update the ms.date front-matter on the surviving docs (README, factory-tool-
grounding, factory-tool-live-data). The other five files in the reviewer's list
were consolidated into README and no longer exist. Addresses PR #633 review feedback.
513 is already used by 513-tiered-notification-service. Rename the component
directory to 516-chat-with-your-factory and update the two README self-references
and the root eslint ignore path. Addresses PR #633 blocking review feedback.
@github-actions

Copy link
Copy Markdown

📚 Documentation Health Report

Generated on: 2026-06-29 20:28:09 UTC

📈 Documentation Statistics

Category File Count
Main Documentation 222
Infrastructure Components 202
Blueprints 37
GitHub Resources 26
AI Assistant Guides (Copilot) 17
Total 504

🏗️ Three-Tree Architecture Status

  • ✅ Bicep Documentation Tree: Auto-generated navigation
  • ✅ Terraform Documentation Tree: Auto-generated navigation
  • ✅ README Documentation Tree: Manual README organization

🔍 Quality Metrics

  • Frontmatter Validation:
    success
  • Link Validation: success

This report is automatically generated by the Documentation Automation workflow.

@github-actions

Copy link
Copy Markdown

📚 Documentation Health Report

Generated on: 2026-06-29 21:05:55 UTC

📈 Documentation Statistics

Category File Count
Main Documentation 222
Infrastructure Components 202
Blueprints 37
GitHub Resources 26
AI Assistant Guides (Copilot) 17
Total 504

🏗️ Three-Tree Architecture Status

  • ✅ Bicep Documentation Tree: Auto-generated navigation
  • ✅ Terraform Documentation Tree: Auto-generated navigation
  • ✅ README Documentation Tree: Manual README organization

🔍 Quality Metrics

  • Frontmatter Validation:
    success
  • Link Validation: success

This report is automatically generated by the Documentation Automation workflow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants