Caching Rework, Timeouts & Memory Improvements by FriedrichWeinmann · Pull Request #1317 · microsoft/zerotrustassessment

FriedrichWeinmann · 2026-06-16T17:43:37Z

Alright, this took a bit longer than expected, but it's finally there:
A major update to how data is persisted within memory to help with large tenant setups and require less configuration. Should work better out-of-the-box.

Changes

Upd: Raised minimum PSFramework version, due to feature usage
Upd: Migrated test timeouts for parallelized tests to Runspace Workflow timeouts, enabling activity-based timeouts
Upd: Added support to retry failed tests
Upd: Migrated Caches to PSFCache - limiting data retention by count and time
Upd: Added soft-limit for Test Statistics messages
More minor tweaks

Test Timeout update

This update only applies to the tests executed in the background.
It uses the latest PSFramework update to the Runspace Workflows to support timeouts. The key difference to the previous implementation is that these timeouts can be based on activity, rather than "since the start". Activity is measured whenever you use Write-PSFMessage.

As the tests running in the main runspace cannot use this feature, the test timeout is kept at the previous one hour by default.
A future update will address that if possible.

Retry Failed Tests

This update only applies to the tests executed in the background.
It uses the latest PSFramework update to the Runspace Workflows to support retries on test failure. The number of retries can be defined via configuration and defaults to 0 extra attempts.
A future update will support customizing this count on a per-test basis.

Note: In the implementation I retry once more than specified. This is so a last round of the retry validation can process the "test failed" handling.

Migrated Caches to PSFCache

We cache all requests to Graph or Azure unless configured otherwise.
This can add a huge load in large tenants.

The new system will automatically expire values too old or when too many values are cached.
Both Azure and Graph request caches default to 30 minutes of age or 50,000 entries, whichever happens first. All limits can be customized via configuration.

Added soft-limit for Test Statistics messages

When you run Get-ZtTestStatistics, you can collect information about the last test execution.
This includes all log messages generated.

Since some of our tests have scaling message output based on tenant size, this could lead to lots of memory being used by these messages.
With this new configuration setting, once too many messages are logged for a test - 1024 by default - the data retained is implified significantly to reduce memory overhead, at the loss of some data.

Warning: Untested

Due to technical limitations I am currently unable to test this code on my own. Given the pressing nature of the large tenant performance, I chose to push this now, rather than wait until I can make the time to restore my own testing capability.

Do not merge right away without testing it first!!!

Thoughts on Cache Changes & Paged Requests

As part of the design change, I considered whether this change will lead to issues with our habit of caching paged requests separately, one cache entry per page.
This might theoretically lead to issues with partial cache invalidation.

In case you want to consider the same issues, here's the potential issues my mind went to and my conclusions on them:

When only part of a paged request is invalidated from cache, will results become inconsistent?

No.
Whether by max items or by lifetime, cache content is invalidated from the oldest entries.
In this situation, the first request is never going to be from cache, its "next page" link will have an entirely different continuation token and thus not read from cache at all.

The only situation where an issue might occur is when reconfiguring cache limits while it is reading from cache.
A situation that is purely theoretical, since the cache is only used during the full processing where the console is locked.

What about concurrency messing with paged cache entries?

Theoretically, other runspaces could add new data too fast and - while one runspace is reading cached pages, others could add new entries and expire the next page from the cache.
The continuation token is unlikely to still be valid and the read fails.

Theoretically an issue, in practice exceedingly unlikely.
Reads from cache are far faster than rest requests, which means enough other runspaces would need to add new cache entries in parallel to outpace that.
This would not be an issue in almost any conceivable scale configuration.

resync

astaykov

This is not a stable PR.
Neither Import-Module, nor Connect-ZtAssessment, nor Invoke-ZtAssessment complete successfully. The main issue is a warning:

PS C:\tdb\zerotrustassessment> Import-Module .\src\powershell\ZeroTrustAssessment.psd1 -Force

Resolving 7 dependencies...
    ✅ Module PSFramework found (v1.14.449).
# trimmed for clarity

Asserting MSAL loading order for dependencies...
    ✅ Loading Microsoft.IdentityModel.Abstractions.dll
# trimmed for clarity
WARNING: [12:38:43][ZeroTrustAssessment.psm1] Failed to import file C:\tdb\zerotrustassessment\src\powershell\scripts\variables.ps1 | Cannot convert argument "Lifetime", with value: "PSFramework.Configuration.Config", for "SetLifetime" to type "PSFramework.Parameter.TimeSpanParameter": "Cannot convert value "PSFramework.Configuration.Config" to type "PSFramework.Parameter.TimeSpanParameter". Error: "Could not interpret PSFramework.Configuration.Config""

PS C:\tdb\zerotrustassessment> Connect-ZtAssessment -TenantId abc.com
🔑 Authentication to Graph, Azure, AipService, ExchangeOnline, SecurityCompliance, SharePointOnline.
During the next steps, you may be prompted to authenticate separately for several services.

Connecting to Microsoft Graph
   ✅ Connected
WARNING: [12:41:38][Connect-ZtAssessment] Could not resolve the initial tenant domain from Graph. EXO, S&C, and SPO CBA connections may fail.
WARNING: [12:41:38][Test-ZtContext] Unable to verify user roles | Cannot index into a null array.

Ref. automated validation run (but I also tested this on a clean VM): https://github.com/microsoft/zerotrustassessment/actions/runs/27681160160/job/81868692154

FriedrichWeinmann · 2026-06-17T12:42:03Z

Thanks for catching that! Fixed.

Copilot

Pull request overview

This PR reworks in-memory persistence and parallel test execution to reduce memory pressure in large tenants, leveraging newer PSFramework runspace workflow features (timeouts/retries) and moving request caching to PSFramework cache primitives.

Changes:

Raised the required PSFramework version and migrated Graph/Azure request caching to PSFramework cache objects with configurable max-items and lifetime.
Updated parallel test execution to use PSFramework runspace workflow timeouts (including timeout type) and added retry support for failed tests.
Added memory controls around test statistics message retention and refactored per-test completion/error handling into shared helpers.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 6 comments.

Show a summary per file

File	Description
src/powershell/ZeroTrustAssessment.psd1	Bumps required PSFramework version to enable new runspace workflow/cache features.
src/powershell/scripts/variables.ps1	Switches Graph/Azure caches to PSF cache objects and applies configured cache limits.
src/powershell/scripts/configuration.ps1	Adds config surface for retries, timeout type, stats message limits, and cache limits.
src/powershell/public/Invoke-ZtGraphRequest.ps1	Simplifies paging loop nextLink detection.
src/powershell/public/Invoke-ZtAzureRequest.ps1	Simplifies Az REST parameter forwarding via `ConvertTo-PSFHashtable`.
src/powershell/public/Invoke-ZtAssessment.ps1	Moves timeouts to `[PSFTimeSpan]` and improves config file validation behavior.
src/powershell/private/tests/Write-ZtTestFinish.ps1	New helper to finalize test statistics, progress updates, and per-test logs with message soft-limit.
src/powershell/private/tests/Write-ZtTestError.ps1	New helper to standardize error handling/reporting for failed tests.
src/powershell/private/tests/Start-ZtTestExecution.ps1	Reworks parallel test worker to use workflow timeout/retry features.
src/powershell/private/tests/Invoke-ZtTests.ps1	Updates test timeout type, passes timespans through, and improves workflow cleanup handling.
src/powershell/private/tests/Invoke-ZtTestParallel.ps1	New parallel-runspace test executor that integrates with shared result/statistics handling.
src/powershell/private/tests/Invoke-ZtTest.ps1	Refactors end-of-test handling to the new finish helper.
src/powershell/private/tests/Get-ZtTestResult.ps1	New helper to persist a per-test result object across retry attempts.
src/powershell/private/export/Export-ZtTenantData.ps1	Adds `Debug.KeepWorkflows` guard to keep/remove export workflows.
src/powershell/private/core/Invoke-ZtGraphRequestCache.ps1	Avoids “Using graph cache” debug log unless a cached value is actually returned.

FriedrichWeinmann and others added 2 commits June 15, 2026 11:27

resync

4199875

resync

Caching, Timeouts & Memory Tweaks

2f8d98d

astaykov requested changes Jun 17, 2026

View reviewed changes

Fixing config value retrieval

d77aa2e

aahmed-spec mentioned this pull request Jun 18, 2026

Zero Trust Assessment failing with latest Pre-release module 2.2.42 (#1154) #1198

Open

Friedrich Weinmann added 2 commits June 19, 2026 14:54

Improved debugging & fixing background test processing

2f2e21c

Improved failed test reporting

30c990a

astaykov requested a review from Copilot June 19, 2026 13:57

Copilot started reviewing on behalf of astaykov June 19, 2026 13:58 View session

Copilot AI reviewed Jun 19, 2026

View reviewed changes

Friedrich Weinmann added 3 commits June 19, 2026 16:27

Fixed retry condition and comments

73afbfa

raised dependency version

9ab21b9

Removed new stacktrace handling until fixed

84dc824

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Caching Rework, Timeouts & Memory Improvements#1317

Caching Rework, Timeouts & Memory Improvements#1317
FriedrichWeinmann wants to merge 8 commits into
microsoft:devfrom
FriedrichWeinmann:dev-hang

FriedrichWeinmann commented Jun 16, 2026

Uh oh!

astaykov left a comment

Uh oh!

FriedrichWeinmann commented Jun 17, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

FriedrichWeinmann commented Jun 16, 2026

Changes

Test Timeout update

Retry Failed Tests

Migrated Caches to PSFCache

Added soft-limit for Test Statistics messages

Warning: Untested

Thoughts on Cache Changes & Paged Requests

Uh oh!

astaykov left a comment

Choose a reason for hiding this comment

Uh oh!

FriedrichWeinmann commented Jun 17, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants