feat: Lakebase deployment, auth improvements, and PBI model routing by MrBlack1995 · Pull Request #51 · databrickslabs/kasal

MrBlack1995 · 2026-04-16T15:07:25Z

Summary

Lakebase deployment and authentication improvements
PowerBI semantic model tools (fetcher, metadata reducer, DAX executor)
PowerBI analysis tooling (relationships, hierarchies, field parameters, report references)
PowerBI connector, authentication, and DAX-to-SQL conversion services
PowerBI router, config models, and related migrations
Flow execution and run shell adaptations

Reopens work from #47 which was closed without merging.

This pull request was AI-assisted by Isaac.

Resolved conflict in main.py: kept colleague's ASGI class-based LocalDevAuthMiddleware, preserving settings.LOCAL_DEV_USER_EMAIL fallback instead of hardcoded admin@admin.com. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ce docs

Phase 4 — Runtime output scanning & excessive agency:
- Secret leak detection (10 credential pattern families) in agent output
- Flow trust boundary scanning between crews in multi-crew flows
- Memory poisoning defense (scan task output before persistence)
- Tool output scanning in step callbacks
- Excessive agency detection (PERFORMS_DESTRUCTIVE_OPERATIONS flag)

Phase 5 — Optimizations:
- Unified SecurityScannerPipeline singleton with audit logging
- False-positive reduction (tightened MEDIUM regex patterns)
- LLM guardrail SHA-256 LRU caching (skip redundant calls on retries)
- Secret detector expansion (GitHub, GCP, Azure, DSA/encrypted PEM)

Documentation:
- Updated README_SECURITY_COMPLIANCE.md with Areas 9-16 and overdelivery table
- Updated README_SECURITY_GUARDRAILS_TESTGUIDE.md to cover Phases 1-5

Tests: ~107 new tests across 7 test files (252 total security tests)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
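The "LLM guardrail SHA-256 LRU caching" item can be pictured with a small sketch. This is not the PR's implementation: the class name `GuardrailCache`, the `evaluate` callback, and the `max_entries` default are all hypothetical. It only illustrates how hashing the scanned text lets a retry reuse an earlier verdict instead of calling the LLM again.

```python
import hashlib
from collections import OrderedDict
from typing import Callable


class GuardrailCache:
    """Hypothetical LRU cache keyed by the SHA-256 digest of the scanned text."""

    def __init__(self, max_entries: int = 256) -> None:
        self._entries: "OrderedDict[str, bool]" = OrderedDict()
        self._max_entries = max_entries

    def check(self, text: str, evaluate: Callable[[str], bool]) -> bool:
        # Hash the text so the cache never stores the raw agent output.
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key in self._entries:
            # Cache hit: mark as most recently used and skip the LLM call.
            self._entries.move_to_end(key)
            return self._entries[key]
        verdict = evaluate(text)  # stand-in for the (expensive) guardrail call
        self._entries[key] = verdict
        if len(self._entries) > self._max_entries:
            # Evict the least recently used entry.
            self._entries.popitem(last=False)
        return verdict
```

With this shape, a retried task that produces byte-identical output costs one dictionary lookup rather than a second guardrail call.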

Replace realistic-looking fake tokens in test files and docs with obviously-fake placeholders that won't trigger GitHub/GitGuardian secret scanning. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Adds docs/examples/ folder with 4 import-ready JSON definitions for the Power BI → UC Metric View migration pipeline:
- crew_ucmv_pipeline_config_generator.json (Crew 1 — PBI metadata extraction)
- crew_uc_metric_view_generator.json (Crew 2 — DAX → Spark SQL translation)
- crew_ucmv_quality_validator.json (Crew 3 — measure validation)
- flow_ucmv_plus_validation.json (full 3-crew flow)

All live credentials (tenant_id, client_id, client_secret, PATs) have been replaced with <YOUR_…> placeholders.

Updates docs/README.md to surface the new examples section and UCMV pipeline guides.

Co-authored-by: Isaac

…ichment guide

Adds a complete natural language Q&A case study for Power BI semantic models:
- docs/powerbi/powerbi-analytics-qa-case-study.md — full guide covering the 3-agent crew architecture (Fetcher → Reducer → DAX Generator), all 6 context enrichment fields (business_mappings, field_synonyms, active_filters, context_knowledge, reference_dax, visible_tables), CGR/Italy worked example, and troubleshooting table
- docs/examples/crew_pbi_analyst_qa.json — import-ready crew JSON with all live credentials replaced by <YOUR_…> placeholders
- Updates docs/powerbi/README.md to surface the case study as the primary analytics entry point
- Updates docs/README.md to list the new analyst crew in the examples table

Co-authored-by: Isaac

- Adds '⭐ Analytics Q&A — Case Study' as first entry in the 'Power BI - Analytics / Q&A' nav section in Documentation.tsx
- Copies powerbi-analytics-qa-case-study.md to public/docs/powerbi/ so the frontend can serve it from /docs
- Syncs updated powerbi/README.md to public/docs/powerbi/

Co-authored-by: Isaac

…UMNS for average questions

Rule 4 in the DAX generation prompt was hardcoded as 'Use EVALUATE + SUMMARIZECOLUMNS for all queries' — causing the LLM to always produce a grouped table even when the user asked for a single average value.

Adds two new sub-rules:
- 4a: AVERAGE PER ENTITY pattern using AVERAGEX over SUMMARIZECOLUMNS (e.g. 'average sales per customer' → AVERAGEX(SUMMARIZECOLUMNS(...), ...))
- 4b: SIMPLE AVERAGE pattern using CALCULATE(AVERAGE(...)) or AVERAGEX(Table, col)

Keeps SUMMARIZECOLUMNS as the default for grouping/breakdown queries.

Co-authored-by: Isaac
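To make the three query shapes concrete, here is a rough sketch of what each rule might yield. The function `build_dax`, the question kinds, and the `'Sales'[Amount]` and `'Customer'[Name]` columns are hypothetical and do not come from the PR's prompt. The DAX strings follow the patterns named above but have not been validated against a Power BI engine.

```python
def build_dax(kind: str) -> str:
    """Return an illustrative DAX query for a hypothetical Sales/Customer model."""
    if kind == "breakdown":
        # Rule 4 default: grouped table via SUMMARIZECOLUMNS.
        return (
            "EVALUATE SUMMARIZECOLUMNS("
            "'Customer'[Name], \"Total Sales\", SUM('Sales'[Amount]))"
        )
    if kind == "average_per_entity":
        # Rule 4a: aggregate per entity first, then average the per-entity totals.
        return (
            "EVALUATE ROW(\"Average Sales per Customer\", AVERAGEX("
            "SUMMARIZECOLUMNS('Customer'[Name], \"Total Sales\", SUM('Sales'[Amount])), "
            "[Total Sales]))"
        )
    if kind == "simple_average":
        # Rule 4b: a single scalar average, no grouping.
        return "EVALUATE ROW(\"Average Amount\", CALCULATE(AVERAGE('Sales'[Amount])))"
    raise ValueError(f"unknown question kind: {kind}")


for kind in ("breakdown", "average_per_entity", "simple_average"):
    print(kind, "->", build_dax(kind))
```

The point of the split is that only the first shape returns one row per group; the other two return a single-row result, which is what a question such as "what is the average sale?" expects.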

Power BI semantic model metadata rarely changes — refreshing daily was unnecessarily expensive.

Changes:
- PowerBISemanticModelCache.CACHE_TTL_DAYS = 7 (class constant)
- is_valid_for_today() now accepts entries up to 7 days old
- Repository get_cache_for_today() queries cached_date >= (today - 7d) instead of cached_date == today, returning the most recent entry in the window via ORDER BY cached_date DESC
- delete_old_caches() default days_to_keep=7 already aligned (no change)

To force a re-fetch before the week is up, delete the row from powerbi_semantic_model_cache for the relevant dataset_id.

Co-authored-by: Isaac
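The window lookup described above can be sketched in plain Python. The `CacheEntry` dataclass and `get_cache_in_window` function are hypothetical, in-memory stand-ins for the repository query; only `CACHE_TTL_DAYS`, `dataset_id`, and `cached_date` are names taken from the commit message.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterable, Optional

CACHE_TTL_DAYS = 7  # mirrors the class constant named in the commit message


@dataclass
class CacheEntry:
    dataset_id: str
    cached_date: date


def get_cache_in_window(
    entries: Iterable[CacheEntry], dataset_id: str, today: date
) -> Optional[CacheEntry]:
    """Return the newest entry with cached_date >= today - 7d, or None."""
    cutoff = today - timedelta(days=CACHE_TTL_DAYS)
    candidates = [
        e for e in entries if e.dataset_id == dataset_id and e.cached_date >= cutoff
    ]
    # Equivalent in spirit to ORDER BY cached_date DESC LIMIT 1.
    return max(candidates, key=lambda e: e.cached_date, default=None)
```

An entry cached exactly seven days ago still counts as valid under this comparison, while one cached eight days ago triggers a re-fetch.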

…day)

Replaces the hardcoded weekly TTL with a per-tool configurable field 'cache_ttl_days' on the Semantic Model Fetcher (Tool 79).

Changes:
- Fetcher schema: adds cache_ttl_days (int, default 1)
- Fetcher pipeline: passes config['cache_ttl_days'] to cache service
- CacheService.get_cached_metadata(): accepts cache_ttl_days param
- Repository.get_cache_for_today(): accepts cache_ttl_days, uses cutoff = today - (ttl_days - 1) so ttl=1 means today only
- Model.is_valid_for_today(): accepts ttl_days param
- tools.py seed: adds cache_ttl_days: 1 to Tool 79 default config so it appears as an editable field in the Kasal UI tool config panel

Default is 1 (daily, original behaviour). Set to 7 for weekly refresh.

Co-authored-by: Isaac
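The cutoff arithmetic is easy to get off by one, so a small worked sketch may help. The standalone functions below are hypothetical; in the PR the logic lives on the repository and model classes. The rejection of values below 1 is an added assumption, not something the commit message states.

```python
from datetime import date, timedelta


def cache_cutoff(today: date, cache_ttl_days: int = 1) -> date:
    """Oldest cached_date still accepted: today - (ttl_days - 1)."""
    if cache_ttl_days < 1:
        # Assumption: a TTL below one day is treated as a configuration error.
        raise ValueError("cache_ttl_days must be >= 1")
    return today - timedelta(days=cache_ttl_days - 1)


def is_valid_for_today(cached_date: date, today: date, ttl_days: int = 1) -> bool:
    """True when the entry falls inside the TTL window ending today."""
    return cached_date >= cache_cutoff(today, ttl_days)


today = date(2026, 5, 28)
print(cache_cutoff(today, 1))  # 2026-05-28, so ttl=1 means today only
print(cache_cutoff(today, 7))  # 2026-05-22, a seven-day window including today
```

Subtracting `ttl_days - 1` rather than `ttl_days` is what keeps the default of 1 equivalent to the original daily behaviour.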

Adds a 'Cache TTL (days)' Select field to the Output Options section of PowerBIFetcherConfigSelector. Options: 1 (daily), 3, 7 (weekly), 14, 30 (monthly). Default is 1. Value is passed as cache_ttl_days to the backend tool config. Co-authored-by: Isaac

CLAassistant · 2026-05-26T13:09:00Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

gitguardian · 2026-05-28T13:52:09Z

⚠️ GitGuardian has uncovered 6 secrets following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

Since your pull request originates from a forked repository, GitGuardian is not able to associate the secrets uncovered with secret incidents on your GitGuardian dashboard.
Skipping this check run and merging your pull request will create secret incidents on your GitGuardian dashboard.

🔎 Detected hardcoded secrets in your pull request

| GitGuardian id | GitGuardian status | Secret | Commit | Filename |
| --- | --- | --- | --- | --- |
| - | - | Generic High Entropy Secret | `3ccf00f` | src/backend/tests/unit/converters/services/uc_metrics/test_authentication.py |
| - | - | Bearer Token | `b363491` | src/backend/tests/unit/utils/test_sensitive_data_utils.py |
| 29365397 | Triggered | Databricks Authentication Token | `94942c3` | src/docs/README_SECURITY_GUARDRAILS_TESTGUIDE.md |
| - | - | Generic CLI Secret | `1012f7d` | examples/uc_metric_view_migration/deploy_test.py |
| - | - | Generic Password | `b363491` | src/backend/tests/unit/utils/test_sensitive_data_utils.py |
| 27951017 | Triggered | Generic High Entropy Secret | `7f67566` | src/backend/tests/unit/engines/crewai/config/test_embedder_endpoint_fallback.py |

🛠 Guidelines to remediate hardcoded secrets

- Understand the implications of revoking this secret by investigating where it is used in your code.
- Replace and store your secrets safely, following secret-management best practices.
- Revoke and rotate these secrets.
- If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.

To avoid such incidents in the future, consider:

- following best practices for managing and storing secrets, including API keys and other credentials
- installing secret detection on pre-commit to catch secrets before they leave your machine and ease remediation.

🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

Add # nosec annotations to mock JWT tokens and Bearer token fixtures used in unit tests. These are test-only placeholder values with no real credentials:
- eyJoaWdoLnRva2Vu = base64('{"high.token') — synthetic test token
- eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiJ1c2VyIn0.abc123 — standard JWT fixture
- Bearer eyJhbGciOiJIUzI1NiJ9.payload.sig — redaction test fixture

No real secrets were ever committed. Annotations tell secret scanners these are intentional test values.

Co-authored-by: Mr. Black 1995 <davidschwarzbusiness1995@gmail.com>
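A reviewer can confirm that these fixtures are synthetic by decoding them with the standard library. The helper `decode_segment` below is a hypothetical name, not project code. Decoding shows that the first value is the Base64 form of `{"high.token`, and that the JWT fixture carries only a header and a `sub` claim.

```python
import base64


def decode_segment(segment: str) -> bytes:
    """Decode an unpadded base64url segment, as used in JWTs."""
    padding = "=" * (-len(segment) % 4)  # restore the padding JWTs strip off
    return base64.urlsafe_b64decode(segment + padding)


print(decode_segment("eyJoaWdoLnRva2Vu"))  # b'{"high.token'

header, payload, signature = "eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiJ1c2VyIn0.abc123".split(".")
print(decode_segment(header))   # b'{"alg":"HS256"}'
print(decode_segment(payload))  # b'{"sub":"user"}'
```

The third segment, `abc123`, is not a valid HS256 signature, so the token could not authenticate against any service.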

MrBlack1995 and others added 30 commits February 26, 2026 17:14

Xforard authentication

612e757

PBI supporting alternative routes than fabric for model fetching

9da8e55

Lakebase deployment setup

7eba747

App base setup

c54a3bc

Vector Search fix

a1e5d64

Rollback Lakebase

fea03bd

Claude update

6a54109

Lakebase readme modification

f14af2c

Merge origin/feature/flow into feature/flow

d287e58

IT security guardrails compliance setup

7461d34

Adding gaps for future work

2966f42

Final security and compliance testing instructions

825acb8

Merge remote-tracking branch 'origin/feature/flow' into feature/flow

7f67566

fix: replace fake test tokens to pass GitGuardian secret scanning

5e10232

Implementation of DAX generation stepwise

68386a8

Sample data caching

bf0e343

adding semantic parsing & dax generation separately

8905bed

Adding slicer extraction

370c965

Adding slicer parsing

b49d9ce

Filtertype cleanup

7776bac

Metadata reducer

f051eff

PBI Query generation adaptation batch 1

9976d19

query generation instructions

8d8459c

Treating filter passing properly

f653415

Cache optimizer

509f32f

Metadata reducer input form adoption

e1e161e

Adding Zustand parameter for deployment

619b31d

Cache retrieval fix

b14a8de

Adding dynamic input variables to input taskforms & logging checks

8d24c1a

MrBlack1995 added 11 commits May 19, 2026 17:43

Take edited files

632c5ca

Base setup Genie space creator

f6c3fc7

Genie tool error fix main flow

fc8f706

genie spaces creation link

c0e8f90

MrBlack1995 added 12 commits May 26, 2026 15:55

Config upload

5f8d4f3

Search field for tables

aa1b5c9

Auto removal of non upload spaces

a2ac108

Deployment fix cache_ttl_days

2c7a575

Constant fix number formatting

440a4a1

Base setup UCMV deployer

e502d0f

UCMV deployment flow

458e43d

Deployer pipeline and approval flow

5c79bb0

Deployment

7220a77

Genie space config generator

ef3ec80

Genie space deployer

8fb3ff3

Genie space generator based upon configs

e14c386

MrBlack1995 added 5 commits May 28, 2026 16:21

Base setup PBI report lakeview dashboard creator

9210682

Base setup lakeflow dashboard designer

dff6407

Fix visual display and JSON ingestion

7cdb858

Lakeflow Dashboard generation

2b3dd32

MrBlack1995 wants to merge 162 commits into databrickslabs:main from MrBlack1995:feature/flow


3 participants
