feat: Lakebase deployment, auth improvements, and PBI model routing #51

Open
MrBlack1995 wants to merge 162 commits into databrickslabs:main from MrBlack1995:feature/flow

Conversation

@MrBlack1995
Contributor

Summary

  • Lakebase deployment and authentication improvements
  • PowerBI semantic model tools (fetcher, metadata reducer, DAX executor)
  • PowerBI analysis tooling (relationships, hierarchies, field parameters, report references)
  • PowerBI connector, authentication, and DAX-to-SQL conversion services
  • PowerBI router, config models, and related migrations
  • Flow execution and run shell adaptations

Reopens work from #47, which was closed without merging.

This pull request was AI-assisted by Isaac.

MrBlack1995 and others added 30 commits February 26, 2026 17:14
Resolved conflict in main.py: kept colleague's ASGI class-based
LocalDevAuthMiddleware, preserving settings.LOCAL_DEV_USER_EMAIL
fallback instead of hardcoded admin@admin.com.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ce docs

Phase 4 — Runtime output scanning & excessive agency:
- Secret leak detection (10 credential pattern families) in agent output
- Flow trust boundary scanning between crews in multi-crew flows
- Memory poisoning defense (scan task output before persistence)
- Tool output scanning in step callbacks
- Excessive agency detection (PERFORMS_DESTRUCTIVE_OPERATIONS flag)
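
A minimal sketch of pattern-based secret leak detection on agent output, for illustration only. The three pattern families, the `Finding` type, and the `scan_output` / `redact` helpers are assumptions made for this example; they are not the 10 families or the API shipped in this PR.

```python
import re
from typing import NamedTuple


class Finding(NamedTuple):
    family: str
    start: int
    end: int


# Hypothetical pattern families (assumption: the PR's real list differs).
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "pem_private_key": re.compile(
        r"-----BEGIN (?:RSA |DSA |EC |ENCRYPTED )?PRIVATE KEY-----"
    ),
}


def scan_output(text: str) -> list[Finding]:
    """Return every match of every pattern family, ordered by position."""
    findings = []
    for family, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append(Finding(family, match.start(), match.end()))
    return sorted(findings, key=lambda f: f.start)


def redact(text: str) -> str:
    """Replace each finding with a marker, working backwards so that
    earlier offsets stay valid while the string changes length."""
    for f in reversed(scan_output(text)):
        text = text[: f.start] + f"[REDACTED:{f.family}]" + text[f.end :]
    return text
```

The same `scan_output` call could, in principle, be reused at each checkpoint listed above (agent output, crew boundaries, pre-persistence, tool output), which is one reason to keep the scanner free of side effects.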

Phase 5 — Optimizations:
- Unified SecurityScannerPipeline singleton with audit logging
- False-positive reduction (tightened MEDIUM regex patterns)
- LLM guardrail SHA-256 LRU caching (skip redundant calls on retries)
- Secret detector expansion (GitHub, GCP, Azure, DSA/encrypted PEM)
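
The SHA-256 LRU caching idea can be sketched as follows. This is an assumption-laden illustration: `GuardrailCache`, its `check` callable, and `max_size` are hypothetical names, and the PR's actual cache may be keyed or sized differently.

```python
import hashlib
from collections import OrderedDict
from typing import Callable


class GuardrailCache:
    """Cache guardrail verdicts by the SHA-256 of the checked text, so a
    retry with identical output does not trigger a second LLM call."""

    def __init__(self, check: Callable[[str], bool], max_size: int = 128):
        self._check = check          # the expensive guardrail call (hypothetical)
        self._max_size = max_size
        self._results = OrderedDict()  # sha256 hex digest -> verdict

    def validate(self, output: str) -> bool:
        key = hashlib.sha256(output.encode("utf-8")).hexdigest()
        if key in self._results:
            # Cache hit: mark as most recently used and skip the call.
            self._results.move_to_end(key)
            return self._results[key]
        result = self._check(output)
        self._results[key] = result
        if len(self._results) > self._max_size:
            # Evict the least recently used entry.
            self._results.popitem(last=False)
        return result
```

Hashing the text keeps the key size constant regardless of output length, at the cost of not being able to inspect cached inputs later.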

Documentation:
- Updated README_SECURITY_COMPLIANCE.md with Areas 9-16 and overdelivery table
- Updated README_SECURITY_GUARDRAILS_TESTGUIDE.md to cover Phases 1-5

Tests: ~107 new tests across 7 test files (252 total security tests)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace realistic-looking fake tokens in test files and docs with
obviously-fake placeholders that won't trigger GitHub/GitGuardian
secret scanning.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds docs/examples/ folder with 4 import-ready JSON definitions for
the Power BI → UC Metric View migration pipeline:
- crew_ucmv_pipeline_config_generator.json  (Crew 1 — PBI metadata extraction)
- crew_uc_metric_view_generator.json        (Crew 2 — DAX → Spark SQL translation)
- crew_ucmv_quality_validator.json          (Crew 3 — measure validation)
- flow_ucmv_plus_validation.json            (full 3-crew flow)

All live credentials (tenant_id, client_id, client_secret, PATs) have been
replaced with <YOUR_…> placeholders. Updates docs/README.md to surface
the new examples section and UCMV pipeline guides.

Co-authored-by: Isaac
…ichment guide

Adds a complete natural language Q&A case study for Power BI semantic models:
- docs/powerbi/powerbi-analytics-qa-case-study.md — full guide covering the
  3-agent crew architecture (Fetcher → Reducer → DAX Generator), all 6 context
  enrichment fields (business_mappings, field_synonyms, active_filters,
  context_knowledge, reference_dax, visible_tables), CGR/Italy worked example,
  and troubleshooting table
- docs/examples/crew_pbi_analyst_qa.json — import-ready crew JSON with all
  live credentials replaced by <YOUR_…> placeholders
- Updates docs/powerbi/README.md to surface the case study as the primary
  analytics entry point
- Updates docs/README.md to list the new analyst crew in the examples table

Co-authored-by: Isaac
- Adds '⭐ Analytics Q&A — Case Study' as first entry in the
  'Power BI - Analytics / Q&A' nav section in Documentation.tsx
- Copies powerbi-analytics-qa-case-study.md to public/docs/powerbi/
  so the frontend can serve it from /docs
- Syncs updated powerbi/README.md to public/docs/powerbi/

Co-authored-by: Isaac
…UMNS for average questions

Rule 4 in the DAX generation prompt was hardcoded as 'Use EVALUATE +
SUMMARIZECOLUMNS for all queries' — causing the LLM to always produce
a grouped table even when the user asked for a single average value.

Adds two new sub-rules:
- 4a: AVERAGE PER ENTITY pattern using AVERAGEX over SUMMARIZECOLUMNS
  (e.g. 'average sales per customer' → AVERAGEX(SUMMARIZECOLUMNS(...), ...))
- 4b: SIMPLE AVERAGE pattern using CALCULATE(AVERAGE(...)) or AVERAGEX(Table, col)

Keeps SUMMARIZECOLUMNS as the default for grouping/breakdown queries.

Co-authored-by: Isaac
Power BI semantic model metadata rarely changes — refreshing daily was
unnecessarily expensive. Changes:

- PowerBISemanticModelCache.CACHE_TTL_DAYS = 7 (class constant)
- is_valid_for_today() now accepts entries up to 7 days old
- Repository get_cache_for_today() queries cached_date >= (today - 7d)
  instead of cached_date == today, returning the most recent entry in
  the window via ORDER BY cached_date DESC
- delete_old_caches() default days_to_keep=7 already aligned (no change)

To force a re-fetch before the week is up, delete the row from
powerbi_semantic_model_cache for the relevant dataset_id.
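
A rough sketch of the windowed lookup and the manual re-fetch described above, using an in-memory SQLite table. Only the table name, `dataset_id`, and `cached_date` come from the commit message; the `metadata` column, the helper names, and the use of SQLite are assumptions for illustration.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE powerbi_semantic_model_cache "
    "(dataset_id TEXT, cached_date TEXT, metadata TEXT)"
)

today = date(2026, 3, 10)
conn.executemany(
    "INSERT INTO powerbi_semantic_model_cache VALUES (?, ?, ?)",
    [
        ("ds-1", (today - timedelta(days=9)).isoformat(), "stale"),
        ("ds-1", (today - timedelta(days=5)).isoformat(), "older"),
        ("ds-1", (today - timedelta(days=2)).isoformat(), "newest"),
    ],
)


def get_cache(conn, dataset_id, today, window_days=7):
    """Return the most recent entry with cached_date >= today - window_days."""
    cutoff = (today - timedelta(days=window_days)).isoformat()
    row = conn.execute(
        "SELECT metadata FROM powerbi_semantic_model_cache "
        "WHERE dataset_id = ? AND cached_date >= ? "
        "ORDER BY cached_date DESC LIMIT 1",
        (dataset_id, cutoff),
    ).fetchone()
    return row[0] if row else None


def force_refetch(conn, dataset_id):
    """Delete the cached rows so the next lookup misses."""
    conn.execute(
        "DELETE FROM powerbi_semantic_model_cache WHERE dataset_id = ?",
        (dataset_id,),
    )
```

ISO-formatted date strings sort in chronological order, which is why the string comparison and `ORDER BY` behave correctly in this sketch.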

Co-authored-by: Isaac
…day)

Replaces the hardcoded weekly TTL with a per-tool configurable field
'cache_ttl_days' on the Semantic Model Fetcher (Tool 79).

Changes:
- Fetcher schema: adds cache_ttl_days (int, default 1)
- Fetcher pipeline: passes config['cache_ttl_days'] to cache service
- CacheService.get_cached_metadata(): accepts cache_ttl_days param
- Repository.get_cache_for_today(): accepts cache_ttl_days, uses
  cutoff = today - (ttl_days - 1) so ttl=1 means today only
- Model.is_valid_for_today(): accepts ttl_days param
- tools.py seed: adds cache_ttl_days: 1 to Tool 79 default config
  so it appears as an editable field in the Kasal UI tool config panel

Default is 1 (daily, original behaviour). Set to 7 for weekly refresh.
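
The cutoff arithmetic can be illustrated with a small sketch. The function names are hypothetical, and the upper bound (`cached_date <= today`) and the rejection of `ttl_days < 1` are assumptions not stated in the commit message.

```python
from datetime import date, timedelta


def cache_cutoff(today: date, ttl_days: int = 1) -> date:
    """Oldest cached_date still accepted: today - (ttl_days - 1)."""
    if ttl_days < 1:
        raise ValueError("ttl_days must be >= 1")  # assumption: no zero/negative TTL
    return today - timedelta(days=ttl_days - 1)


def is_valid(cached_date: date, today: date, ttl_days: int = 1) -> bool:
    """True when the entry falls inside the TTL window ending today."""
    return cache_cutoff(today, ttl_days) <= cached_date <= today
```

With `ttl_days=1` the cutoff equals today, so only entries cached today are accepted; with `ttl_days=7` an entry cached six days ago is still valid and one cached seven days ago is not.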

Co-authored-by: Isaac
Adds a 'Cache TTL (days)' Select field to the Output Options section
of PowerBIFetcherConfigSelector. Options: 1 (daily), 3, 7 (weekly),
14, 30 (monthly). Default is 1. Value is passed as cache_ttl_days
to the backend tool config.

Co-authored-by: Isaac
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@gitguardian

gitguardian Bot commented May 28, 2026

⚠️ GitGuardian has uncovered 6 secrets following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

Since your pull request originates from a forked repository, GitGuardian is not able to associate the secrets uncovered with secret incidents on your GitGuardian dashboard.
Skipping this check run and merging your pull request will create secret incidents on your GitGuardian dashboard.

🔎 Detected hardcoded secrets in your pull request
| GitGuardian id | GitGuardian status | Secret | Commit | Filename |
| --- | --- | --- | --- | --- |
| - | - | Generic High Entropy Secret | 3ccf00f | src/backend/tests/unit/converters/services/uc_metrics/test_authentication.py |
| - | - | Bearer Token | b363491 | src/backend/tests/unit/utils/test_sensitive_data_utils.py |
| 29365397 | Triggered | Databricks Authentication Token | 94942c3 | src/docs/README_SECURITY_GUARDRAILS_TESTGUIDE.md |
| - | - | Generic CLI Secret | 1012f7d | examples/uc_metric_view_migration/deploy_test.py |
| - | - | Generic Password | b363491 | src/backend/tests/unit/utils/test_sensitive_data_utils.py |
| 27951017 | Triggered | Generic High Entropy Secret | 7f67566 | src/backend/tests/unit/engines/crewai/config/test_embedder_endpoint_fallback.py |
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secrets safely. Learn here the best practices.
  3. Revoke and rotate these secrets.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

Add # nosec annotations to mock JWT tokens and Bearer token fixtures
used in unit tests. These are test-only placeholder values with no
real credentials:

- eyJoaWdoLnRva2Vu   = base64('{"high.token') — synthetic test token
- eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiJ1c2VyIn0.abc123 — standard JWT fixture
- Bearer eyJhbGciOiJIUzI1NiJ9.payload.sig — redaction test fixture

No real secrets were ever committed. Annotations tell secret scanners
these are intentional test values.
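
To check what each fixture decodes to, a short sketch like the following can be used. The `b64url_decode` helper is a hypothetical name; it only restores the padding that JWT segments omit and then calls the standard-library decoder.

```python
import base64


def b64url_decode(segment: str) -> bytes:
    """Decode an unpadded base64url segment, as used in JWTs."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


# First fixture from the list above.
high_token = b64url_decode("eyJoaWdoLnRva2Vu")

# Second fixture: header and payload are JSON; the third part is a dummy signature.
header, payload, signature = (
    "eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiJ1c2VyIn0.abc123".split(".")
)
print(high_token, b64url_decode(header), b64url_decode(payload))
```

None of the decoded values contains key material: the segments carry a placeholder string, an `alg` header, and a `sub` claim.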

Co-authored-by: Mr. Black 1995 <davidschwarzbusiness1995@gmail.com>