feat: Lakebase deployment, auth improvements, and PBI model routing #51
MrBlack1995 wants to merge 162 commits into
Resolved conflict in main.py: kept colleague's ASGI class-based LocalDevAuthMiddleware, preserving the settings.LOCAL_DEV_USER_EMAIL fallback instead of the hardcoded admin@admin.com.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ce docs

Phase 4 (runtime output scanning & excessive agency):
- Secret leak detection (10 credential pattern families) in agent output
- Flow trust boundary scanning between crews in multi-crew flows
- Memory poisoning defense (scan task output before persistence)
- Tool output scanning in step callbacks
- Excessive agency detection (PERFORMS_DESTRUCTIVE_OPERATIONS flag)

Phase 5 (optimizations):
- Unified SecurityScannerPipeline singleton with audit logging
- False-positive reduction (tightened MEDIUM regex patterns)
- LLM guardrail SHA-256 LRU caching (skip redundant calls on retries)
- Secret detector expansion (GitHub, GCP, Azure, DSA/encrypted PEM)

Documentation:
- Updated README_SECURITY_COMPLIANCE.md with Areas 9-16 and overdelivery table
- Updated README_SECURITY_GUARDRAILS_TESTGUIDE.md to cover Phases 1-5

Tests: ~107 new tests across 7 test files (252 total security tests)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

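The "LLM guardrail SHA-256 LRU caching" item can be pictured roughly as follows. This is a minimal sketch, not the code from this PR: the class name `GuardrailCache`, the `check` callable, and the `max_size` default are all hypothetical, and the real guardrail presumably calls an LLM rather than a plain function.

```python
import hashlib
from collections import OrderedDict
from typing import Callable


class GuardrailCache:
    """LRU cache keyed by the SHA-256 digest of the scanned text (hypothetical)."""

    def __init__(self, check: Callable[[str], bool], max_size: int = 128) -> None:
        self._check = check            # stand-in for the expensive guardrail call
        self._max_size = max_size
        self._entries = OrderedDict()  # digest -> verdict, oldest entry first

    def is_safe(self, text: str) -> bool:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key in self._entries:
            # Cache hit: mark as most recently used and skip the guardrail call.
            self._entries.move_to_end(key)
            return self._entries[key]
        verdict = self._check(text)
        self._entries[key] = verdict
        if len(self._entries) > self._max_size:
            # Evict the least recently used entry.
            self._entries.popitem(last=False)
        return verdict
```

Hashing the text keeps the cache keys at a fixed size and avoids holding agent output in memory as dictionary keys; a retry with identical text then reuses the stored verdict.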
Replace realistic-looking fake tokens in test files and docs with obviously-fake placeholders that won't trigger GitHub/GitGuardian secret scanning.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Adds docs/examples/ folder with 4 import-ready JSON definitions for the Power BI → UC Metric View migration pipeline:
- crew_ucmv_pipeline_config_generator.json (Crew 1: PBI metadata extraction)
- crew_uc_metric_view_generator.json (Crew 2: DAX → Spark SQL translation)
- crew_ucmv_quality_validator.json (Crew 3: measure validation)
- flow_ucmv_plus_validation.json (full 3-crew flow)

All live credentials (tenant_id, client_id, client_secret, PATs) have been replaced with <YOUR_…> placeholders.

Updates docs/README.md to surface the new examples section and UCMV pipeline guides.

Co-authored-by: Isaac

…ichment guide

Adds a complete natural language Q&A case study for Power BI semantic models:
- docs/powerbi/powerbi-analytics-qa-case-study.md: full guide covering the 3-agent crew architecture (Fetcher → Reducer → DAX Generator), all 6 context enrichment fields (business_mappings, field_synonyms, active_filters, context_knowledge, reference_dax, visible_tables), CGR/Italy worked example, and troubleshooting table
- docs/examples/crew_pbi_analyst_qa.json: import-ready crew JSON with all live credentials replaced by <YOUR_…> placeholders
- Updates docs/powerbi/README.md to surface the case study as the primary analytics entry point
- Updates docs/README.md to list the new analyst crew in the examples table

Co-authored-by: Isaac

- Adds '⭐ Analytics Q&A — Case Study' as first entry in the 'Power BI - Analytics / Q&A' nav section in Documentation.tsx
- Copies powerbi-analytics-qa-case-study.md to public/docs/powerbi/ so the frontend can serve it from /docs
- Syncs updated powerbi/README.md to public/docs/powerbi/

Co-authored-by: Isaac

…UMNS for average questions

Rule 4 in the DAX generation prompt was hardcoded as 'Use EVALUATE + SUMMARIZECOLUMNS for all queries', causing the LLM to always produce a grouped table even when the user asked for a single average value.

Adds two new sub-rules:
- 4a: AVERAGE PER ENTITY pattern using AVERAGEX over SUMMARIZECOLUMNS (e.g. 'average sales per customer' → AVERAGEX(SUMMARIZECOLUMNS(...), ...))
- 4b: SIMPLE AVERAGE pattern using CALCULATE(AVERAGE(...)) or AVERAGEX(Table, col)

Keeps SUMMARIZECOLUMNS as the default for grouping/breakdown queries.

Co-authored-by: Isaac

Power BI semantic model metadata rarely changes, so refreshing daily was unnecessarily expensive.

Changes:
- PowerBISemanticModelCache.CACHE_TTL_DAYS = 7 (class constant)
- is_valid_for_today() now accepts entries up to 7 days old
- Repository get_cache_for_today() queries cached_date >= (today - 7d) instead of cached_date == today, returning the most recent entry in the window via ORDER BY cached_date DESC
- delete_old_caches() default days_to_keep=7 already aligned (no change)

To force a re-fetch before the week is up, delete the row from powerbi_semantic_model_cache for the relevant dataset_id.

Co-authored-by: Isaac

…day)

Replaces the hardcoded weekly TTL with a per-tool configurable field 'cache_ttl_days' on the Semantic Model Fetcher (Tool 79).

Changes:
- Fetcher schema: adds cache_ttl_days (int, default 1)
- Fetcher pipeline: passes config['cache_ttl_days'] to cache service
- CacheService.get_cached_metadata(): accepts cache_ttl_days param
- Repository.get_cache_for_today(): accepts cache_ttl_days, uses cutoff = today - (ttl_days - 1) so ttl=1 means today only
- Model.is_valid_for_today(): accepts ttl_days param
- tools.py seed: adds cache_ttl_days: 1 to Tool 79 default config so it appears as an editable field in the Kasal UI tool config panel

Default is 1 (daily, original behaviour). Set to 7 for weekly refresh.

Co-authored-by: Isaac

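The cutoff rule described in this commit (cutoff = today - (ttl_days - 1)) can be illustrated with a short sketch. The function names `cache_cutoff` and `is_within_ttl` are hypothetical and stand in for the repository and model methods named above; rejecting values below 1 is an assumption, not something the commit states.

```python
from datetime import date, timedelta


def cache_cutoff(today: date, cache_ttl_days: int = 1) -> date:
    """Oldest cached_date still accepted; ttl=1 means today only."""
    if cache_ttl_days < 1:
        # Assumption: a TTL below one day is treated as invalid.
        raise ValueError("cache_ttl_days must be >= 1")
    return today - timedelta(days=cache_ttl_days - 1)


def is_within_ttl(cached_date: date, today: date, cache_ttl_days: int = 1) -> bool:
    """Mirror of the query condition cached_date >= cutoff."""
    return cached_date >= cache_cutoff(today, cache_ttl_days)
```

With `today = date(2024, 1, 10)`, a TTL of 1 accepts only entries cached on 10 January, while a TTL of 7 accepts entries cached from 4 January onward.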
Adds a 'Cache TTL (days)' Select field to the Output Options section of PowerBIFetcherConfigSelector. Options: 1 (daily), 3, 7 (weekly), 14, 30 (monthly). Default is 1. Value is passed as cache_ttl_days to the backend tool config.

Co-authored-by: Isaac

| GitGuardian id | GitGuardian status | Secret | Commit | Filename | |
|---|---|---|---|---|---|
| - | - | Generic High Entropy Secret | 3ccf00f | src/backend/tests/unit/converters/services/uc_metrics/test_authentication.py | View secret |
| - | - | Bearer Token | b363491 | src/backend/tests/unit/utils/test_sensitive_data_utils.py | View secret |
| 29365397 | Triggered | Databricks Authentication Token | 94942c3 | src/docs/README_SECURITY_GUARDRAILS_TESTGUIDE.md | View secret |
| - | - | Generic CLI Secret | 1012f7d | examples/uc_metric_view_migration/deploy_test.py | View secret |
| - | - | Generic Password | b363491 | src/backend/tests/unit/utils/test_sensitive_data_utils.py | View secret |
| 27951017 | Triggered | Generic High Entropy Secret | 7f67566 | src/backend/tests/unit/engines/crewai/config/test_embedder_endpoint_fallback.py | View secret |
🛠 Guidelines to remediate hardcoded secrets
- Understand the implications of revoking this secret by investigating where it is used in your code.
- Replace and store your secrets safely. Learn here the best practices.
- Revoke and rotate these secrets.
- If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.
To avoid such incidents in the future, consider:
- following these best practices for managing and storing secrets, including API keys and other credentials
- installing secret detection on pre-commit to catch secrets before they leave your machine and to ease remediation.
🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.
Add # nosec annotations to mock JWT tokens and Bearer token fixtures
used in unit tests. These are test-only placeholder values with no
real credentials:
- eyJoaWdoLnRva2Vu = base64('{"high.token') (synthetic test token)
- eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiJ1c2VyIn0.abc123 (standard JWT fixture)
- Bearer eyJhbGciOiJIUzI1NiJ9.payload.sig (redaction test fixture)
No real secrets were ever committed. Annotations tell secret scanners
these are intentional test values.
Co-authored-by: Mr. Black 1995 <davidschwarzbusiness1995@gmail.com>
Summary
Reopens work from #47 which was closed without merging.
This pull request was AI-assisted by Isaac.