The Data Workflow Compiler for LLMOps
FlowBeast Agent is an intelligent agent that automates and optimizes data workflows, from raw data ingestion through transformation to deployment, acting like a compiler for data engineering tasks. It converts high-level workflow descriptions into executable, efficient pipelines.
- Workflow Compiler — Translates data flow definitions into optimized DAGs (Directed Acyclic Graphs).
- AI-assisted Optimization — Uses AI heuristics to suggest pipeline improvements.
- Multi-backend Support — Integrates with Spark, Airflow, and DVC pipelines.
- Reproducible Builds — Every data transformation is versioned and trackable.
- Declarative DSL — Describe what you want, not how to run it.
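To make the "workflow compiler" and "reproducible builds" ideas concrete, here is a minimal sketch using only the Python standard library. It is not FlowBeast's actual engine or DSL: the `WORKFLOW` dictionary, the `needs` key, and the functions `compile_workflow` and `fingerprint` are hypothetical names chosen for illustration. The sketch assumes a declarative definition in which each step lists the steps it depends on, orders the steps topologically, and derives a stable hash that could serve as a version ID.

```python
import hashlib
import json
from graphlib import TopologicalSorter

# Hypothetical declarative workflow: each step names the steps it depends on.
WORKFLOW = {
    "ingest_raw": {"needs": []},
    "clean": {"needs": ["ingest_raw"]},
    "aggregate": {"needs": ["clean"]},
    "export": {"needs": ["aggregate", "clean"]},
}


def compile_workflow(workflow: dict) -> list[str]:
    """Return step names in an order that respects every dependency.

    Raises graphlib.CycleError if the definition is not a valid DAG.
    """
    graph = {name: set(spec["needs"]) for name, spec in workflow.items()}
    return list(TopologicalSorter(graph).static_order())


def fingerprint(workflow: dict) -> str:
    """Hash a canonical JSON form so identical definitions get the same ID."""
    canonical = json.dumps(workflow, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print(compile_workflow(WORKFLOW))
    print(fingerprint(WORKFLOW)[:12])
```

The definition only says what depends on what; the execution order falls out of the graph, which is the sense in which the DSL is declarative. A real backend would map the ordered steps onto Airflow or Spark tasks rather than return a plain list.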
- Core Languages: Python, LLMOps
- Agent Frameworks: LangChain / LlamaIndex
- Backend Services: FastAPI, Uvicorn
- Deployment/Containerization: Docker
- Frontend Interaction: VS Code Extension API
- Target Ecosystem: dbt-core, Apache Airflow / Dagster
.
├── archive
├── deployments
├── docker-compose.yml
├── Dockerfile
├── docs
├── flowbeast
├── __init__.py
├── market_material
├── pyproject.toml
├── README.md
├── requirements.txt
├── setup_docker_identity.sh
├── test_data
├── tests
├── uv.lock
└── vs_code_extension
9 directories, 8 files
- Environment: Clone the repository and create a Python virtual environment.
- API Key: Configure the LLM API key in the .env file.
- Run Backend: docker-compose up (to be implemented) or uvicorn src.main:app --reload
- Install Extension: Build vs_code_extension and install it into your local VS Code.
- Enjoy! (to be implemented)
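The API key step can be sketched as follows. This is an assumption-laden illustration, not the project's code: the variable name `LLM_API_KEY` and the helpers `load_env_file` and `get_api_key` are hypothetical, and the parser handles only simple KEY=VALUE lines. The dependency list includes python-dotenv, which would normally do this job; the standard-library version below just shows what the lookup amounts to.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_api_key(path: str = ".env", name: str = "LLM_API_KEY") -> str:
    """Prefer the process environment, then fall back to the .env file."""
    key = os.environ.get(name) or load_env_file(path).get(name)
    if not key:
        raise RuntimeError(f"{name} is not set; add it to {path}")
    return key
```

Checking the process environment first lets a Docker or CI setup override the local file without editing it.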
git clone https://github.com/ArlesZhang/FlowBeast.git
cd FlowBeast-p1/FlowBeast
pip install -r requirements.txt
python src/main.py --config examples/sample_workflow.yaml
- Define DSL for workflow description
- Implement core compiler engine
- Integrate AI optimization agent
- Add Airflow backend
- Release v0.1.0
Arles Zhang
Building AI-powered compiler systems for data engineers. GitHub: @arleszhang
# Core dependencies By GPT5
fastapi==0.115.0
uvicorn==0.30.0
# Data workflow & orchestration
pandas==2.2.3
pydantic==2.9.0
networkx==3.3
# ML & optimization
scikit-learn==1.5.2
# Version control & reproducibility
dvc==3.50.0
# Testing & linting
pytest==8.3.3
black==24.10.0
# DataCody Agent Backend Dependencies By Gemini
# Web Framework and Server
fastapi==0.110.0
uvicorn[standard]==0.27.1
# LLM & Agent Framework
langchain==0.1.13 # or llama-index; choose one
pydantic==2.6.4 # used for structured output (JSON Schema)
# LLM Provider (assuming OpenAI)
openai==1.14.3
# If using another model, e.g. Claude:
# anthropic==0.23.1
# Data Engineering Tooling (for parsing dbt-related files)
pyyaml==6.0.1
dbt-core==1.7.0 # for understanding dbt's dependency structure and parser
# Environment and debugging
python-dotenv==1.0.1
# ------------------------------
# Optional dependencies (to be added in later iterations)
# ------------------------------
# Database / vector store (for RAG memory)
# chromadb==0.4.24
# duckdb==0.10.1
# Distributed computing (Spark integration)
# pyspark==3.5.0
# File operations / AST parsing
# typed-ast==1.5.5