A faithful replication of the AutoGluon-Assistant architecture, implemented in LangGraph.
MLauto automates ML tasks end to end: it understands the data, selects a suitable library, retrieves relevant tutorials, generates and executes code inside Docker, and uses Monte Carlo Tree Search (MCTS) to explore the space of candidate solutions, backtracking out of dead ends.
**Phase 1: Perception**

```
scan_data → find_description_files → generate_task_description → select_tools
  → [Semantic Memory] retrieve_tutorials → [Episodic Memory] rerank_tutorials
```
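The first two Phase 1 steps can be approximated with plain filesystem calls. This is a hypothetical sketch, not the DataPerceptionAgent code: `scan_data` here only counts files by extension, and `find_description_files` matches file names against a hard-coded list of hints (`DESCRIPTION_HINTS`, an assumption), whereas the real agent may rely on an LLM to decide which files describe the task.

```python
from collections import Counter
from pathlib import Path
from typing import List

# Hypothetical name hints; the real find_description_files step may use other rules.
DESCRIPTION_HINTS = ("readme", "description", "task", "instructions")


def scan_data(data_dir: str) -> Counter:
    """Summarize a data folder by counting files per extension."""
    root = Path(data_dir)
    return Counter(p.suffix.lower() or "<none>" for p in root.rglob("*") if p.is_file())


def find_description_files(data_dir: str) -> List[str]:
    """Return relative paths of files whose names suggest a task description."""
    root = Path(data_dir)
    return [
        p.relative_to(root).as_posix()
        for p in sorted(root.rglob("*"))
        if p.is_file() and any(hint in p.stem.lower() for hint in DESCRIPTION_HINTS)
    ]


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as d:
        for name in ("train.csv", "test.csv", "README.md"):
            Path(d, name).write_text("x")
        print(scan_data(d))               # Counter({'.csv': 2, '.md': 1})
        print(find_description_files(d))  # ['README.md']
```

The extension counts and description files are the kind of raw evidence that `generate_task_description` would then summarize for the LLM.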
**Phase 2: Iterative Coding (MCTS loop)**

```
select_node → expand_node → retrieve_node_tutorials → rerank_node_tutorials
  → generate_python_code → generate_bash_script → execute_and_evaluate
  → backpropagate → (repeat or done)
```
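The Phase 2 loop can be sketched as a plain UCT search. This is a minimal illustration under stated assumptions, not MLauto's `NodeManager` or the `Node` class in `shared/`: the callback `propose` stands in for the LLM steps (`expand_node` through `generate_bash_script`), and `evaluate` stands in for `execute_and_evaluate`, returning a validation score or `None` when the run fails. The names `Node`, `select`, `backpropagate`, `mcts`, and the default constants are hypothetical. A failed run is credited with `failure_penalty`, which lowers the UCT score of its branch so later selections drift toward other branches; that is how backtracking out of dead ends emerges from the search.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass(eq=False)
class Node:
    """One candidate solution in the search tree (illustrative fields only)."""
    plan: str
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list, repr=False)
    visits: int = 0
    total_reward: float = 0.0

    def uct(self, c: float) -> float:
        """UCT score: mean reward plus an exploration bonus scaled by c."""
        if self.visits == 0:
            return math.inf  # try every child at least once
        exploit = self.total_reward / self.visits
        explore = c * math.sqrt(math.log(self.parent.visits) / self.visits)
        return exploit + explore


def select(root: Node, c: float) -> Node:
    """Walk down from the root, following the child with the highest UCT score."""
    node = root
    while node.children:
        node = max(node.children, key=lambda child: child.uct(c))
    return node


def backpropagate(node: Optional[Node], reward: float) -> None:
    """Credit the reward to every node on the path back to the root."""
    while node is not None:
        node.visits += 1
        node.total_reward += reward
        node = node.parent


def mcts(
    propose: Callable[[Node], List[str]],        # stand-in for LLM code generation
    evaluate: Callable[[str], Optional[float]],  # stand-in for Docker run; None = failed
    iterations: int = 10,
    c: float = 1.41,                             # illustrative, roughly sqrt(2)
    failure_penalty: float = -1.0,               # illustrative value
) -> Optional[Node]:
    root = Node(plan="root")
    best: Optional[Node] = None
    best_score = -math.inf
    for _ in range(iterations):
        node = select(root, c)
        # Expand the root, or any leaf that has already been evaluated once.
        if node is root or node.visits > 0:
            node.children = [Node(plan=p, parent=node) for p in propose(node)]
            if not node.children:
                break  # nothing left to try (a simplification for this sketch)
            node = node.children[0]
        score = evaluate(node.plan)
        reward = failure_penalty if score is None else score
        if score is not None and score > best_score:
            best, best_score = node, score
        backpropagate(node, reward)
    return best  # best successful node, or None if every run failed
```

For a quick check, let `propose` return two refinements `plan + "/a"` and `plan + "/b"`, and let `evaluate` fail on every `"/b"` plan: the failing branches are revisited less and less often, and the returned node is always one of the successful `"/a"` refinements.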
| Module | Maps to | Role |
|---|---|---|
| `perception_agent/` | DataPerceptionAgent etc. | Understand data & select tools |
| `semantic_memory/` | RetrieverAgent | FAISS + BGE tutorial search |
| `episodic_memory/` | RerankerAgent | LLM-based tutorial selection |
| `iterativecoding_agent/` | CodingAgent + NodeManager | MCTS code generation loop |
| `shared/` | Core infrastructure | State, LLM, Node, NodeManager, TutorialIndexer |
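The Semantic Memory search can be pictured as a nearest-neighbor lookup over embeddings. The sketch below is a pure-Python stand-in that assumes cosine similarity as the metric. With L2-normalized vectors, a FAISS `IndexFlatIP` ranks by exactly this inner product, and in MLauto the vectors would come from a BGE embedding model rather than the toy 3-dimensional values used here. The tutorial names and vectors are made up for illustration.

```python
import math
from typing import Dict, List, Tuple


def normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def top_k(query: List[float], docs: Dict[str, List[float]], k: int = 3) -> List[Tuple[str, float]]:
    """Rank documents by cosine similarity to the query and keep the best k."""
    q = normalize(query)
    scored = [
        (name, sum(a * b for a, b in zip(q, normalize(vec))))
        for name, vec in docs.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy 3-d "embeddings"; real ones would come from a BGE model.
tutorials = {
    "tabular_quickstart": [0.9, 0.1, 0.0],
    "timeseries_forecasting": [0.1, 0.9, 0.1],
    "image_classification": [0.0, 0.2, 0.9],
}
print([name for name, _ in top_k([1.0, 0.2, 0.0], tutorials, k=2)])
# ['tabular_quickstart', 'timeseries_forecasting']
```

The retrieved candidates are what the Episodic Memory reranker would then narrow down with an LLM.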
MLauto executes all generated code inside an isolated Docker container, so you must build the base executor image first. Then install the Python dependencies and set your OpenAI API key:

```bash
# Build the Docker image (make sure the Docker daemon is running)
docker build -t mlauto-executor:latest .

# Install the Python dependencies
pip install -r requirements.txt

# Set your OpenAI API key
export OPENAI_API_KEY=sk-...
```

The Semantic Memory and Episodic Memory modules run as fully standalone, standard-compliant MCP servers that communicate over HTTP using Server-Sent Events (SSE). Start both servers first, running each command from inside the `MLauto` directory so Python can resolve the packages:
```bash
# Terminal 1: start the Semantic Memory MCP server (port 8010)
cd MLauto
uvicorn semantic_memory.mcp_server:app --port 8010
```

```bash
# Terminal 2: start the Episodic Memory MCP server (port 8011)
cd MLauto
uvicorn episodic_memory.mcp_server:app --port 8011
```

Once both servers are running, launch the end-to-end MLauto pipeline from a separate terminal:
```bash
python run.py /path/to/your/dataset \
  -u "Solve the regression problem using the provided data. Output the final submission in a CSV file containing predictions on the test set." \
  -o ./my_results1.3 \
  -v 4 \
  -n 10
```

**Arguments explained:**
- `/path/to/your/dataset` (required): the absolute or relative path to your input data folder. This is a positional argument, so it takes no flag.
- `-u`/`--user-input` (required): the specific instructions or task description for the ML agent.
- `-v`/`--verbosity` (optional): sets the terminal logging level from `0` to `4`. The default is `2` (INFO). We recommend `3` (DETAIL) for tracking MCTS tree progress and `4` (DEBUG) for viewing raw LLM prompts.
- `-o`/`--output` (optional): the directory where generated code, logs, and state snapshots are saved. If omitted, a unique folder is auto-generated in `./runs`.
- `-n`/`--max-iterations` (optional): overrides the maximum number of MCTS iterations specified in `config.yaml`.
- `-c`/`--config` (optional): path to a custom YAML configuration file.
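The flags above map naturally onto `argparse`. This is a hypothetical sketch of how such a parser could be declared, not the actual `run.py`; it also shows the override rule for `-n`, where the `"mcts"` / `"max_iterations"` config keys are assumed names rather than MLauto's real schema.

```python
import argparse
from typing import Any, Dict


def build_parser() -> argparse.ArgumentParser:
    """Mirror the documented run.py flags (a sketch, not the real parser)."""
    parser = argparse.ArgumentParser(prog="run.py", description="Run the MLauto pipeline.")
    parser.add_argument("data_path", help="input data folder (absolute or relative)")
    parser.add_argument("-u", "--user-input", required=True,
                        help="task instructions for the ML agent")
    parser.add_argument("-v", "--verbosity", type=int, choices=range(5), default=2,
                        help="0-4: 2=INFO, 3=DETAIL, 4=DEBUG")
    parser.add_argument("-o", "--output", default=None,
                        help="output directory (auto-generated under ./runs if omitted)")
    parser.add_argument("-n", "--max-iterations", type=int, default=None,
                        help="override the MCTS iteration limit from config.yaml")
    parser.add_argument("-c", "--config", default=None,
                        help="path to a custom YAML configuration file")
    return parser


def effective_max_iterations(args: argparse.Namespace, config: Dict[str, Any]) -> int:
    """Use -n when given; otherwise fall back to the (hypothetical) config key."""
    if args.max_iterations is not None:
        return args.max_iterations
    return int(config["mcts"]["max_iterations"])
```

Note that `argparse` turns `--user-input` and `--max-iterations` into the attributes `user_input` and `max_iterations`.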
Edit config.yaml to control:
- LLM model and temperature
- MCTS parameters (iterations, exploration constant, failure penalty)
- Tutorial retrieval (top-k, condensed vs full, max length)
- Docker execution settings
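For orientation, the Docker execution step amounts to running a generated script in a throwaway container built from the `mlauto-executor:latest` image. The sketch below only assembles a `docker run` command line. The mount point `/workspace`, the script name `run.sh`, and the choice to disable networking by default are assumptions for illustration, not MLauto's actual settings.

```python
import subprocess
from pathlib import Path
from typing import List


def build_docker_command(
    workdir: str,
    script: str = "run.sh",                 # hypothetical name for the generated bash script
    image: str = "mlauto-executor:latest",  # the tag built during setup
    allow_network: bool = False,            # assumption: isolate generated code by default
) -> List[str]:
    """Build a `docker run` argv that executes `script` inside a throwaway container."""
    host_dir = str(Path(workdir).resolve())
    cmd = ["docker", "run", "--rm",           # delete the container when it exits
           "-v", f"{host_dir}:/workspace",   # share generated code and outputs
           "-w", "/workspace"]               # start in the mounted directory
    if not allow_network:
        cmd += ["--network", "none"]         # no network access for generated code
    cmd += [image, "bash", script]
    return cmd


def run_in_docker(workdir: str) -> subprocess.CompletedProcess:
    """Run the container and capture stdout/stderr for evaluation."""
    return subprocess.run(build_docker_command(workdir), capture_output=True, text=True)
```

Disabling the network is a trade-off: it hardens isolation, but generated scripts that `pip install` extra packages at run time would then fail.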