Complete guide for configuring PullData for different use cases, especially when using the Web UI or REST API.
Before starting the server, edit `configs/default.yaml`:

```bash
# Open and edit the config
notepad configs/default.yaml   # Windows
nano configs/default.yaml      # Linux/Mac

# Then start the server
python run_server.py
```

All projects will use this config by default.
Create separate config files for different setups:

```bash
# Copy default config
cp configs/default.yaml configs/lm_studio.yaml

# Edit for LM Studio API
notepad configs/lm_studio.yaml

# Select it in the Web UI when creating a project
```

Set environment variables in `.env`:

```bash
# LM Studio
LM_STUDIO_BASE_URL=http://localhost:1234/v1
LM_STUDIO_API_KEY=sk-dummy
```

Then reference them in your config with `${LM_STUDIO_BASE_URL}`.
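The `${VAR}` placeholders are resolved from the environment when the config is loaded. Conceptually the substitution works like this sketch (`expand_env` is illustrative, not part of the PullData API):

```python
import os
import re

def expand_env(text: str) -> str:
    """Replace ${VAR} placeholders with environment values; leave unknown ones as-is."""
    return re.sub(
        r"\$\{(\w+)\}",
        lambda m: os.environ.get(m.group(1), m.group(0)),
        text,
    )

os.environ["LM_STUDIO_BASE_URL"] = "http://localhost:1234/v1"
print(expand_env("base_url: ${LM_STUDIO_BASE_URL}"))
# base_url: http://localhost:1234/v1
```

Note that an unset variable is left untouched, which is why a literal `${OPENAI_API_KEY}` showing up in an error message means the variable never reached the process (see Troubleshooting below).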
The Web UI (http://localhost:8000/ui/) now supports config selection.

To ingest documents:

- Select your project
- Choose a Configuration from the dropdown (or leave as "Default")
- Upload files
- Click "Ingest Documents"

To query:

- Select your project
- Choose a Configuration to override (optional)
- Enter your query
- Click "Query"
The Web UI automatically discovers all .yaml files in the configs/ directory.
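Discovery is essentially a directory scan; a sketch of the idea (the server's actual implementation may differ, e.g. in whether it recurses into subfolders):

```python
from pathlib import Path

def discover_configs(config_dir: str = "configs") -> list[str]:
    """Return the stem names of every .yaml file under config_dir, recursively."""
    return sorted(p.stem for p in Path(config_dir).rglob("*.yaml"))
```

So dropping a new `my_setup.yaml` into `configs/` makes it visible to the scan without any other registration step.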
List available configs:

```bash
curl http://localhost:8000/configs
```

Response:

```json
{
  "configs": [
    {
      "name": "default",
      "path": "configs/default.yaml",
      "filename": "default.yaml"
    },
    {
      "name": "lm_studio",
      "path": "configs/lm_studio.yaml",
      "filename": "lm_studio.yaml"
    }
  ],
  "count": 2
}
```

Ingest with a specific config:

```bash
curl -X POST "http://localhost:8000/ingest/upload?project=my_project&config_path=configs/lm_studio.yaml" \
  -F "files=@document.pdf"
```

Query with a specific config:

```bash
curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{
    "project": "my_project",
    "query": "What are the key findings?",
    "config_path": "configs/lm_studio.yaml",
    "output_format": "excel"
  }'
```

Create configs/lm_studio.yaml:
```yaml
models:
  embedder:
    provider: api
    api:
      base_url: http://localhost:1234/v1
      model: nomic-embed-text-v1.5
      api_key: sk-dummy
  llm:
    provider: api
    api:
      base_url: http://localhost:1234/v1
      model: qwen2.5-3b-instruct
      api_key: sk-dummy
      temperature: 0.7

storage:
  backend: local
  local:
    sqlite_path: ./data/pulldata.db
    faiss_index_path: ./data/faiss_indexes
```

Steps:
- Install and start LM Studio
- Load models in LM Studio
- Start LM Studio server
- Select this config in Web UI
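The first three steps can be sanity-checked before touching PullData: LM Studio exposes OpenAI-compatible endpoints, so listing the loaded models shows whether the server is up. A standard-library sketch (`models_request` is a throwaway helper, not part of PullData):

```python
import json
import urllib.request

BASE_URL = "http://localhost:1234/v1"  # LM Studio's default server address

def models_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a GET /models request (lists whatever LM Studio has loaded)."""
    return urllib.request.Request(
        f"{base_url}/models",
        headers={"Authorization": "Bearer sk-dummy"},
    )

# With LM Studio running:
# with urllib.request.urlopen(models_request()) as resp:
#     print([m["id"] for m in json.load(resp)["data"]])
```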
Create configs/openai.yaml:
```yaml
models:
  embedder:
    provider: api
    api:
      base_url: https://api.openai.com/v1
      model: text-embedding-3-small
      api_key: ${OPENAI_API_KEY}   # From .env file
  llm:
    provider: api
    api:
      base_url: https://api.openai.com/v1
      model: gpt-3.5-turbo
      api_key: ${OPENAI_API_KEY}
      temperature: 0.7

storage:
  backend: local
  local:
    sqlite_path: ./data/pulldata.db
    faiss_index_path: ./data/faiss_indexes
```

Prerequisites:

- Add `OPENAI_API_KEY` to `.env`
- Select this config in the Web UI
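However PullData loads it (possibly via a library such as python-dotenv), `.env` is just `KEY=VALUE` lines. A sketch of the idea (`load_dotenv` here is illustrative only):

```python
import os
from pathlib import Path

def load_dotenv(path: str = ".env") -> None:
    """Read KEY=VALUE lines into os.environ; '#' starts a comment, values are unquoted."""
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        os.environ.setdefault(key.strip(), value.strip())
```

After loading, the `${OPENAI_API_KEY}` placeholder in the YAML can be resolved from `os.environ`.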
Create configs/local_gpu.yaml:
```yaml
models:
  embedder:
    provider: local
    local:
      model_name: BAAI/bge-small-en-v1.5
      device: cuda
  llm:
    provider: local
    local:
      model_name: Qwen/Qwen2.5-3B-Instruct
      device: cuda
      load_in_8bit: true
      max_new_tokens: 512
      temperature: 0.7

storage:
  backend: local
  local:
    sqlite_path: ./data/pulldata.db
    faiss_index_path: ./data/faiss_indexes
```

Create configs/ollama.yaml:
```yaml
models:
  embedder:
    provider: api
    api:
      base_url: http://localhost:11434/v1
      model: nomic-embed-text
      api_key: sk-dummy
  llm:
    provider: api
    api:
      base_url: http://localhost:11434/v1
      model: llama3.2
      api_key: sk-dummy

storage:
  backend: local
```

Prerequisites:

- Install Ollama: https://ollama.ai
- Pull the models: `ollama pull nomic-embed-text` and `ollama pull llama3.2`
- Ollama runs automatically on port 11434
Create configs/groq.yaml:
```yaml
models:
  embedder:
    provider: api
    api:
      base_url: https://api.openai.com/v1   # Use OpenAI for embeddings
      model: text-embedding-3-small
      api_key: ${OPENAI_API_KEY}
  llm:
    provider: api
    api:
      base_url: https://api.groq.com/openai/v1
      model: llama-3.1-8b-instant
      api_key: ${GROQ_API_KEY}
      temperature: 0.5

storage:
  backend: local
```

Prerequisites:

- Get an API key from https://console.groq.com
- Add it to `.env`: `GROQ_API_KEY=gsk_your_key`
```yaml
# Embedding & LLM Models
models:
  embedder:
    provider: api   # or 'local'
    api:
      base_url: http://localhost:1234/v1
      model: nomic-embed-text-v1.5
      api_key: sk-dummy
      timeout: 30
      max_retries: 3
    local:
      model_name: BAAI/bge-small-en-v1.5
      device: cuda
      batch_size: 32
  llm:
    provider: api   # or 'local'
    api:
      base_url: http://localhost:1234/v1
      model: qwen2.5-3b-instruct
      api_key: sk-dummy
      temperature: 0.7
      max_tokens: 512
      timeout: 60
    local:
      model_name: Qwen/Qwen2.5-3B-Instruct
      device: cuda
      load_in_8bit: true
      max_new_tokens: 512
      temperature: 0.7

# Storage Backend
storage:
  backend: local   # or 'postgres', 'chromadb'
  local:
    sqlite_path: ./data/pulldata.db
    faiss_index_path: ./data/faiss_indexes
  postgres:
    host: localhost
    port: 5432
    database: pulldata
    user: pulldata_user
    password: ${POSTGRES_PASSWORD}
  chromadb:
    persist_directory: ./data/chroma_db

# Document Processing
parsing:
  chunk_size: 512
  chunk_overlap: 50
  min_chunk_size: 100

# Retrieval
retrieval:
  top_k: 5
  similarity_threshold: 0.0
  use_reranking: false

# Output Formats
output:
  excel:
    auto_format: true
    freeze_panes: true
  markdown:
    include_toc: true
  pdf:
    page_size: A4

# Caching
cache:
  enabled: true
  llm_cache_enabled: true
  embedding_cache_enabled: true
  cache_dir: ./data/cache

# Performance
performance:
  max_workers: 4
  batch_size: 32
  enable_gpu: true
```

To use a config from Python:

```python
from pulldata import PullData

# Option 1: Specify config path
pd = PullData(
    project="my_project",
    config_path="configs/lm_studio.yaml"
)

# Option 2: Use default config
pd = PullData(project="my_project")

# Option 3: Override specific settings
from pulldata.core.config import PullDataConfig

config = PullDataConfig.from_yaml("configs/default.yaml")
config.models.llm.api.temperature = 0.9
pd = PullData(project="my_project", config=config)
```

Error: Config file not found: configs/my_config.yaml
Solution:

- Check the file exists: `ls configs/`
- Use the correct path (relative to the project root)
- The file must have a `.yaml` extension
Error: Connection refused to http://localhost:1234/v1
Solution:
- Verify server is running (LM Studio, Ollama, etc.)
- Check port number is correct
- Check firewall settings
- Try: `curl http://localhost:1234/v1/models`
Error: Model 'my-model' not found
Solution:
- For LM Studio: Load model in LM Studio UI first
- For Ollama: run `ollama pull model-name`
- For local models: the model will auto-download from Hugging Face
Error: ${OPENAI_API_KEY} appears literally in the error message
Solution:
- Check the `.env` file exists in the project root
- Add the variable: `OPENAI_API_KEY=sk-your-key`
- Restart the server after changing `.env`
- Don't use quotes: `OPENAI_API_KEY=sk-key` (not `"sk-key"`)
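A quick way to verify the variable actually reached the Python process, and isn't quoted (`check_env` is a throwaway helper, not part of PullData):

```python
import os

def check_env(name: str) -> str:
    """Report whether an environment variable is usable as an API key."""
    value = os.environ.get(name)
    if not value:
        return f"{name} is missing - add it to .env and restart the server"
    if value[0] in "\"'":
        return f"{name} is set but quoted - remove the quotes"
    return f"{name} looks OK"

print(check_env("OPENAI_API_KEY"))
```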
```
configs/
├── default.yaml        # Local models (development)
├── lm_studio.yaml      # LM Studio API
├── production.yaml     # Production settings
├── openai.yaml         # OpenAI API
└── experiments/        # Experimental configs
    ├── high_temp.yaml
    └── long_context.yaml
```
Never commit API keys to git!
```yaml
# ✅ Good - uses .env
api_key: ${OPENAI_API_KEY}

# ❌ Bad - hardcoded key
api_key: sk-proj-abc123...
```

Add comments to explain choices:
```yaml
models:
  llm:
    api:
      temperature: 0.3   # Lower for factual extraction
      max_tokens: 1024   # Enough for detailed answers
```

Keep configs under version control:

```bash
git add configs/
git commit -m "Add LM Studio config for API usage"
```

Test a config before using it:

```bash
# Test with verify_install.py
python verify_install.py

# Test config loading
python -c "from pulldata.core.config import PullDataConfig; PullDataConfig.from_yaml('configs/my_config.yaml')"
```

See also:

- Web UI Guide - Using the Web interface
- API Configuration - API provider details
- Features Status - What's implemented
Last Updated: 2024-12-18