For localhost development, use Ollama, or use a remote model server such as vLLM via a Model-as-a-Service (MaaS) solution.
ollama serve

Pull down the models you need. For LLM tool invocations you often need a larger model such as Qwen 14B. The only way to know for sure is to test your app/agent with a given model and model-server configuration.
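One way to probe tool-calling support before committing to a model is to send it a request that includes a tool schema and check whether the reply contains a tool call. The sketch below only builds the Ollama /api/chat payload; the `get_weather` tool is a hypothetical example, not part of this repo.

```python
import json


def build_tool_probe(model: str) -> dict:
    """Build an Ollama /api/chat payload offering the model one tool.

    A model that supports tool calling should reply with a
    message.tool_calls entry rather than plain prose.
    """
    return {
        "model": model,
        "stream": False,
        "messages": [
            {"role": "user", "content": "What is the weather in Paris?"}
        ],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_weather",  # hypothetical tool for the probe
                    "description": "Get the current weather for a city",
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                },
            }
        ],
    }


payload = build_tool_probe("qwen3:14b-q8_0")
print(json.dumps(payload, indent=2))
```

POST this payload to http://localhost:11434/api/chat and inspect the response: a model without tool-calling support will typically answer in prose instead of emitting a tool call.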
ollama pull llama3.2:3b
ollama pull qwen3:14b-q8_0

cd llama-stack-scripts

export LLAMA_STACK_BASE_URL=http://localhost:8321
export INFERENCE_MODEL=ollama/llama3.2:3b
# export INFERENCE_MODEL=vllm/llama-4-scout-17b-16e-w4a16
export LLAMA_STACK_LOG_FILE=logs/llama-stack-server.log
export LLAMA_STACK_LOGGING="tools=DEBUG,tool_runtime=DEBUG,providers=DEBUG,server=info"

If using Ollama:
export OLLAMA_URL=http://localhost:11434

If using MaaS:
export VLLM_API_TOKEN=blah
export VLLM_URL=https://llama-4-scout-17b-16e-w4a16-maas-apicast-production.apps.prod.rhoai.rh-aiservices-bu.com:443/v1

python3.12 -m venv .venv
source .venv/bin/activate
uv run python -V

Install Dependencies
uv run --with llama-stack llama stack list-deps starter | xargs -L1 uv pip install

Run the Llama Stack server, attached to Ollama:
uv run --with llama-stack llama stack run starter

Inspect the server by running the scripts in llama-stack-scripts.
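As a quick sanity check you can also list the registered models over HTTP. This sketch assumes the server exposes an OpenAI-style /v1/models endpoint whose body contains a `data` list with `identifier` fields; only the parsing demo runs below, and `fetch_models` is there to run against your live server.

```python
import json
import urllib.request


def fetch_models(base_url: str) -> dict:
    """GET {base_url}/v1/models and return the parsed JSON body."""
    with urllib.request.urlopen(f"{base_url}/v1/models") as resp:
        return json.load(resp)


def list_model_ids(models_json: dict) -> list[str]:
    # Assumed response shape: {"data": [{"identifier": "..."}, ...]}
    return [m.get("identifier") or m.get("id") or "?" for m in models_json.get("data", [])]


# Parsing demo on a sample body; use fetch_models("$LLAMA_STACK_BASE_URL")
# against the running server for the real thing.
sample = {"data": [{"identifier": "ollama/llama3.2:3b"}]}
print(list_model_ids(sample))
```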
Note: Llama Stack persists state to ~/.llama/, specifically ~/.llama/distributions/starter when using the run starter command above. If you want a clean start:
rm -rf ~/.llama/distributions/starter

The following assumes Postgres is up and running with the appropriate database pre-created. See the deeper-dive README.md:
cd fantaco-customer-main
open README.md

Run the Customer REST API:
java -jar target/fantaco-customer-main-1.0.0.jar

curl -sS -L "$CUST_URL/api/customers?companyName=Around" | jq

cd fantaco-finance-main
open README.md

Run the Finance REST API:
java -jar target/fantaco-finance-main-1.0.0.jar

curl -sS -X POST $FIN_URL/api/finance/orders/history \
-H "Content-Type: application/json" \
-d '{
"customerId": "AROUT",
"limit": 10
}' | jq

cd fantaco-mcp-servers/customer-mcp
source .venv/bin/activate

python customer-api-mcp-server.py

cd fantaco-mcp-servers/finance-mcp
source .venv/bin/activate

python finance-api-mcp-server.py

Use mcp-inspector to test the MCP Servers.
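Under the hood, mcp-inspector drives the same JSON-RPC 2.0 messages any MCP client sends. A small sketch of the `tools/list` request used to enumerate a server's tools (message shape follows the MCP specification; the id value is arbitrary):

```python
import json


def mcp_tools_list_request(request_id: int = 1) -> dict:
    """JSON-RPC 2.0 envelope an MCP client sends to enumerate tools."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/list",
        "params": {},
    }


print(json.dumps(mcp_tools_list_request()))
```

The server's response carries a `tools` array with each tool's name, description, and input schema, which is what the inspector renders in its UI.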
The agents-llama-stack/ directory contains numbered example scripts that demonstrate progressive agent capabilities. Run these scripts in order 1 through 6 to learn the concepts step-by-step:
1_hello_world_agent_no_stream.py - Basic agent without streaming
2_hello_world_agent_streaming.py - Basic agent with streaming
3_list_customer_tools.py / 3_list_finance_tools.py - List available MCP tools
4_agent_customer_mcp.py / 4_agent_finance_mcp.py - Single-domain agents
5_agent_customer_and_finance.py - Multi-domain agent
6_multi_turn_agent.py - Multi-turn conversational agent
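The core idea behind the multi-turn script is that conversation history is threaded through every call, so the model sees earlier turns. A toy sketch of that state-threading pattern, with a stand-in function replacing real inference (this is illustrative, not the code in 6_multi_turn_agent.py):

```python
def fake_inference(messages: list[dict]) -> str:
    # Stand-in for a real chat-completion call; a real agent would send
    # the full message list to the model server here.
    return f"seen {len(messages)} messages"


def run_turn(history: list[dict], user_text: str) -> str:
    """Append the user turn, get a reply, and append it to the history."""
    history.append({"role": "user", "content": user_text})
    reply = fake_inference(history)
    history.append({"role": "assistant", "content": reply})
    return reply


history: list[dict] = []
run_turn(history, "Which customers start with 'Around'?")
run_turn(history, "Show their order history")
print(len(history))  # 4 entries: two user turns, two assistant replies
```

Because `history` grows across calls, the second turn's "their" can be resolved from context; a single-turn agent would have no idea which customers were meant.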
cd agents-llama-stack
source .venv/bin/activate
python 1_hello_world_agent_no_stream.py
# Continue with 2, 3, 4, 5, 6...

Follow the README.md in simple-agent-langgraph.
