Monitoring and Insights #3102
Replies: 3 comments
For production monitoring, here's what works:

1. State-Based Observability

Built-in monitoring via shared state:

```python
import uuid

state = {
    "run_id": str(uuid.uuid4()),
    "metrics": {
        "total_tokens": 0,
        "agent_latencies": {},
        "tool_calls": [],
        "errors": []
    },
    "model_usage": {}
}

def track_agent(agent_name, tokens, latency, model):
    """Record per-agent tokens and latency, plus per-model usage."""
    state["metrics"]["total_tokens"] += tokens
    state["metrics"]["agent_latencies"][agent_name] = latency
    if model not in state["model_usage"]:
        state["model_usage"][model] = {"calls": 0, "tokens": 0}
    state["model_usage"][model]["calls"] += 1
    state["model_usage"][model]["tokens"] += tokens
```
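A quick usage sketch of the shared-state pattern (agent names, token counts, and latencies here are made up for illustration): call `track_agent` after each agent finishes, then read the aggregates back out of `state`:

```python
import uuid

# Same shape as the shared state above
state = {
    "run_id": str(uuid.uuid4()),
    "metrics": {"total_tokens": 0, "agent_latencies": {}, "tool_calls": [], "errors": []},
    "model_usage": {},
}

def track_agent(agent_name, tokens, latency, model):
    """Record per-agent tokens and latency, plus per-model usage."""
    state["metrics"]["total_tokens"] += tokens
    state["metrics"]["agent_latencies"][agent_name] = latency
    state["model_usage"].setdefault(model, {"calls": 0, "tokens": 0})
    state["model_usage"][model]["calls"] += 1
    state["model_usage"][model]["tokens"] += tokens

# Hypothetical bookkeeping for two agents in one run:
track_agent("researcher", tokens=1200, latency=3.4, model="gpt-4o")
track_agent("writer", tokens=800, latency=2.1, model="claude-3-5-sonnet")

print(state["metrics"]["total_tokens"])         # 2000
print(state["model_usage"]["gpt-4o"]["calls"])  # 1
```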
2. Open Source Tools

Langfuse (you mentioned Braintrust):

```python
from langfuse import Langfuse
from langfuse.decorators import observe  # v2 SDK; in v3 it's `from langfuse import observe`

langfuse = Langfuse()  # reads LANGFUSE_* keys from the environment

@observe()  # traces the whole crew run
def run_crew(task):
    result = crew.kickoff(inputs={"task": task})
    return result
```
Weights & Biases:

```python
import wandb

wandb.init(project="crewai-production")
wandb.log({"tokens": state["metrics"]["total_tokens"]})
```

3. Key Metrics to Track

* Per-agent latency — which agent is slow?
* Per-model token usage — cost attribution
* Tool call frequency — what's being used?
* Error rates — per agent, per model
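The per-model usage captured in the shared state can be turned into a rough cost report for that attribution step. A minimal sketch — the prices below are placeholders, not actual vendor rates:

```python
# Hypothetical per-1K-token prices -- placeholders, not real rates
PRICE_PER_1K = {"gpt-4o": 0.01, "claude-3-5-sonnet": 0.009}

# Same shape as state["model_usage"] above
model_usage = {
    "gpt-4o": {"calls": 3, "tokens": 4000},
    "claude-3-5-sonnet": {"calls": 2, "tokens": 2000},
}

def cost_report(usage):
    """Attribute estimated cost per model from the usage dict."""
    return {
        model: round(stats["tokens"] / 1000 * PRICE_PER_1K.get(model, 0.0), 4)
        for model, stats in usage.items()
    }

print(cost_report(model_usage))  # {'gpt-4o': 0.04, 'claude-3-5-sonnet': 0.018}
```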
4. Production Insight

The state-based approach gives you:

* Built-in audit trail
* No external dependencies
* Easy comparison between runs
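One way to get that run-to-run comparison (a minimal sketch; the file layout and helper names are assumptions): dump the state to JSON at the end of each run, then diff the metrics of any two saved runs:

```python
import json
import tempfile
from pathlib import Path

def save_run(state, directory):
    """Persist one run's state as <run_id>.json for later comparison."""
    path = Path(directory) / f"{state['run_id']}.json"
    path.write_text(json.dumps(state, indent=2))
    return path

def compare_runs(path_a, path_b):
    """Return the total-token delta between two saved runs."""
    a = json.loads(Path(path_a).read_text())
    b = json.loads(Path(path_b).read_text())
    return b["metrics"]["total_tokens"] - a["metrics"]["total_tokens"]

# Usage with two toy runs:
with tempfile.TemporaryDirectory() as d:
    run1 = {"run_id": "run-1", "metrics": {"total_tokens": 1800}}
    run2 = {"run_id": "run-2", "metrics": {"total_tokens": 2400}}
    delta = compare_runs(save_run(run1, d), save_run(run2, d))
    print(delta)  # 600
```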
More on observability patterns: https://github.com/KeepALifeUS/autonomous-agents |
[like] Kiran Jaggu reacted to your message:
From: Vladyslav Shapovalov ***@***.***>
Sent: Thursday, February 12, 2026 8:56:26 PM
To: crewAIInc/crewAI ***@***.***>
Cc: Kiran Jaggu ***@***.***>; Author ***@***.***>
Subject: Re: [crewAIInc/crewAI] Monitoring and Insights (Discussion #3102)
Langfuse and W&B are solid for the observability side. The gap I kept running into was: I can see which model is underperforming, but then what? I still have to manually change the routing or swap the model.

I built Kalibr to close that loop. It auto-instruments your LLM calls (OpenAI, Anthropic, Google), captures traces, and then uses the outcome data you report to shift routing automatically. So if Claude starts degrading on a specific task type, traffic moves to the next-best option without you touching anything. It's an open-source Python SDK, and it might be worth trying alongside Langfuse since they solve different parts of the problem.
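This is not Kalibr's actual API — just a toy sketch of the outcome-driven routing idea, with every class and parameter name invented for illustration: keep a moving score per model from reported outcomes, and route each new call to the current best, with a little exploration so lagging models can recover.

```python
import random

class OutcomeRouter:
    """Toy outcome-driven router (illustrative only, not a real SDK).

    Keeps an exponential moving average of reported outcome scores
    per model and routes new calls to the highest-scoring one.
    """

    def __init__(self, models, alpha=0.2, explore=0.1):
        self.scores = {m: 1.0 for m in models}  # optimistic start
        self.alpha = alpha       # EMA smoothing factor
        self.explore = explore   # probability of trying a random model

    def pick(self):
        """Choose a model: mostly the best, occasionally a random one."""
        if random.random() < self.explore:
            return random.choice(list(self.scores))
        return max(self.scores, key=self.scores.get)

    def report(self, model, success):
        """Feed back an outcome (1.0 = good, 0.0 = bad)."""
        s = self.scores[model]
        self.scores[model] = (1 - self.alpha) * s + self.alpha * float(success)

# Toy scenario: "claude" degrades on this task type, traffic shifts away.
router = OutcomeRouter(["claude", "gpt-4o"], explore=0.0)
for _ in range(10):
    router.report("claude", success=0.0)
router.report("gpt-4o", success=1.0)
print(router.pick())  # gpt-4o
```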
I'm looking at production readiness for my CrewAI agents, which calls for performance insights into how different models and prompts are performing, and secondly for monitoring. Any suggestions for open-source tools?
I came across https://www.braintrust.dev/. Has anyone had success integrating it with CrewAI? Any response is highly appreciated.