
fix: don't honor result_as_answer when tool execution errors#5157

Open
devin-ai-integration[bot] wants to merge 2 commits into main from devin/1774716145-fix-result-as-answer-on-error

Conversation


@devin-ai-integration devin-ai-integration bot commented Mar 28, 2026

Summary

Fixes #5156. When a tool with result_as_answer=True raises an exception, the error message was being treated as the agent's final answer, preventing the agent from reflecting on the failure and retrying.

The fix adds error tracking across all tool execution code paths so that result_as_answer is only honored on successful tool executions:

  • tool_usage.py: Added _last_execution_errored flag, set in all error branches (ToolUsageError, tool selection failure, runtime exception in _use/_ause)
  • tool_utils.py: Both execute_tool_and_check_finality and aexecute_tool_and_check_finality check the flag before returning result_as_answer=True
  • crew_agent_executor.py: Propagates error_occurred through execution result dict; _append_tool_result_and_check_finality gates on it
  • agent_utils.py: Uses existing error_event_emitted to gate result_as_answer
  • experimental/agent_executor.py: Same pattern applied to sequential loop, parallel results loop, and parallel error fallback
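In rough outline, the gating pattern these changes describe could look like the following sketch. The class and function names echo the real ones in tool_usage.py and tool_utils.py, but the bodies are heavily simplified stand-ins, not the actual implementation:

```python
class ToolUsage:
    """Simplified sketch of the _last_execution_errored flag from this PR."""

    def __init__(self) -> None:
        self._last_execution_errored = False

    def use(self, tool, arguments):
        # Reset at the top of every execution so stale state never leaks.
        self._last_execution_errored = False
        try:
            return tool.run(**arguments)
        except Exception as exc:
            # Every failure path flips the flag before returning the error text.
            self._last_execution_errored = True
            return f"Tool execution error: {exc}"


def execute_tool_and_check_finality(tool_usage, tool, arguments):
    """Honor result_as_answer only when the execution actually succeeded."""
    output = tool_usage.use(tool, arguments)
    is_final = (
        getattr(tool, "result_as_answer", False)
        and not tool_usage._last_execution_errored
    )
    return output, is_final
```

The key property is that the flag is reset and set inside the same `use()` call that the finality check reads, so a stale value from a previous tool run can never leak into the decision.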

Review & Testing Checklist for Human

  • Verify step_executor.py coverage: This file was not modified. Confirm that its native tool path delegates to one of the fixed executors and doesn't have its own independent result_as_answer check that bypasses the fix.
  • Verify _last_execution_errored reliability: The flag is a mutable instance attribute on ToolUsage, reset at the top of use()/ause() and read immediately after by tool_utils.py. Confirm no intermediate call can reset it before it's read.
  • End-to-end test: Create an agent with a result_as_answer=True tool that intentionally fails, and confirm the agent continues reasoning rather than returning the error as its final answer.
  • Verify parallel execution path in experimental executor: The parallel error fallback sets "original_tool": None alongside "error_occurred": True — the result_as_answer guard is technically unreachable here since original_tool is falsy. Confirm this is acceptable.
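The end-to-end expectation in the checklist above can be sketched as a self-contained simulation of the agent loop (no crewai imports; every name here is a hypothetical stand-in for the real executor):

```python
def run_agent_loop(tool, max_iterations=3):
    """Hypothetical mini-loop: honor result_as_answer only on success."""
    observations = []
    for _ in range(max_iterations):
        try:
            result = tool()
            if getattr(tool, "result_as_answer", False):
                return result  # success: short-circuit with the tool output
            observations.append(result)
        except Exception as exc:
            # Failure: feed the error back as an observation instead of
            # finalizing on it, so the agent can reflect and retry.
            observations.append(f"error: {exc}")
    return "agent kept reasoning: " + "; ".join(observations)
```

With the fix in place, a `result_as_answer=True` tool that raises should follow the `except` path (the agent keeps reasoning), while the same tool succeeding should short-circuit as before.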

Notes

  • Six new unit tests were added covering the ToolUsage flag, execute_tool_and_check_finality (both error and success), and native tool execution in AgentExecutor (both error and success).
  • The fix uses two different error-tracking mechanisms depending on the code path: a _last_execution_errored flag on ToolUsage (for text/ReAct pattern), and an error_occurred dict key / error_event_emitted local variable (for native tool calling). This follows existing conventions in each module rather than introducing a new abstraction.
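As a rough illustration of the second mechanism, the error_occurred dict key used on the native tool-calling path might reduce to something like this; both function names are simplified stand-ins for the real ones in crew_agent_executor.py:

```python
def execute_native_tool(tool_fn, arguments):
    """Return an execution-result dict carrying explicit error state."""
    try:
        return {"output": tool_fn(**arguments), "error_occurred": False}
    except Exception as exc:
        return {"output": f"Tool execution error: {exc}", "error_occurred": True}


def append_tool_result_and_check_finality(result, tool):
    # Gate result_as_answer on the propagated error flag.
    if result["error_occurred"]:
        return False
    return getattr(tool, "result_as_answer", False)
```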

Link to Devin session: https://app.devin.ai/sessions/a7393abd35bf4141bf23fe9e1b86b364


Note

Medium Risk
Changes tool-execution finality logic across multiple executors and hook wrappers; behavior around result_as_answer now depends on new error-tracking flags, which could alter when agents short-circuit after tool calls.

Overview
Prevents tools marked result_as_answer=True from prematurely short-circuiting the agent when the tool execution fails, allowing the model to see the error and continue reasoning/retrying.

This propagates explicit error state through native tool execution results (including parallel paths) in CrewAgentExecutor and the experimental AgentExecutor, and adds _last_execution_errored tracking in ToolUsage so tool_utils.execute_*_tool_and_check_finality only returns result_as_answer on successful runs. Adds regression tests covering both success/error cases for native tool execution and ToolUsage/execute_tool_and_check_finality behavior.

Written by Cursor Bugbot for commit f5dc745.

When a tool with result_as_answer=True raises an exception, the agent
now continues reasoning about the error instead of treating the error
message as the final answer.

Fixes #5156

Co-Authored-By: João <joao@crewai.com>
@devin-ai-integration

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring


@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


Comment thread lib/crewai/src/crewai/tools/tool_usage.py
Address Cursor Bugbot review: the add_image exception handlers in
use() and ause() were missing the error flag, allowing result_as_answer
to be incorrectly honored when those paths errored.

Co-Authored-By: João <joao@crewai.com>
@Vamshi3130

What counts as a tool execution error? This is my code execution tool. I tried this PR, but even though the code executed with errors, the task is still marked completed because of result_as_answer.

in tools/venv_tool.py

````python
import asyncio
import os
import shutil
from typing import Type

import pandas as pd
from crewai.tools import BaseTool
from pydantic import BaseModel, Field

# Use a relative path for better portability.
# This assumes the sandbox_venv is in the root of your project.
SANDBOX_PYTHON_PATH = "/opt/sandbox_venv/bin/python"


class CodeInterpreterInput(BaseModel):
    """Input for the VenvCodeInterpreterTool."""

    code: str = Field(
        ...,
        description=(
            "The Python code to be executed. The df variable containing the "
            "dataset is already loaded. Do not write code to load it."
        ),
    )


class VenvCodeInterpreterTool(BaseTool):
    name: str = "Sandbox Python Code Interpreter"
    description: str = (
        "Executes Python code in a secure, pre-configured sandbox environment. "
        "The target dataframe 'df' is automatically loaded into the environment."
    )
    args_schema: Type[BaseModel] = CodeInterpreterInput
    _session_id: str
    _last_known_mtime: float = 0.0

    def __init__(self, dataframe: pd.DataFrame):
        """
        Initializes the tool by saving the dataframe to a temporary parquet
        file that the sandbox environment will read.
        """
        super().__init__()
        import uuid

        self._session_id = uuid.uuid4().hex
        # Initial dataframe is saved to local /tmp or a dedicated directory.
        dataframe.to_parquet(self.get_temp_df_path())
        self._last_known_mtime = os.path.getmtime(self.get_temp_df_path())
        print(f"VenvCodeInterpreterTool initialized. DataFrame saved to {self.get_temp_df_path()}")

    def get_temp_df_path(self) -> str:
        return f"/tmp/df_{self._session_id}.parquet"

    def _get_backup_path(self) -> str:
        return self.get_temp_df_path() + ".bak"

    def cleanup(self) -> None:
        """
        Safely deletes the temporary dataframe files (main + backup).
        MUST be called at the end of the analysis flow.
        """
        for path in [self.get_temp_df_path(), self._get_backup_path()]:
            try:
                if os.path.exists(path):
                    os.remove(path)
                    print(f"Cleaned up: {path}")
            except Exception as e:
                print(f"Warning: Failed to cleanup {path}: {e}")

    def update_state_from_base64(self, b64_str: str) -> None:
        """Hydrates the sandboxed parquet file from a base64 encoded string from Flow State."""
        import base64

        if b64_str:
            with open(self.get_temp_df_path(), 'wb') as f:
                f.write(base64.b64decode(b64_str.encode('utf-8')))
            self._last_known_mtime = os.path.getmtime(self.get_temp_df_path())

    def get_state_as_base64(self) -> str:
        """Extracts the sandboxed parquet file as a base64 encoded string for Flow State."""
        import base64

        if os.path.exists(self.get_temp_df_path()):
            with open(self.get_temp_df_path(), 'rb') as f:
                return base64.b64encode(f.read()).decode('utf-8')
        return ""

    def was_state_modified(self) -> bool:
        """Check if the parquet file was modified since last hydration/init."""
        if not os.path.exists(self.get_temp_df_path()):
            return False
        current_mtime = os.path.getmtime(self.get_temp_df_path())
        return current_mtime > self._last_known_mtime

    async def _run(self, code: str) -> str:
        """
        Asynchronously executes the agent's code in a sandboxed venv.
        Automatically injects code to load the DataFrame and save state if modified.
        Backs up the parquet file before execution and restores on failure.
        """
        backup_path = self._get_backup_path()

        # Backup current state before execution (rollback on failure)
        if os.path.exists(self.get_temp_df_path()):
            shutil.copy2(self.get_temp_df_path(), backup_path)

        try:
            # Strip markdown code blocks if the LLM erroneously outputs them
            cleaned_code = code.strip()
            if cleaned_code.startswith("```python"):
                cleaned_code = cleaned_code.replace("```python", "", 1)
            if cleaned_code.startswith("```"):
                cleaned_code = cleaned_code.replace("```", "", 1)
            if cleaned_code.endswith("```"):
                cleaned_code = cleaned_code[:-3]
            cleaned_code = cleaned_code.strip()

            # Cleanly inject the setup and teardown for the dataframe state
            injected_prep = f"import pandas as pd\ndf = pd.read_parquet('{self.get_temp_df_path()}')\n\n"
            injected_save = (
                "\n\n# --- AUTOMATIC STATE SAVE ---\n"
                "try:\n"
                "    import pandas as pd\n"
                "    if 'df' in dir() and isinstance(df, pd.DataFrame):\n"
                f"        df.to_parquet('{self.get_temp_df_path()}')\n"
                "except Exception:\n"
                "    pass  # On failure, original parquet file is preserved unchanged\n"
            )

            wrapped_code = f"{injected_prep}{cleaned_code}{injected_save}"

            process = await asyncio.create_subprocess_exec(
                SANDBOX_PYTHON_PATH,
                "-c",  # Execute the following string as a command
                wrapped_code,  # The agent's code, with auto-injected setup/teardown
                stdout=asyncio.subprocess.PIPE,
                stderr=asyncio.subprocess.PIPE,
            )

            # communicate() waits for the process to exit, so no extra wait() is needed
            stdout, stderr = await process.communicate()

            if process.returncode != 0:
                # Restore backup on failure (rollback corrupted state)
                if os.path.exists(backup_path):
                    shutil.copy2(backup_path, self.get_temp_df_path())
                error_output = stderr.decode('utf-8').strip()
                return f"Execution failed:\n{error_output}"
            return stdout.decode('utf-8').strip()

        except Exception as e:
            # Restore backup on exception
            if os.path.exists(backup_path):
                shutil.copy2(backup_path, self.get_temp_df_path())
            return f"An unexpected error occurred while running the subprocess: {str(e)}"
        finally:
            # Remove the backup (state was already restored above on failure)
            if os.path.exists(backup_path):
                os.remove(backup_path)
````
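One note on the question above: as described in the PR summary, result_as_answer is gated on raised exceptions and ToolUsageError paths, not on the textual content of a tool's return value. The tool shown here returns the "Execution failed:" message as an ordinary string, so from the executor's point of view the call succeeded. A possible workaround, assuming the tool is free to raise, is to raise on a nonzero return code; here is a hypothetical synchronous sketch of that failure branch:

```python
import subprocess


def run_in_sandbox(python_path: str, code: str) -> str:
    """Hypothetical variant: raise on failure instead of returning error text."""
    proc = subprocess.run([python_path, "-c", code], capture_output=True, text=True)
    if proc.returncode != 0:
        # Raising (rather than returning the error string) lets the executor's
        # new error tracking suppress result_as_answer, so the agent keeps
        # reasoning about the failure instead of finalizing on it.
        raise RuntimeError(f"Execution failed:\n{proc.stderr.strip()}")
    return proc.stdout.strip()
```

The backup/restore logic from the tool above would still run before the raise; only the final `return` of the error text changes into an exception.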
