
feat(aenv): Integrating aenvironment for environment handling #1018

Open
guozhihao-224 wants to merge 2 commits into inclusionAI:main from guozhihao-224:feat/aenv

Conversation

@guozhihao-224 (Contributor) commented Mar 11, 2026

Description

Related Issue

This feature is part of the 2026 Q1 Roadmap (see #907 - "Integrating aenvironment for environment handling").

Summary

Integrate AEnvironment to support MCP-based multi-turn tool calling for RL training.


New Modules

areal/infra/aenv/
├── config.py      # Configuration: aenv URL, timeout, retry, error policy
├── adapter.py     # Wrapper for aenv SDK (initialize, list_tools, call_tool, release)
├── schema.py      # MCP → OpenAI tool format conversion
└── __init__.py

areal/workflow/aenv/
├── workflow.py    # Multi-turn workflow with tool execution
└── __init__.py

examples/aenv/
└── train_aenv.py  # Training entrypoint

tests/
├── test_aenv_adapter.py
├── test_aenv_config.py
├── test_aenv_workflow.py
├── test_aenv_schema.py
└── integration/test_aenv_integration.py

Key Components

  • AenvConfig: configuration with validation (URL, timeout, retry policy, tool error handling)
  • AenvEnvironmentAdapter: wraps the aenv SDK with caching, retry logic, and resource management
  • AenvWorkflow: a multi-turn RolloutWorkflow that executes model-generated tool calls
  • normalize_openai_tools(): converts the MCP tool schema to OpenAI function-calling format
  • parse_tool_arguments(): safe JSON argument parsing

Usage

from areal import PPOTrainer
from areal.infra.aenv import AenvConfig
from areal.workflow.aenv import AenvWorkflow

with PPOTrainer(config, ...) as trainer:
    trainer.train(
        workflow="areal.workflow.aenv.AenvWorkflow",
        workflow_kwargs={
            "aenv_config": AenvConfig(env_name="my-env@1.0.0"),
            ...
        },
    )

Tests

  • Unit tests: adapter, config, workflow, schema conversion
  • Integration tests: real aenv server (requires RUN_AENV_INTEGRATION=1)

Compatibility

  • aenv SDK is optional dependency (lazy import)
  • Python 3.11+
  • No impact on existing AReaL workflows
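The "optional dependency (lazy import)" bullet typically means the SDK import is deferred until the adapter is actually used, so importing AReaL itself never fails when aenv is absent. A minimal sketch of the general pattern (not the PR's actual code; the helper name is made up):

```python
import importlib
from types import ModuleType


def optional_import(module_name: str, feature: str) -> ModuleType:
    """Import an optional dependency lazily, raising a clear error if missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name} is required for {feature}; install it before use."
        ) from exc
```

Called at adapter-construction time rather than module-import time, this keeps existing workflows unaffected when the SDK is not installed.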

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not
    work as expected)
  • Documentation update
  • Code refactoring (no functional changes)
  • Performance improvement
  • Test coverage improvement

Checklist

  • I have read the Contributing Guide
  • I have run formatting tools (pre-commit or manual)
  • I have run relevant unit tests and they pass
  • I have added tests for new functionality
  • I have updated documentation if needed
  • My branch is up to date with main
  • This PR introduces breaking changes (if yes, fill out details below)
  • If this PR changes documentation, I have built and previewed it locally with
    jb build docs
  • No critical issues raised by AI reviewers (/gemini review)

Breaking Change Details (if applicable):

Additional Context


Need help? Check the Contributing Guide or ask in
GitHub Discussions!

@guozhihao-224 (Contributor, Author)

/gemini review

@gemini-code-assist (Contributor)

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant new capability by integrating AEnvironment into the AReaL framework. This integration allows for sophisticated multi-turn tool calling within RL training, empowering models to dynamically interact with external environments and tools. The changes provide a structured and configurable way to manage these interactions, ensuring robustness and flexibility for advanced RL applications.

Highlights

  • AEnvironment Integration: Integrated AEnvironment to support Multi-turn Tool Calling for Reinforcement Learning (RL) training, enabling models to interact with external tools.
  • New Modules and Components: Introduced new modules under areal/infra/aenv for configuration, SDK adaptation, and schema conversion, and areal/workflow/aenv for the multi-turn workflow.
  • Robust Tool Execution: Implemented AenvEnvironmentAdapter with caching, retry logic for transient failures, and resource management, along with AenvConfig for detailed environment setup.
  • OpenAI Tool Schema Conversion: Added normalize_openai_tools() to convert MCP tool schema into an OpenAI function-calling format, facilitating broader compatibility.
  • Comprehensive Testing: Included extensive unit tests for adapter, configuration, schema, and workflow components, as well as integration tests for a live AEnvironment service.


Changelog
  • areal/infra/aenv/__init__.py
    • Added initialization file for the aenv infrastructure module, exposing key classes and functions.
  • areal/infra/aenv/adapter.py
    • Added AenvEnvironmentAdapter class to bridge AReaL workflows with the aenvironment SDK, including initialization, release, tool listing, and tool calling with retry logic.
  • areal/infra/aenv/config.py
    • Added AenvConfig dataclass for managing AEnvironment integration settings, including URL, timeouts, retry policies, and error handling.
  • areal/infra/aenv/schema.py
    • Added normalize_openai_tools function to convert AEnvironment tool metadata to OpenAI function-tool schema and parse_tool_arguments for safe argument parsing.
  • areal/utils/logging.py
    • Updated LOG_COLORS dictionary to include color mappings for AenvWorkflow and AenvAdapter loggers.
  • areal/workflow/aenv/__init__.py
    • Added initialization file for the aenv workflow module, exposing AenvWorkflow.
  • areal/workflow/aenv/workflow.py
    • Added AenvWorkflow class, a RolloutWorkflow that executes model tool calls through AEnvironment, handling multi-turn interactions and reward computation.
  • examples/aenv/config.yaml
    • Added a new example configuration file for AEnvironment-integrated GSM8K training, defining aenv parameters and workflow settings.
  • examples/aenv/train_aenv.py
    • Added a new training entrypoint script for launching AEnvironment-integrated workflows with PPOTrainer.
  • tests/integration/test_aenv_integration.py
    • Added integration tests for the AenvEnvironmentAdapter, including server lifecycle management and real tool calls.
  • tests/test_aenv_adapter.py
    • Added unit tests for AenvEnvironmentAdapter, verifying initialization, tool caching, retry logic, and cleanup behavior.
  • tests/test_aenv_config.py
    • Added unit tests for AenvConfig, validating default values and error handling for invalid configurations.
  • tests/test_aenv_schema.py
    • Added unit tests for AEnvironment schema helpers, covering tool normalization and argument parsing.
  • tests/test_aenv_workflow.py
    • Added unit tests for AenvWorkflow, covering tool execution, error handling policies, reward computation, and OpenAI request parameters.

@gemini-code-assist bot left a comment
Code Review

This pull request introduces the AEnvironment integration for AReaL, enabling MCP-based multi-turn tool calling for RL training. It includes new modules for configuration, adapter, schema conversion, and workflow management, along with corresponding unit and integration tests. The changes also update the logging utility to include new components. The review focuses on potential runtime errors, error handling, and code clarity.

Comment thread areal/workflow/aenv/workflow.py
Comment on lines +159 to +161
raise RuntimeError(
f"Tool call failed after {max_attempts} attempts: {tool_name}"
) from last_error

high: The RuntimeError raised here names the failing tool but omits the underlying error. Including last_error in the exception message would significantly aid debugging.

Suggested change
raise RuntimeError(
f"Tool call failed after {max_attempts} attempts: {tool_name}"
) from last_error
raise RuntimeError(
f"Tool call failed after {max_attempts} attempts: {tool_name}. Last error: {last_error}"
) from last_error
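For context, the `raise ... from last_error` pattern chains exceptions: the original failure is preserved on `__cause__`, and both messages appear in the traceback. A self-contained illustration (flaky_call and the retry loop here are stand-ins, not the PR's code):

```python
def flaky_call() -> str:
    """Stand-in for a tool call that always fails."""
    raise TimeoutError("upstream timed out")


def call_with_context(tool_name: str, max_attempts: int = 3) -> str:
    last_error: Exception | None = None
    for _ in range(max_attempts):
        try:
            return flaky_call()
        except Exception as exc:
            last_error = exc
    # `from last_error` attaches the original exception as __cause__,
    # so the traceback shows both the retry summary and the root cause.
    raise RuntimeError(
        f"Tool call failed after {max_attempts} attempts: {tool_name}. "
        f"Last error: {last_error}"
    ) from last_error
```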

Comment on lines +161 to +163
if not tool_name:
if self.aenv_config.tool_error_policy == "raise":
raise RuntimeError("Tool call is missing function name")

high: If tool_name is missing and tool_error_policy is set to raise, it's more appropriate to raise a ValueError since the tool name is a required field. This clearly indicates that the tool call is malformed.

                if self.aenv_config.tool_error_policy == "raise":
                    raise ValueError("Tool call is missing function name")

Comment on lines +95 to +96
if self._env is None:
raise RuntimeError("Environment not initialized. Call initialize() first.")

medium: It's crucial to ensure that self._env is properly initialized before proceeding. Raising a more specific exception type, such as AenvNotInitializedError, could provide better clarity for debugging and error handling.

Suggested change
if self._env is None:
raise RuntimeError("Environment not initialized. Call initialize() first.")
class AenvNotInitializedError(Exception):
"""Exception raised when AEnvironment is not initialized."""
pass
async def list_tools(self, use_cache: bool = True) -> list[dict[str, Any]]:
"""List tools exposed by the current environment."""
if self._env is None:
raise AenvNotInitializedError("Environment not initialized. Call initialize() first.")

Comment on lines +89 to +90
if not self.aenv_url:
raise ValueError("aenv_url must be non-empty")

medium: It's beneficial to provide more context in the ValueError message, such as indicating which attribute is expected to be non-empty.

            raise ValueError("aenv_url must be non-empty string")

Comment thread areal/infra/aenv/config.py
Comment on lines +210 to +211
# Argument parsing failed - log detailed context before handling
logger.warning(

medium: Consider adding a check to ensure raw_arguments is not None before attempting to represent it. This prevents a potential TypeError if raw_arguments is unexpectedly None.

                    "raw_arguments": repr(raw_arguments)[:500] if raw_arguments is not None else "None",  # Truncate for safety

@gemini-code-assist bot left a comment

Code Review

This pull request introduces a new AEnvironment integration, enabling AReaL workflows to execute model tool calls through an AenvEnvironmentAdapter. Key changes include defining configuration (AenvConfig), adapting aenvironment SDK interactions (initialization, tool listing, and execution with retry logic), and providing schema helpers for OpenAI tool compatibility. A new AenvWorkflow class orchestrates multi-turn interactions, handling tool calls and reward computation. The logging utility is updated to include new workflow and adapter types. Review comments highlight several areas for improvement, including a high-severity arbitrary code execution vulnerability due to dynamic reward_fn imports, a medium-severity LLM tool abuse vulnerability requiring tool name and argument validation, and suggestions for more robust error handling with specific exception types and improved logging context in the adapter and workflow components. Additionally, a minor comment suggests renaming a logging category for better description.

Comment thread areal/workflow/aenv/workflow.py
Comment on lines +161 to +170
if not tool_name:
if self.aenv_config.tool_error_policy == "raise":
raise RuntimeError("Tool call is missing function name")
messages.append(
{
"role": "tool",
"content": "Error: missing tool function name",
"tool_call_id": tool_call_id,
}
)

high: If tool_name is missing and tool_error_policy is set to raise, a RuntimeError is raised. However, if the policy is not raise, an error message is appended to the messages list. It would be better to append the error message regardless of the tool_error_policy to provide more comprehensive error reporting.

                if self.aenv_config.tool_error_policy == "raise":
                    raise RuntimeError("Tool call is missing function name")
                messages.append(
                    {
                        "role": "tool",
                        "content": "Error: missing tool function name",
                        "tool_call_id": tool_call_id,
                    }
                )
                continue

Comment thread areal/workflow/aenv/workflow.py
Comment on lines +83 to +84
except Exception as exc: # pragma: no cover - defensive cleanup path
logger.warning(f"Failed to release AEnvironment instance: {exc}")

medium: Consider adding a more specific exception type instead of catching all exceptions. This could prevent unexpected behavior if an unanticipated exception occurs.

        try:
            await self._env.release()
        except (OSError, asyncio.TimeoutError) as exc:  # Example specific exceptions
            logger.warning(f"Failed to release AEnvironment instance: {exc}")

Comment on lines +141 to +143
if attempt >= self.config.max_retries or not self._is_retriable_error(
exc
):

medium: It might be beneficial to include the last_error in the log message for better debugging information.

                delay = self.config.retry_delay * (2**attempt)
                logger.warning(
                    "Retriable tool call failure",
                    extra={
                        "tool_name": tool_name,
                        "attempt": attempt + 1,
                        "max_attempts": max_attempts,
                        "delay": delay,
                        "error": str(exc),
                    },
                )
                await asyncio.sleep(delay)
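The retry schedule in the suggestion above doubles the delay on each attempt (retry_delay * 2**attempt). A small helper makes the schedule easy to verify in isolation (the helper name is illustrative, not from the PR):

```python
def backoff_delays(retry_delay: float, max_retries: int) -> list[float]:
    """Exponential backoff: the delay before retry attempt i is retry_delay * 2**i."""
    return [retry_delay * (2 ** attempt) for attempt in range(max_retries)]


# With retry_delay=0.5 and max_retries=3, the waits are 0.5s, 1.0s, 2.0s.
```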

Comment on lines +112 to +115
if tool_round_count >= self.max_turns:
logger.warning(
"Reached max_turns; stop before executing additional tool calls"
)

medium: Consider adding more context to the log message, such as the current tool_round_count and max_turns values, to provide more insight into why the tool calls are being stopped.

Suggested change
if tool_round_count >= self.max_turns:
logger.warning(
"Reached max_turns; stop before executing additional tool calls"
)
if tool_round_count >= self.max_turns:
logger.warning(
f"Reached max_turns ({tool_round_count}/{self.max_turns}); "
"stop before executing additional tool calls"
)
break

Comment on lines +210 to +220
# Argument parsing failed - log detailed context before handling
logger.warning(
"Failed to parse tool arguments",
extra={
"tool_name": tool_name,
"tool_call_id": tool_call_id,
"raw_arguments": repr(raw_arguments)[
:500
], # Truncate for safety
"error": str(exc),
},

medium: Consider including the exception type in the log message for better debugging.

                # Argument parsing failed - log detailed context before handling
                logger.warning(
                    "Failed to parse tool arguments",
                    extra={
                        "tool_name": tool_name,
                        "tool_call_id": tool_call_id,
                        "raw_arguments": repr(raw_arguments)[
                            :500
                        ],  # Truncate for safety
                        "error_type": type(exc).__name__,
                        "error": str(exc),
                    },
                )

Comment on lines +350 to +366
def _render_tool_content(content: Any) -> str:
if isinstance(content, str):
return content
if isinstance(content, list):
chunks: list[str] = []
for item in content:
if isinstance(item, dict):
if item.get("type") == "text" and "text" in item:
chunks.append(str(item["text"]))
else:
chunks.append(json.dumps(item, sort_keys=True))
else:
chunks.append(str(item))
return "\n".join(chunks)
if isinstance(content, dict):
return json.dumps(content, sort_keys=True)
return str(content)

medium: The _render_tool_content method could benefit from more robust error handling. If json.dumps fails, it might be useful to log the error and return a default string to prevent the entire workflow from crashing.

        if isinstance(content, str):
            return content
        try:
            if isinstance(content, list):
                chunks: list[str] = []
                for item in content:
                    if isinstance(item, dict):
                        if item.get("type") == "text" and "text" in item:
                            chunks.append(str(item["text"]))
                        else:
                            chunks.append(json.dumps(item, sort_keys=True))
                    else:
                        chunks.append(str(item))
                return "\n".join(chunks)
            if isinstance(content, dict):
                return json.dumps(content, sort_keys=True)
        except (TypeError, ValueError) as exc:
            logger.warning(f"Failed to serialize tool content: {exc}")
        return str(content)

@gemini-code-assist bot left a comment

Code Review

The pull request introduces AEnvironment integration for AReaL workflows, including new modules for adapter, configuration, and schema handling. Review comments suggest: raising an exception when a tool name is missing in AenvWorkflow and the error policy is set to "raise"; sanitizing the aenv_url in AenvEnvironmentAdapter to avoid logging sensitive information; validating tool_name against the list of discovered tools in _execute_tool_calls to prevent prompt injection; catching specific exceptions in the release() call to avoid masking other issues; and including the exception and arguments in the RuntimeError raised when a tool call fails.

Comment on lines +161 to +163
if not tool_name:
if self.aenv_config.tool_error_policy == "raise":
raise RuntimeError("Tool call is missing function name")

If tool_name is indeed missing, it indicates a serious issue with the tool call structure. Instead of just logging an error, consider raising an exception, especially if self.aenv_config.tool_error_policy is set to 'raise'. This will halt the process and prevent potentially incorrect behavior.

                if self.aenv_config.tool_error_policy == "raise":
                    raise RuntimeError("Tool call is missing function name")
                else:
                    messages.append(
                        {
                            "role": "tool",
                            "content": "Error: missing tool function name",
                            "tool_call_id": tool_call_id,
                        }
                    )

Comment on lines +50 to +53
logger.info(
"Initializing AEnvironment instance",
extra={"env_name": self.config.env_name, "aenv_url": self.config.aenv_url},
)
security-medium

The initialize method logs the aenv_url configuration value. This URL may contain sensitive information such as credentials (e.g., basic authentication or API keys in the query string). Logging this information can lead to unauthorized access if logs are compromised. It is recommended to sanitize the URL before logging or avoid logging the full URL if it contains sensitive data.

Suggested change
logger.info(
"Initializing AEnvironment instance",
extra={"env_name": self.config.env_name, "aenv_url": self.config.aenv_url},
)
logger.info(
"Initializing AEnvironment instance",
extra={"env_name": self.config.env_name},
)

@guozhihao-224 (Contributor, Author): has fix

Comment thread areal/workflow/aenv/workflow.py
Comment on lines +83 to +84
except Exception as exc: # pragma: no cover - defensive cleanup path
logger.warning(f"Failed to release AEnvironment instance: {exc}")

The except Exception as exc block is a very broad catch. It would be better to catch specific exceptions that you anticipate might occur during the release() call, such as network-related exceptions or exceptions raised by the aenv library itself. This will prevent unexpected exceptions from being silently caught and potentially masking other issues.

Suggested change
except Exception as exc: # pragma: no cover - defensive cleanup path
logger.warning(f"Failed to release AEnvironment instance: {exc}")
try:
await self._env.release()
except (OSError, asyncio.TimeoutError) as exc: # Example specific exceptions
logger.warning(f"Failed to release AEnvironment instance: {exc}")

Comment on lines +159 to +161
raise RuntimeError(
f"Tool call failed after {max_attempts} attempts: {tool_name}"
) from last_error

The RuntimeError being raised here includes the tool name, which is good for debugging. However, it might be helpful to also include the specific exception that caused the tool call to fail, as well as the arguments passed to the tool. This would provide more context for diagnosing the issue.

Suggested change
raise RuntimeError(
f"Tool call failed after {max_attempts} attempts: {tool_name}"
) from last_error
raise RuntimeError(
f"Tool call failed after {max_attempts} attempts: {tool_name} with arguments {arguments}. Last error: {last_error}"
) from last_error

@guozhihao-224 (Contributor, Author)

cc @rchardx I'm not entirely sure if the current implementation path meets the requirements.

@guozhihao-224 (Contributor, Author)

After reading the documentation, I found that the current implementation's inheritance from RolloutWorkflow seems incorrect; it should be changed to a proxy approach, using something like AenvAgent? @rchardx @garrett4wade

@rchardx (Collaborator) commented Mar 25, 2026

After reading the documentation, I found that the current implementation using the inheritance of RolloutWorkflow seems incorrect and should be changed to a proxy approach, using something like AenvAgent ?

Your instinct is correct. After reviewing the code and documentation, I believe the current AenvWorkflow should be refactored from the Direct (Legacy) approach to the Proxy (Agent) approach. Here's the full rationale.

The Two Integration Paradigms

AReaL documents two distinct paradigms for agent/workflow integration (see docs/en/reference/agent_workflow.md and docs/en/customization/agent.md):

Aspect: Proxy Approach (Recommended) vs. Direct Approach (Legacy)

  • Pattern: duck-typed class with async def run(data, **extra_kwargs) vs. inherit RolloutWorkflow and implement arun_episode(engine, data)
  • LLM calls: standard AsyncOpenAI(base_url=...) vs. ArealOpenAI(engine=...)
  • Token tracking: automatic (the proxy intercepts all LLM calls) vs. manual (client.export_interactions())
  • Reward handling: just return float or dict[str, float] vs. manual client.set_last_reward() + apply_reward_discount()
  • Scheduler support: local / slurm only (no Ray) vs. all schedulers

The documentation explicitly states:

"Legacy Pattern: The direct approach using ArealOpenAI with RolloutWorkflow is considered legacy and should not be used for new projects. Prefer the proxy approach above, which keeps agent code independent from AReaL internals."

docs/en/reference/agent_workflow.md L110-112

The current AenvWorkflow inherits RolloutWorkflow, uses ArealOpenAI directly, and manually manages the export lifecycle — this is the Legacy/Direct approach.

Precedent: Existing Agent Workflows

All recently added agent integrations in AReaL follow the Proxy approach; none of them inherit RolloutWorkflow. AReaL's _resolve_workflow() (areal/infra/remote_inf_engine.py L528) detects the run() method and auto-wraps such classes in OpenAIProxyWorkflow.
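The duck-typed detection described above can be approximated as follows; this is an illustrative sketch, not AReaL's actual _resolve_workflow() implementation:

```python
import inspect


def looks_like_proxy_agent(obj) -> bool:
    """Approximate duck-typed check: any object exposing an async run()
    method counts as a proxy-style agent (illustrative only)."""
    run = getattr(obj, "run", None)
    return run is not None and inspect.iscoroutinefunction(run)


class MyAgent:
    async def run(self, data: dict, **extra_kwargs) -> float:
        return 0.0
```

An object passing this check would be auto-wrapped; a class with only a synchronous run(), or no run() at all, would not.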

Precedent: AEnvironment's Own tau2_rl Example

The AEnvironment SDK's official RL training example (aenv/examples/tau2_rl/agent.py) also uses the agent pattern:

async def run_agent_return_reward(data: Dict[str, Any]) -> float:
    env = Environment(env_name="tau2-env@1.0.0", ...)
    await env.initialize()

    openai_tools = await env.list_openai_tools()
    agent = OpenAIAgent(tools=openai_tools, instructions=...)

    # ... agent loop ...

    reward = await env.call_reward({})
    return float(reward.get("total_reward", 0.0))

No RolloutWorkflow inheritance, no ArealOpenAI — just a plain function that returns a reward float.

Suggested Refactoring

Rename AenvWorkflow to AenvAgent and switch to the proxy pattern:

class AenvAgent:
    """Agent that uses AEnvironment for MCP-based tool execution."""

    def __init__(
        self,
        aenv_config: AenvConfig | None = None,
        reward_fn: Callable | str | None = None,
        max_turns: int = 8,
        system_prompt: str | None = None,
    ):
        self.aenv_config = aenv_config or AenvConfig()
        self.reward_fn = import_from_string(reward_fn) if isinstance(reward_fn, str) else reward_fn
        self.max_turns = max_turns
        self.system_prompt = system_prompt

    async def run(self, data: dict, **extra_kwargs) -> float:
        # Standard OpenAI SDK — AReaL proxy handles token tracking automatically
        client = AsyncOpenAI(
            base_url=extra_kwargs.get("base_url") or os.getenv("OPENAI_BASE_URL"),
            api_key=extra_kwargs.get("api_key") or os.getenv("OPENAI_API_KEY"),
            http_client=extra_kwargs.get("http_client"),
            max_retries=0,
        )

        async with AenvEnvironmentAdapter(self.aenv_config) as env:
            tools = normalize_openai_tools(await env.list_tools())
            messages = self._build_messages(data)

            for _ in range(self.max_turns):
                response = await client.chat.completions.create(
                    model="default",
                    messages=messages,
                    tools=tools or NOT_GIVEN,
                    tool_choice="auto" if tools else NOT_GIVEN,
                )
                msg = response.choices[0].message
                messages.append(msg.model_dump(exclude_none=True))

                if not msg.tool_calls:
                    break

                for tc in msg.tool_calls:
                    args = parse_tool_arguments(tc.function.arguments)
                    result = await env.call_tool(tc.function.name, args)
                    messages.append({
                        "role": "tool",
                        "content": self._render_tool_content(result.content),
                        "tool_call_id": tc.id,
                    })

        # Just return reward — AReaL handles discount, export, token tracking
        if self.reward_fn:
            return float(await self.reward_fn(messages=messages, answer=data.get("answer")))
        return 0.0

Usage stays almost the same:

trainer.train(
    workflow="areal.workflow.aenv.AenvAgent",
    workflow_kwargs={
        "aenv_config": AenvConfig(env_name="my-env@1.0.0"),
        "reward_fn": "areal.reward.gsm8k.gsm8k_reward_fn",
        "max_turns": 8,
    },
)

What This Eliminates

By switching to the proxy pattern, you can remove ~200 lines of manual lifecycle management that AReaL handles automatically:

  • ArealOpenAI(engine=engine, tokenizer=...) → ✅ AsyncOpenAI(base_url=...)
  • client.set_last_reward(reward) → ✅ return reward
  • client.apply_reward_discount(turn_discount=...) → ✅ Configured via rollout.openai.turn_discount in YAML
  • client.export_interactions(style=...) → ✅ Configured via rollout.openai.export_style in YAML
  • gconfig.to_openai_args_dict(exclude_args=["n_samples"]) → ✅ AReaL injects generation params automatically

What to Keep

The infrastructure modules under areal/infra/aenv/ are well-structured and should remain:

  • AenvConfig — clean dataclass with validation
  • AenvEnvironmentAdapter — retry logic, caching, lifecycle management
  • normalize_openai_tools() — still needed for converting inputSchema to OpenAI dict format (note: AEnv SDK's list_openai_tools() returns FunctionTool objects for the Agents SDK, not the dict format needed by chat.completions.create(tools=...))
  • parse_tool_arguments() — safe JSON parsing

Caveats

  1. Scheduler limitation: The proxy approach does not support the ray scheduler — only local and slurm. If Ray support is required, the direct RolloutWorkflow approach could be kept as a fallback, but this should be a deliberate choice, not the default.

  2. tool_call_parser configuration: In the proxy approach, tool call parsing is configured on the inference engine side (SGLang/vLLM), not in agent code. The YAML config should set this appropriately (e.g., via SGLang's --tool-call-parser flag or equivalent).

  3. rollout.openai config section: The proxy approach uses rollout.openai.turn_discount, rollout.openai.export_style, and rollout.openai.mode from the YAML config instead of hardcoding them in the workflow class. The example config (examples/aenv/config.yaml) should be updated to include this section.

  4. Alternative — OpenAI Agents SDK integration: Since AEnvironment provides list_openai_tools() which returns FunctionTool objects compatible with the OpenAI Agents SDK, you could also use agents.Runner.run() to handle the tool loop automatically (like the tau2_rl example does), instead of implementing the tool loop manually. This would simplify the agent even further.


Hope this helps clarify the architectural direction. The bottom line: the areal/infra/aenv/ modules are solid; the workflow layer just needs to be reshaped from RolloutWorkflow subclass into a proxy-compatible agent class.


github-actions bot commented Apr 9, 2026

This pull request has been automatically marked as stale because it has not had recent activity within the last 14 days.

Please add a comment or push new commits to keep it active.

Thank you for your contribution!

github-actions bot added the stale label on Apr 9, 2026