feat(aenv): Integrating aenvironment for environment handling by guozhihao-224 · Pull Request #1018 · inclusionAI/AReaL

guozhihao-224 · 2026-03-11T12:25:54Z

Description

Related Issue

This feature is part of the 2026 Q1 Roadmap (see #907 - "Integrating aenvironment for environment handling").

Summary

Integrate AEnvironment to support MCP-based multi-turn tool calling for RL training.

New Modules

areal/infra/aenv/
├── config.py      # Configuration: aenv URL, timeout, retry, error policy
├── adapter.py     # Wrapper for aenv SDK (initialize, list_tools, call_tool, release)
├── schema.py      # MCP → OpenAI tool format conversion
└── __init__.py

areal/workflow/aenv/
├── workflow.py    # Multi-turn workflow with tool execution
└── __init__.py

examples/aenv/
└── train_aenv.py  # Training entrypoint

tests/
├── test_aenv_adapter.py
├── test_aenv_config.py
├── test_aenv_workflow.py
├── test_aenv_schema.py
└── integration/test_aenv_integration.py

Key Components

Component	Description
`AenvConfig`	Configuration with validation: URL, timeout, retry policy, tool error handling
`AenvEnvironmentAdapter`	Wraps aenv SDK with caching, retry logic, resource management
`AenvWorkflow`	Multi-turn RolloutWorkflow that executes model-generated tool calls
`normalize_openai_tools()`	Converts MCP tool schema to OpenAI function-calling format
`parse_tool_arguments()`	Safe JSON argument parsing

Usage

from areal import PPOTrainer
from areal.infra.aenv import AenvConfig
from areal.workflow.aenv import AenvWorkflow

with PPOTrainer(config, ...) as trainer:
    trainer.train(
        workflow="areal.workflow.aenv.AenvWorkflow",
        workflow_kwargs={
            "aenv_config": AenvConfig(env_name="my-env@1.0.0"),
            ...
        },
    )

Tests

Unit tests: adapter, config, workflow, schema conversion
Integration tests: real aenv server (requires RUN_AENV_INTEGRATION=1)

Compatibility

aenv SDK is optional dependency (lazy import)
Python 3.11+
No impact on existing AReaL workflows

Type of Change

Bug fix (non-breaking change that fixes an issue)
New feature (non-breaking change that adds functionality)
Breaking change (fix or feature that would cause existing functionality to not
work as expected)
Documentation update
Code refactoring (no functional changes)
Performance improvement
Test coverage improvement

Checklist

I have read the Contributing Guide
I have run formatting tools (pre-commit or manual)
I have run relevant unit tests and they pass
I have added tests for new functionality
I have updated documentation if needed
My branch is up to date with main
This PR introduces breaking changes (if yes, fill out details below)
If this PR changes documentation, I have built and previewed it locally with
jb build docs
No critical issues raised by AI reviewers (/gemini review)

Breaking Change Details (if applicable):

Additional Context

Need help? Check the Contributing Guide or ask in
GitHub Discussions!

…low support

guozhihao-224 · 2026-03-11T12:27:39Z

/gemini review

gemini-code-assist · 2026-03-11T13:02:05Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant new capability by integrating AEnvironment into the AReaL framework. This integration allows for sophisticated multi-turn tool calling within RL training, empowering models to dynamically interact with external environments and tools. The changes provide a structured and configurable way to manage these interactions, ensuring robustness and flexibility for advanced RL applications.

Highlights

AEnvironment Integration: Integrated AEnvironment to support Multi-turn Tool Calling for Reinforcement Learning (RL) training, enabling models to interact with external tools.
New Modules and Components: Introduced new modules under areal/infra/aenv for configuration, SDK adaptation, and schema conversion, and areal/workflow/aenv for the multi-turn workflow.
Robust Tool Execution: Implemented AenvEnvironmentAdapter with caching, retry logic for transient failures, and resource management, along with AenvConfig for detailed environment setup.
OpenAI Tool Schema Conversion: Added normalize_openai_tools() to convert MCP tool schema into an OpenAI function-calling format, facilitating broader compatibility.
Comprehensive Testing: Included extensive unit tests for adapter, configuration, schema, and workflow components, as well as integration tests for a live AEnvironment service.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Changelog

areal/infra/aenv/init.py
- Added initialization file for the aenv infrastructure module, exposing key classes and functions.
areal/infra/aenv/adapter.py
- Added AenvEnvironmentAdapter class to bridge AReaL workflows with the aenvironment SDK, including initialization, release, tool listing, and tool calling with retry logic.
areal/infra/aenv/config.py
- Added AenvConfig dataclass for managing AEnvironment integration settings, including URL, timeouts, retry policies, and error handling.
areal/infra/aenv/schema.py
- Added normalize_openai_tools function to convert AEnvironment tool metadata to OpenAI function-tool schema and parse_tool_arguments for safe argument parsing.
areal/utils/logging.py
- Updated LOG_COLORS dictionary to include color mappings for AenvWorkflow and AenvAdapter loggers.
areal/workflow/aenv/init.py
- Added initialization file for the aenv workflow module, exposing AenvWorkflow.
areal/workflow/aenv/workflow.py
- Added AenvWorkflow class, a RolloutWorkflow that executes model tool calls through AEnvironment, handling multi-turn interactions and reward computation.
examples/aenv/config.yaml
- Added a new example configuration file for AEnvironment-integrated GSM8K training, defining aenv parameters and workflow settings.
examples/aenv/train_aenv.py
- Added a new training entrypoint script for launching AEnvironment-integrated workflows with PPOTrainer.
tests/integration/test_aenv_integration.py
- Added integration tests for the AenvEnvironmentAdapter, including server lifecycle management and real tool calls.
tests/test_aenv_adapter.py
- Added unit tests for AenvEnvironmentAdapter, verifying initialization, tool caching, retry logic, and cleanup behavior.
tests/test_aenv_config.py
- Added unit tests for AenvConfig, validating default values and error handling for invalid configurations.
tests/test_aenv_schema.py
- Added unit tests for AEnvironment schema helpers, covering tool normalization and argument parsing.
tests/test_aenv_workflow.py
- Added unit tests for AenvWorkflow, covering tool execution, error handling policies, reward computation, and OpenAI request parameters.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request introduces the AEnvironment integration for AReaL, enabling MCP-based multi-turn tool calling for RL training. It includes new modules for configuration, adapter, schema conversion, and workflow management, along with corresponding unit and integration tests. The changes also update the logging utility to include new components. The review focuses on potential runtime errors, error handling, and code clarity.

gemini-code-assist · 2026-03-11T13:07:45Z

+        raise RuntimeError(
+            f"Tool call failed after {max_attempts} attempts: {tool_name}"
+        ) from last_error


high: The RuntimeError raised here doesn't provide specific context about the tool name that failed. Including the tool name in the exception message would significantly aid debugging.

Also, consider logging the last_error for more detailed diagnostics.

Suggested change

raise RuntimeError(

f"Tool call failed after {max_attempts} attempts: {tool_name}"

) from last_error

raise RuntimeError(

f"Tool call failed after {max_attempts} attempts: {tool_name}. Last error: {last_error}"

) from last_error

gemini-code-assist · 2026-03-11T13:07:45Z

+            if not tool_name:
+                if self.aenv_config.tool_error_policy == "raise":
+                    raise RuntimeError("Tool call is missing function name")


high: If tool_name is missing and tool_error_policy is set to raise, it's more appropriate to raise a ValueError since the tool name is a required field. This clearly indicates that the tool call is malformed.

if self.aenv_config.tool_error_policy == "raise": raise ValueError("Tool call is missing function name")

gemini-code-assist · 2026-03-11T13:07:46Z

+        if self._env is None:
+            raise RuntimeError("Environment not initialized. Call initialize() first.")


medium: It's crucial to ensure that self._env is properly initialized before proceeding. Raising a more specific exception type, such as AenvNotInitializedError, could provide better clarity for debugging and error handling.

Suggested change

if self._env is None:

raise RuntimeError("Environment not initialized. Call initialize() first.")

class AenvNotInitializedError(Exception):

"""Exception raised when AEnvironment is not initialized."""

pass

async def list_tools(self, use_cache: bool = True) -> list[dict[str, Any]]:

"""List tools exposed by the current environment."""

if self._env is None:

raise AenvNotInitializedError("Environment not initialized. Call initialize() first.")

gemini-code-assist · 2026-03-11T13:07:46Z

+        if not self.aenv_url:
+            raise ValueError("aenv_url must be non-empty")


medium: It's beneficial to provide more context in the ValueError message, such as indicating which attribute is expected to be non-empty.

raise ValueError("aenv_url must be non-empty string")

gemini-code-assist · 2026-03-11T13:07:46Z

+                # Argument parsing failed - log detailed context before handling
+                logger.warning(


medium: Consider adding a check to ensure raw_arguments is not None before attempting to represent it. This prevents a potential TypeError if raw_arguments is unexpectedly None.

"raw_arguments": repr(raw_arguments)[:500] if raw_arguments is not None else "None", # Truncate for safety

gemini-code-assist

Code Review

This pull request introduces a new AEnvironment integration, enabling AReaL workflows to execute model tool calls through an AenvEnvironmentAdapter. Key changes include defining configuration (AenvConfig), adapting aenvironment SDK interactions (initialization, tool listing, and execution with retry logic), and providing schema helpers for OpenAI tool compatibility. A new AenvWorkflow class orchestrates multi-turn interactions, handling tool calls and reward computation. The logging utility is updated to include new workflow and adapter types. Review comments highlight several areas for improvement, including a high-severity arbitrary code execution vulnerability due to dynamic reward_fn imports, a medium-severity LLM tool abuse vulnerability requiring tool name and argument validation, and suggestions for more robust error handling with specific exception types and improved logging context in the adapter and workflow components. Additionally, a minor comment suggests renaming a logging category for better description.

gemini-code-assist · 2026-03-11T13:12:20Z

+            if not tool_name:
+                if self.aenv_config.tool_error_policy == "raise":
+                    raise RuntimeError("Tool call is missing function name")
+                messages.append(
+                    {
+                        "role": "tool",
+                        "content": "Error: missing tool function name",
+                        "tool_call_id": tool_call_id,
+                    }
+                )


high: If tool_name is missing and tool_error_policy is set to raise, a RuntimeError is raised. However, if the policy is not raise, an error message is appended to the messages list. It would be better to append the error message regardless of the tool_error_policy to provide more comprehensive error reporting.

if self.aenv_config.tool_error_policy == "raise": raise RuntimeError("Tool call is missing function name") messages.append( { "role": "tool", "content": "Error: missing tool function name", "tool_call_id": tool_call_id, } ) continue

gemini-code-assist · 2026-03-11T13:12:20Z

+        except Exception as exc:  # pragma: no cover - defensive cleanup path
+            logger.warning(f"Failed to release AEnvironment instance: {exc}")


medium: Consider adding a more specific exception type instead of catching all exceptions. This could prevent unexpected behavior if an unanticipated exception occurs.

try: await self._env.release() except Exception as exc: # pragma: no cover - defensive cleanup path logger.warning(f

gemini-code-assist · 2026-03-11T13:12:20Z

+                if attempt >= self.config.max_retries or not self._is_retriable_error(
+                    exc
+                ):


medium: It might be beneficial to include the last_error in the log message for better debugging information.

delay = self.config.retry_delay * (2**attempt) logger.warning( "Retriable tool call failure", extra={ "tool_name": tool_name, "attempt": attempt + 1, "max_attempts": max_attempts, "delay": delay, "error": str(exc), }, ) await asyncio.sleep(delay)

gemini-code-assist · 2026-03-11T13:12:20Z

+                if tool_round_count >= self.max_turns:
+                    logger.warning(
+                        "Reached max_turns; stop before executing additional tool calls"
+                    )


medium: Consider adding more context to the log message, such as the current tool_round_count and max_turns values, to provide more insight into why the tool calls are being stopped.

Suggested change

if tool_round_count >= self.max_turns:

logger.warning(

"Reached max_turns; stop before executing additional tool calls"

)

if tool_round_count >= self.max_turns:

logger.warning(

"Reached max_turns; stop before executing additional tool calls"

)

break

gemini-code-assist · 2026-03-11T13:12:20Z

+                # Argument parsing failed - log detailed context before handling
+                logger.warning(
+                    "Failed to parse tool arguments",
+                    extra={
+                        "tool_name": tool_name,
+                        "tool_call_id": tool_call_id,
+                        "raw_arguments": repr(raw_arguments)[
+                            :500
+                        ],  # Truncate for safety
+                        "error": str(exc),
+                    },


medium: Consider including the exception type in the log message for better debugging.

# Argument parsing failed - log detailed context before handling logger.warning( "Failed to parse tool arguments", extra={ "tool_name": tool_name, "tool_call_id": tool_call_id, "raw_arguments": repr(raw_arguments)[ :500 ], # Truncate for safety "error": str(exc), }, )

gemini-code-assist · 2026-03-11T13:12:20Z

+    def _render_tool_content(content: Any) -> str:
+        if isinstance(content, str):
+            return content
+        if isinstance(content, list):
+            chunks: list[str] = []
+            for item in content:
+                if isinstance(item, dict):
+                    if item.get("type") == "text" and "text" in item:
+                        chunks.append(str(item["text"]))
+                    else:
+                        chunks.append(json.dumps(item, sort_keys=True))
+                else:
+                    chunks.append(str(item))
+            return "\n".join(chunks)
+        if isinstance(content, dict):
+            return json.dumps(content, sort_keys=True)
+        return str(content)


medium: The _render_tool_content method could benefit from more robust error handling. If json.dumps fails, it might be useful to log the error and return a default string to prevent the entire workflow from crashing.

if isinstance(content, str): return content if isinstance(content, list): chunks: list[str] = [] for item in content: if isinstance(item, dict): if item.get("type") == "text" and "text" in item: chunks.append(str(item["text"])) else: chunks.append(json.dumps(item, sort_keys=True)) else: chunks.append(str(item)) return "\n".join(chunks) if isinstance(content, dict): return json.dumps(content, sort_keys=True) return str(content)

gemini-code-assist

Code Review

The pull request introduces AEnvironment integration for AReaL workflows, including new modules for adapter, configuration, and schema handling. Review comments suggest raising an exception when a tool name is missing in AenvWorkflow if the error policy is set to "raise", sanitizing the aenv_url in the AenvEnvironmentAdapter to prevent logging sensitive information, validating the tool_name against the list of discovered tools in _execute_tool_calls to prevent prompt injection, catching specific exceptions in the release() call to avoid masking other issues, and including the exception and arguments in the RuntimeError raised when a tool call fails.

gemini-code-assist · 2026-03-11T13:31:49Z

+            if not tool_name:
+                if self.aenv_config.tool_error_policy == "raise":
+                    raise RuntimeError("Tool call is missing function name")


If tool_name is indeed missing, it indicates a serious issue with the tool call structure. Instead of just logging an error, consider raising an exception, especially if self.aenv_config.tool_error_policy is set to 'raise'. This will halt the process and prevent potentially incorrect behavior.

if self.aenv_config.tool_error_policy == "raise": raise RuntimeError("Tool call is missing function name") else: messages.append( { "role": "tool", "content": "Error: missing tool function name", "tool_call_id": tool_call_id, } )

gemini-code-assist · 2026-03-11T13:31:49Z

+        logger.info(
+            "Initializing AEnvironment instance",
+            extra={"env_name": self.config.env_name, "aenv_url": self.config.aenv_url},
+        )


The initialize method logs the aenv_url configuration value. This URL may contain sensitive information such as credentials (e.g., basic authentication or API keys in the query string). Logging this information can lead to unauthorized access if logs are compromised. It is recommended to sanitize the URL before logging or avoid logging the full URL if it contains sensitive data.

Suggested change

logger.info(

"Initializing AEnvironment instance",

extra={"env_name": self.config.env_name, "aenv_url": self.config.aenv_url},

)

logger.info(

"Initializing AEnvironment instance",

extra={"env_name": self.config.env_name},

)

gemini-code-assist · 2026-03-11T13:31:49Z

+        except Exception as exc:  # pragma: no cover - defensive cleanup path
+            logger.warning(f"Failed to release AEnvironment instance: {exc}")


The except Exception as exc block is a very broad catch. It would be better to catch specific exceptions that you anticipate might occur during the release() call, such as network-related exceptions or exceptions raised by the aenv library itself. This will prevent unexpected exceptions from being silently caught and potentially masking other issues.

Suggested change

except Exception as exc: # pragma: no cover - defensive cleanup path

logger.warning(f"Failed to release AEnvironment instance: {exc}")

try:

await self._env.release()

except (OSError, asyncio.TimeoutError) as exc: # Example specific exceptions

logger.warning(f"Failed to release AEnvironment instance: {exc}")

gemini-code-assist · 2026-03-11T13:31:49Z

+        raise RuntimeError(
+            f"Tool call failed after {max_attempts} attempts: {tool_name}"
+        ) from last_error


The RuntimeError being raised here includes the tool name, which is good for debugging. However, it might be helpful to also include the specific exception that caused the tool call to fail, as well as the arguments passed to the tool. This would provide more context for diagnosing the issue.

Suggested change

raise RuntimeError(

f"Tool call failed after {max_attempts} attempts: {tool_name}"

) from last_error

raise RuntimeError(

f"Tool call failed after {max_attempts} attempts: {tool_name} with arguments {arguments}. Last error: {last_error}"

) from last_error

guozhihao-224 · 2026-03-12T03:08:45Z

cc @rchardx I'm not entirely sure if the current implementation path meets the requirements.

guozhihao-224 · 2026-03-12T11:00:15Z

After reading the documentation, I found that the current implementation using the inheritance of RolloutWorkflow seems incorrect and should be changed to a proxy approach, using something like AenvAgent ？ @rchardx @garrett4wade

rchardx · 2026-03-25T03:29:30Z

After reading the documentation, I found that the current implementation using the inheritance of RolloutWorkflow seems incorrect and should be changed to a proxy approach, using something like AenvAgent ？

Your instinct is correct. After reviewing the code and documentation, I believe the current AenvWorkflow should be refactored from the Direct (Legacy) approach to the Proxy (Agent) approach. Here's the full rationale.

The Two Integration Paradigms

AReaL documents two distinct paradigms for agent/workflow integration (see docs/en/reference/agent_workflow.md and docs/en/customization/agent.md):

Aspect	Proxy Approach (Recommended)	Direct Approach (Legacy)
Pattern	Duck-typed class with `async def run(data, **extra_kwargs)`	Inherit `RolloutWorkflow`, implement `arun_episode(engine, data)`
LLM calls	Standard `AsyncOpenAI(base_url=...)`	`ArealOpenAI(engine=...)`
Token tracking	Automatic (proxy intercepts all LLM calls)	Manual (`client.export_interactions()`)
Reward handling	Just `return float` or `return dict[str, float]`	Manual `client.set_last_reward()` + `apply_reward_discount()`
Scheduler support	`local` / `slurm` only (no Ray)	All schedulers

The documentation explicitly states:

"Legacy Pattern: The direct approach using ArealOpenAI with RolloutWorkflow is considered legacy and should not be used for new projects. Prefer the proxy approach above, which keeps agent code independent from AReaL internals."

— docs/en/reference/agent_workflow.md L110-112

The current AenvWorkflow inherits RolloutWorkflow, uses ArealOpenAI directly, and manually manages the export lifecycle — this is the Legacy/Direct approach.

Precedent: Existing Agent Workflows

All recently added agent integrations in AReaL follow the Proxy approach:

MathToolAgent (areal/workflow/openai/math_agent.py) — Multi-turn tool calling with OpenAI Agents SDK, returns reward directly
MultiTurnMathAgent (same file) — Multi-turn with per-completion rewards via dict[str, float]
Claude Agent (areal/workflow/anthropic/claude_math_agent.py)
LangChain Agent (areal/workflow/langchain/math_agent.py)

None of them inherit RolloutWorkflow. AReaL's _resolve_workflow() (areal/infra/remote_inf_engine.py L528) detects the run() method and auto-wraps them in OpenAIProxyWorkflow.

Precedent: AEnvironment's Own tau2_rl Example

The AEnvironment SDK's official RL training example (aenv/examples/tau2_rl/agent.py) also uses the agent pattern:

async def run_agent_return_reward(data: Dict[str, Any]) -> float:
    env = Environment(env_name="tau2-env@1.0.0", ...)
    await env.initialize()

    openai_tools = await env.list_openai_tools()
    agent = OpenAIAgent(tools=openai_tools, instructions=...)

    # ... agent loop ...

    reward = await env.call_reward({})
    return float(reward.get("total_reward", 0.0))

No RolloutWorkflow inheritance, no ArealOpenAI — just a plain function that returns a reward float.

Suggested Refactoring

Rename AenvWorkflow → AenvAgent and switch to the proxy pattern:

class AenvAgent:
    """Agent that uses AEnvironment for MCP-based tool execution."""

    def __init__(
        self,
        aenv_config: AenvConfig | None = None,
        reward_fn: Callable | str | None = None,
        max_turns: int = 8,
        system_prompt: str | None = None,
    ):
        self.aenv_config = aenv_config or AenvConfig()
        self.reward_fn = import_from_string(reward_fn) if isinstance(reward_fn, str) else reward_fn
        self.max_turns = max_turns
        self.system_prompt = system_prompt

    async def run(self, data: dict, **extra_kwargs) -> float:
        # Standard OpenAI SDK — AReaL proxy handles token tracking automatically
        client = AsyncOpenAI(
            base_url=extra_kwargs.get("base_url") or os.getenv("OPENAI_BASE_URL"),
            api_key=extra_kwargs.get("api_key") or os.getenv("OPENAI_API_KEY"),
            http_client=extra_kwargs.get("http_client"),
            max_retries=0,
        )

        async with AenvEnvironmentAdapter(self.aenv_config) as env:
            tools = normalize_openai_tools(await env.list_tools())
            messages = self._build_messages(data)

            for _ in range(self.max_turns):
                response = await client.chat.completions.create(
                    model="default",
                    messages=messages,
                    tools=tools or NOT_GIVEN,
                    tool_choice="auto" if tools else NOT_GIVEN,
                )
                msg = response.choices[0].message
                messages.append(msg.model_dump(exclude_none=True))

                if not msg.tool_calls:
                    break

                for tc in msg.tool_calls:
                    args = parse_tool_arguments(tc.function.arguments)
                    result = await env.call_tool(tc.function.name, args)
                    messages.append({
                        "role": "tool",
                        "content": self._render_tool_content(result.content),
                        "tool_call_id": tc.id,
                    })

        # Just return reward — AReaL handles discount, export, token tracking
        if self.reward_fn:
            return float(await self.reward_fn(messages=messages, answer=data.get("answer")))
        return 0.0

Usage stays almost the same:

trainer.train(
    workflow="areal.workflow.aenv.AenvAgent",
    workflow_kwargs={
        "aenv_config": AenvConfig(env_name="my-env@1.0.0"),
        "reward_fn": "areal.reward.gsm8k.gsm8k_reward_fn",
        "max_turns": 8,
    },
)

What This Eliminates

By switching to the proxy pattern, you can remove ~200 lines of manual lifecycle management that AReaL handles automatically:

❌ ArealOpenAI(engine=engine, tokenizer=...) → ✅ AsyncOpenAI(base_url=...)
❌ client.set_last_reward(reward) → ✅ return reward
❌ client.apply_reward_discount(turn_discount=...) → ✅ Configured via rollout.openai.turn_discount in YAML
❌ client.export_interactions(style=...) → ✅ Configured via rollout.openai.export_style in YAML
❌ gconfig.to_openai_args_dict(exclude_args=["n_samples"]) → ✅ AReaL injects generation params automatically

What to Keep

The infrastructure modules under areal/infra/aenv/ are well-structured and should remain:

✅ AenvConfig — clean dataclass with validation
✅ AenvEnvironmentAdapter — retry logic, caching, lifecycle management
✅ normalize_openai_tools() — still needed for converting inputSchema to OpenAI dict format (note: AEnv SDK's list_openai_tools() returns FunctionTool objects for the Agents SDK, not the dict format needed by chat.completions.create(tools=...))
✅ parse_tool_arguments() — safe JSON parsing

Caveats

Scheduler limitation: The proxy approach does not support the ray scheduler — only local and slurm. If Ray support is required, the direct RolloutWorkflow approach could be kept as a fallback, but this should be a deliberate choice, not the default.
tool_call_parser configuration: In the proxy approach, tool call parsing is configured on the inference engine side (SGLang/vLLM), not in agent code. The YAML config should set this appropriately (e.g., via SGLang's --tool-call-parser flag or equivalent).
rollout.openai config section: The proxy approach uses rollout.openai.turn_discount, rollout.openai.export_style, and rollout.openai.mode from the YAML config instead of hardcoding them in the workflow class. The example config (examples/aenv/config.yaml) should be updated to include this section.
Alternative — OpenAI Agents SDK integration: Since AEnvironment provides list_openai_tools() which returns FunctionTool objects compatible with the OpenAI Agents SDK, you could also use agents.Runner.run() to handle the tool loop automatically (like the tau2_rl example does), instead of implementing the tool loop manually. This would simplify the agent even further.

Hope this helps clarify the architectural direction. The bottom line: the areal/infra/aenv/ modules are solid; the workflow layer just needs to be reshaped from RolloutWorkflow subclass into a proxy-compatible agent class.

github-actions · 2026-04-09T01:58:51Z

This pull request has been automatically marked as stale because it has not had recent activity within the last 14 days.

Please add a comment or push new commits to keep it active.

Thank you for your contribution!

feat(aenv): introduce AEnvironment integration with adapter and workf…

dde314a

…low support

gemini-code-assist bot reviewed Mar 11, 2026

View reviewed changes

Merge branch 'main' into feat/aenv

97cb774

github-actions bot added the stale label Apr 9, 2026

		if self._env is None:
		raise RuntimeError("Environment not initialized. Call initialize() first.")

-        if self._env is None:
-            raise RuntimeError("Environment not initialized. Call initialize() first.")
+class AenvNotInitializedError(Exception):
+    """Exception raised when AEnvironment is not initialized."""
+    pass
+    async def list_tools(self, use_cache: bool = True) -> list[dict[str, Any]]:
+        """List tools exposed by the current environment."""
+        if self._env is None:
+            raise AenvNotInitializedError("Environment not initialized. Call initialize() first.")

		if not self.aenv_url:
		raise ValueError("aenv_url must be non-empty")

		# Argument parsing failed - log detailed context before handling
		logger.warning(

		except Exception as exc: # pragma: no cover - defensive cleanup path
		logger.warning(f"Failed to release AEnvironment instance: {exc}")

Conversation

guozhihao-224 commented Mar 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Related Issue

Summary

New Modules

Key Components

Usage

Tests

Compatibility

Type of Change

Checklist

Additional Context

Uh oh!

guozhihao-224 commented Mar 11, 2026

Uh oh!

gemini-code-assist bot commented Mar 11, 2026

Summary of Changes

Highlights

Footnotes

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

gemini-code-assist bot Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

gemini-code-assist bot Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

gemini-code-assist bot Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

gemini-code-assist bot Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

guozhihao-224 Mar 12, 2026

guozhihao-224 commented Mar 11, 2026 •

edited

Loading