Skip to content

PCDorg/RewGen4Robotics

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

23 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Reinforcement Learning with LLM-Generated Reward Functions

This project integrates Reinforcement Learning (RL), specifically Soft Actor-Critic (SAC), with Large Language Models (LLMs) like OpenAI's GPT series. The core idea is to leverage LLMs to automatically generate and iteratively refine reward functions for robotic tasks, aiming to improve learning efficiency and final policy performance in simulated environments like FetchReach.

Features

  • RL Algorithm: Utilizes Stable Baselines3.
  • LLM Integration: Connects with OpenAI's GPT and Groq API models to generate and refine reward functions based on task descriptions and potentially environment feedback.
  • Environment Support: Currently configured for the FetchReach-v3 environment from gymnasium-robotics.
  • Configuration: Uses Hydra for flexible and structured configuration management.
  • Logging: Includes TensorBoard logging for visualizing training progress (e.g., rewards, losses).
  • Iterative Refinement: Supports multiple iterations where the reward function is updated by the LLM based on previous results.

Installation

  1. Clone the Repository:

    git clone https://github.com/PCDorg/RewGen4Robotics.git
    cd RewGen4Robotics
  2. Create and Activate Virtual Environment (Recommended):

    python -m venv venv
    source venv/bin/activate  # Linux/macOS
    # venv\Scripts\activate  # Windows
  3. Install Dependencies:

    pip install -r requirements.txt

    (Note: If requirements.txt doesn't exist, you'll need to create one based on the project's imports.)

  4. Set API Keys:

    export OPENAI_API_KEY='your-actual-openai-api-key'
    export GROQ_API_KEY='your-actual-groq-api-key'

    (Consider using environment variables or a .env file for better security.)

Configuration

Configuration is managed via Hydra. Key configuration files are typically located in a cfg/ directory:

  • cfg/config.yaml: Main configuration file linking others.
  • cfg/training/: Training parameters (algorithm hyperparameters, total timesteps).
  • cfg/env/: Environment settings (environment ID).
  • cfg/llm/: LLM parameters (model names for OpenAI/Groq, prompts).
  • cfg/experiment/: Experiment settings (number of iterations, logging paths).
  • cfg/paths/: Defines paths for outputs, models, logs.

You can override parameters via the command line:

python gen.py training.total_timesteps=50000 llm.model="llama3-70b-8192" llm.provider="groq"

Usage

Iterative Training with LLM (gen.py)

The main script (gen.py) orchestrates the iterative process:

  1. Run with Default Configuration:

    # Ensure correct environment is set via defaults or command line
    python gen.py env=go1 # Or env=fetchReach, etc.
  2. Run with Overrides:

    python gen.py experiment.iterations=5 training.learning_rate=1e-4 llm.provider=openai llm.model=gpt-4

The script typically performs the following loop for the specified number of iterations:

  • Generates/Refines a reward function using the LLM (OpenAI or Groq).
  • Updates the environment wrapper or reward logic.
  • Trains the SAC agent using the current reward function.
  • Saves the trained model, logs, and generated reward function.
  • Potentially uses results from the training run to inform the next LLM generation step.

Directory Structure (Example)

.
├── cfg/                    # Hydra configuration files
│   ├── config.yaml         # Main config
│   ├── training/
│   ├── env/
│   ├── llm/
│   ├── experiment/
│   └── paths/
├── envs/                   # Custom environment wrappers (if any)
├── models/                 # Saved RL models per iteration
├── results/                # Logs, generated rewards, plots
│   └── reach/              # Example output for 'reach' task
│       ├── iteration_1/
│       ├── iteration_2/
│       └── ...
├── utils/                  # Utility functions (e.g., LLM interaction)
├── gen.py                  # Main script
├── requirements.txt        # Project dependencies
└── README.md               # This file

Output

During execution, the project generates:

  • Saved Models: RL agent models saved periodically or at the end of each iteration (in models/ or results/.../models/).
  • TensorBoard Logs: Training metrics logged for TensorBoard (in results/.../tensorboard_logs/).
  • Generated Rewards: Text files or logs containing the prompts and the reward functions generated by the LLM for each iteration (in results/.../rewards/).
  • Console Logs: Output detailing the progress of each iteration, training steps, and LLM interactions.

Monitoring Training

Launch TensorBoard to view learning curves and other metrics:

# Adjust the logdir path based on your output structure
tensorboard --logdir results/reach/tensorboard_logs

Navigate to http://localhost:6006 in your browser.

Acknowledgments

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages