multiple input modes #2

Merged
safabouguezzi merged 3 commits into main from feature/multiple-input-modes on Apr 13, 2026

@safabouguezzi
Collaborator

@safabouguezzi safabouguezzi commented Apr 13, 2026

Summary by CodeRabbit

  • New Features

    • Automatic detection of input data layout supporting multiple file organization formats.
    • Support for location/catchment-based identification in combined data files.
  • Documentation

    • Completely updated README with tool overview, input data specifications, and output format details.

@coderabbitai

coderabbitai bot commented Apr 13, 2026

Caution

Review failed

Pull request was closed or merged during review

Walkthrough

The pull request introduces automated input-mode detection at container startup, replacing manual configuration. It adds detect_input.py to auto-generate /in/input.json by inspecting file structure (supporting three input layouts), updates data loading to handle location-based grouping, documents the new functionality in the README, and modifies tool specifications and example inputs accordingly.
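The layout detection described above could look roughly like the following. This is a minimal sketch, not the actual contents of detect_input.py: the function name `detect_mode`, the `*.csv` glob, and the "two or more files" rule are assumptions made for illustration. The mode numbers follow the sequence diagram in this review (0 = single combined file, 1 = obs/ and sim/ subdirectories, 2 = two combined files).

```python
from pathlib import Path


def detect_mode(in_dir):
    """Guess the input layout under in_dir (hypothetical heuristic).

    Returns 1 for obs/ and sim/ subdirectories, 2 for two (or more)
    top-level CSV files, and 0 for a single combined CSV file.
    """
    root = Path(in_dir)
    # Mode 1: per-location files split into obs/ and sim/ directories.
    if (root / "obs").is_dir() and (root / "sim").is_dir():
        return 1
    csv_files = sorted(root.glob("*.csv"))
    # Mode 2: two combined files (observations and simulations).
    if len(csv_files) >= 2:
        return 2
    # Mode 0: one combined file holding both series.
    if len(csv_files) == 1:
        return 0
    raise FileNotFoundError(f"no recognisable input layout in {root}")
```

Checking for the subdirectories first keeps the per-location layout from being mistaken for a combined-file layout when stray CSV files also sit at the top level.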

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Container Startup | `Dockerfile`, `src/detect_input.py` | Restructured container CMD to run `detect_input.py` before `run.py`, ensuring input auto-detection occurs at startup. New script inspects the `/in` directory structure and auto-generates `/in/input.json` with column names, supporting three input modes (single combined file, per-location obs/sim directories, or two combined files with location column). |
| Input Configuration | `in/input.json`, `in/discharge_5694.csv.metadata.json` | Updated example input to reference `CAMELS_simulations.csv` and `CAMELS_observations.csv`, added `location_column: catchment_id` parameter, and removed example metadata file. |
| Data Loading & Processing | `src/evaluation.py`, `src/run.py` | Extended `load_data()` signature to accept optional `location_column` and restructured logic to support three auto-detected input modes: per-location files, combined files with location matching, and single combined file. Updated run script to extract and pass `location_column` from configuration. |
| Tool Definition & Documentation | `src/tool.yml`, `README.md` | Added optional `location_column` parameter to tool schema. Replaced README with product-oriented documentation describing auto-detection, three input modes with matching rules, column-name defaults, metric outputs (NSE, KGE, R², MSE, RMSE), and references to CAMELS-DE and Hy2DL model. |
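
For readers unfamiliar with the metric outputs named in the table, here is a minimal sketch of four of them using their standard textbook definitions. It is not taken from src/evaluation.py; the function names are assumptions, and the KGE shown is the original 2009 formulation with population standard deviations. R² is left out because its definition varies between tools.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def mse(obs, sim):
    """Mean squared error between observed and simulated values."""
    return sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs)


def rmse(obs, sim):
    """Root mean squared error."""
    return math.sqrt(mse(obs, sim))


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 matches the observed mean."""
    mean_obs = _mean(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / variance


def kge(obs, sim):
    """Kling-Gupta efficiency (2009 form); undefined for constant series."""
    mean_obs, mean_sim = _mean(obs), _mean(sim)
    n = len(obs)
    std_obs = math.sqrt(sum((o - mean_obs) ** 2 for o in obs) / n)
    std_sim = math.sqrt(sum((s - mean_sim) ** 2 for s in sim) / n)
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim)) / n
    r = cov / (std_obs * std_sim)   # linear correlation
    alpha = std_sim / std_obs       # variability ratio
    beta = mean_sim / mean_obs      # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
```

Note that `nse` divides by the observed variance and `kge` by the standard deviations and the observed mean, so a real implementation would need to guard against constant or zero-mean series.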

Sequence Diagram(s)

sequenceDiagram
    participant Container
    participant detect_input.py
    participant FileSystem as File System
    participant run.py
    participant load_data as evaluation.load_data()
    
    Container->>detect_input.py: Execute (startup)
    detect_input.py->>FileSystem: Inspect /in for structure
    FileSystem-->>detect_input.py: File list & structure
    
    alt Mode 1: obs/ & sim/ subdirs
        detect_input.py->>FileSystem: Read obs & sim files
        FileSystem-->>detect_input.py: Column names
    else Mode 2: Two combined files
        detect_input.py->>FileSystem: Read first two files
        FileSystem-->>detect_input.py: Column names & content
    else Mode 0: Single combined file
        detect_input.py->>FileSystem: Read single file
        FileSystem-->>detect_input.py: Column names
    end
    
    detect_input.py->>FileSystem: Write /in/input.json
    detect_input.py->>Container: Print JSON
    
    Container->>run.py: Execute (after detect_input.py)
    run.py->>FileSystem: Read input.json config
    FileSystem-->>run.py: Parameters & paths
    
    run.py->>load_data: Call with detected config
    load_data->>FileSystem: Load observation/simulation data
    FileSystem-->>load_data: DataFrames
    
    alt Mode 1/2: Per-location or matched files
        load_data->>load_data: Group & merge by location
    else Mode 0: Single combined file
        load_data->>load_data: Extract obs/sim columns
    end
    
    load_data-->>run.py: Dict[location, DataFrame]
    run.py->>run.py: Compute metrics & generate report
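
The "Group & merge by location" step in the diagram can be illustrated with a small standard-library sketch. The PR's `load_data()` returns a dict of DataFrames; the version below is a stand-in that returns plain lists of tuples so it runs without pandas. The function name `merge_by_location` and the `date`, `obs`, and `sim` column names are hypothetical; only `catchment_id` as the location column comes from the example input in this PR.

```python
import csv
from collections import defaultdict


def merge_by_location(obs_file, sim_file, location_column="catchment_id",
                      date_column="date", obs_column="obs", sim_column="sim"):
    """Pair observed and simulated values per location and date.

    Both arguments are open text files (or file-like objects) in CSV format.
    Returns {location: [(date, observed, simulated), ...]}.
    """
    # Index simulated values by (location, date) for constant-time lookup.
    sim_index = {}
    for row in csv.DictReader(sim_file):
        key = (row[location_column], row[date_column])
        sim_index[key] = float(row[sim_column])

    # Keep only observation rows that have a matching simulated value.
    merged = defaultdict(list)
    for row in csv.DictReader(obs_file):
        key = (row[location_column], row[date_column])
        if key in sim_index:
            merged[row[location_column]].append(
                (row[date_column], float(row[obs_column]), sim_index[key])
            )
    return dict(merged)
```

This behaves like an inner join: locations or dates present in only one of the two files are dropped, which may or may not match the matching rules documented in the README.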

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Files hop into place with auto-detect flair,
Three input modes handled with algorithmic care,
Location columns merge where data flows,
From startup to metrics—the detection grows!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'multiple input modes' directly and clearly summarizes the main objective of the changeset, which introduces support for three auto-detected input modes for loading simulation and observation data. |

@safabouguezzi safabouguezzi self-assigned this Apr 13, 2026
@safabouguezzi safabouguezzi merged commit a0b6e06 into main Apr 13, 2026
2 of 3 checks passed