Skip to content

CODEPUSH-3: Add config gen#8

Open
bharathappali wants to merge 3 commits into
kruize:mainfrom
bharathappali:add-config-gen
Open

CODEPUSH-3: Add config gen#8
bharathappali wants to merge 3 commits into
kruize:mainfrom
bharathappali:add-config-gen

Conversation

@bharathappali
Copy link
Copy Markdown
Member

@bharathappali bharathappali commented Apr 23, 2026

This PR is built on top of #7

#7 Needs to be merged before merging this PR

This PR adds the config generator which generates the hierarchy mapping needed for data generator. Config will have the metadata like namespace details, deployment info, pod info and container details along with the nature of the workload

Summary by Sourcery

Add a configurable workload metadata generator that produces hierarchical cluster/namespace/workload configuration files for use by the data generator.

New Features:

  • Introduce a command-line script to generate meta.json configs describing clusters, namespaces, workloads, pods, and containers over a specified time range.
  • Add support for modeling CPU, memory, and optional GPU resource utilization and capacities per node and container when generating configs.
  • Create the metadata mapping file scaffold used to derive clusters, namespaces, and workloads for configuration generation.

Signed-off-by: bharathappali <abharath@redhat.com>
Signed-off-by: bharathappali <abharath@redhat.com>
Signed-off-by: bharathappali <abharath@redhat.com>
@bharathappali bharathappali requested a review from chandrams April 23, 2026 08:29
@bharathappali bharathappali self-assigned this Apr 23, 2026
@sourcery-ai
Copy link
Copy Markdown

sourcery-ai Bot commented Apr 23, 2026

Reviewer's Guide

Introduce a standalone configuration generator script that creates metadata for synthetic workload hierarchies (clusters → namespaces → workloads → pods → containers) using a mapping.json file and various randomization and resource-constraint rules, and writes the result to data/configs//meta.json.

ER diagram for generated meta_json workload hierarchy

erDiagram
    CONFIG {
        int start_time
        int end_time
        int interval_secs
        int num_clusters
        int num_namespaces
    }

    CLUSTER {
        string name
    }

    NAMESPACE {
        string name
    }

    WORKLOAD {
        string name
        boolean is_deployment
        boolean is_gpu_workload
        int num_replicas
        int num_containers
        string cpu_utilization
        string memory_utilization
        string gpu_utilization
    }

    CONTAINER {
        string name
        string image
        string image_id
        float cpu_min
        float cpu_max
        string cpu_utilization
        float memory_min
        float memory_max
        string memory_utilization
        float gpu_min
        float gpu_max
        string gpu_utilization
    }

    POD {
        string name
        string id
        string node_name
        string gpu_uuid
        string gpu_device
        string gpu_model
    }

    POD_CONTAINER_STATE {
        string container_name
        string container_id
        float current_cpu_seconds
        float current_throttled_seconds
    }

    NODE {
        string name
        int capacity_cpu
        int capacity_memory
        string capacity_gpu_type
        int capacity_gpu_count
    }

    NODE_AVAILABILITY {
        int available_cpu
        int available_memory
        string available_gpu_type
        int available_gpu_count
        boolean is_assigned
    }

    CONFIG ||--o{ CLUSTER : has
    CLUSTER ||--o{ NAMESPACE : contains
    NAMESPACE ||--o{ WORKLOAD : owns
    WORKLOAD ||--o{ POD : replicates
    WORKLOAD ||--o{ CONTAINER : defines

    POD ||--o{ POD_CONTAINER_STATE : tracks
    CONTAINER ||--o{ POD_CONTAINER_STATE : used_in

    NODE ||--|| NODE_AVAILABILITY : has
    NODE ||--o{ POD : hosts
Loading

File-Level Changes

Change Details Files
Add a CLI-based config generator that builds hierarchical workload metadata and writes meta.json under a named config directory.
  • Implement argument parsing for config name, namespace/deployment/replica counts, pre/post day ranges, and time interval, using defaults from Constants.
  • Add time-range helpers to convert pre/post day windows and interval strings into UNIX timestamp ranges and sequences.
  • Validate and normalize user inputs for config name and temporal parameters before generation, printing the number of generated timestamps.
  • Create per-config output directories under a common configs root, overwriting existing contents if necessary.
generate_config.py
Introduce a synthetic node inventory and node-selection logic to assign pods to nodes with CPU, memory, and optional GPU capacity tracking.
  • Pre-generate a large NODE_MAP of worker nodes with random IP-like names, CPU/memory capacities, and optional GPU types/counts.
  • Implement get_node to pick an unassigned node that has sufficient CPU/memory (and GPUs for GPU workloads), decrementing its available resources and tracking assignment.
  • Model GPU-specific details such as GPU device ID, model, and UUID for GPU workloads within pod state metadata.
generate_config.py
Generate hierarchical cluster/namespace/workload/pod/container metadata based on a world mapping and resource-utilization constraints.
  • Load cluster-to-namespace-to-workload mappings from mapping.json and randomly choose a subset of namespaces and workloads according to requested counts.
  • For each selected workload, randomly decide deployment vs non-deployment, GPU vs non-GPU (for deployments), replicas, container count, and utilization levels for CPU, memory, and GPU.
  • Derive min/max resource ranges from Constants.ResourceConstraints based on utilization categories and attach them to each container’s resource spec.
  • Generate pod names and container names deterministically patterned off workloads, assign pods to nodes via get_node, and attach IDs, image IDs, image references, and initial resource counters.
  • Assemble the full meta structure with cluster, namespace, workload, pod, and container metadata plus time_range, and write it as pretty-printed JSON to meta.json in the config directory.
generate_config.py
data/metadata/mapping.json

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@bharathappali bharathappali changed the title Add config gen CODEPUSH-3: Add config gen Apr 23, 2026
Copy link
Copy Markdown

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 6 issues, and left some high level feedback:

  • Building the large NODE_MAP with random data at import time makes this module slow and side‑effectful when imported; consider moving that initialization into main() or a dedicated initializer function so it only runs when needed.
  • load_metadata() can return None, but generate_meta_json() assumes mapping_data is a valid dict and will fail if the metadata file is missing or invalid; add a defensive check and exit or raise a clear error when metadata cannot be loaded.
  • The generated timestamps list in main() is only used to print its length and is otherwise unused; either remove this computation or wire the timestamps into the config/meta generation so the work has a clear purpose.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- Building the large `NODE_MAP` with random data at import time makes this module slow and side‑effectful when imported; consider moving that initialization into `main()` or a dedicated initializer function so it only runs when needed.
- `load_metadata()` can return `None`, but `generate_meta_json()` assumes `mapping_data` is a valid dict and will fail if the metadata file is missing or invalid; add a defensive check and exit or raise a clear error when metadata cannot be loaded.
- The generated `timestamps` list in `main()` is only used to print its length and is otherwise unused; either remove this computation or wire the timestamps into the config/meta generation so the work has a clear purpose.

## Individual Comments

### Comment 1
<location path="generate_config.py" line_range="70-79" />
<code_context>
+def get_node(is_gpu_workload, max_cpu, max_memory):
</code_context>
<issue_to_address>
**issue (bug_risk):** Node assignment logic only allows a single workload per node due to `is_assigned`, effectively ignoring remaining CPU/memory capacity.

`get_node` sets `node_info["is_assigned"] = True` on the first scheduled workload, so later calls skip that node even if `availability["cpu"]` and `availability["memory"]` still allow more workloads. If nodes are meant to host multiple workloads until capacity is exhausted, either remove `is_assigned` and rely solely on the availability checks, or reserve `is_assigned` for a true terminal state (e.g., drained node). Otherwise, the availability tracking becomes misleading and nodes are systematically underutilized.
</issue_to_address>

### Comment 2
<location path="generate_config.py" line_range="131-140" />
<code_context>
+    return parser.parse_args()
+
+
+def get_num_secs(secs: str):
+    if secs is None:
+        return 30
+    if secs == "":
+        return 30
+    secs = secs.strip()
+    if secs not in Constants.INTERVAL_CHOICES:
+        return 30
+    if secs == "1s":
+        return 1
+    if secs == "5s":
+        return 5
+    if secs == "15s":
+        return 15
+    if secs == "30s":
+        return 30
+    if secs == "60s":
+        return 60
+
</code_context>
<issue_to_address>
**issue (bug_risk):** get_num_secs can fall through without returning a value if new interval choices are added in the future.

Because the function only handles a fixed set of values (`1s`, `5s`, `15s`, `30s`, `60s`), any future value added to `Constants.INTERVAL_CHOICES` (e.g. `"300s"`) could pass the membership check but not match any branch, causing an implicit `None` return and later issues in `get_timestamps`. Please either switch to a mapping dict (e.g. `{'1s': 1, '5s': 5, ...}`) or add an explicit default `return` at the end to guarantee a numeric result for all valid choices.
</issue_to_address>

### Comment 3
<location path="generate_config.py" line_range="151-156" />
<code_context>
+        return 60
+
+
+def get_start_end_unix_seconds(pre_days: int, post_days: int):
+    current_time = datetime.now(UTC).replace(second=0, microsecond=0)
+    start_time = current_time - timedelta(days=pre_days)
+    end_time = current_time + timedelta(days=post_days)
+    start_unix_seconds = int(time.mktime(start_time.timetuple()))
+    end_unix_seconds = int(time.mktime(end_time.timetuple()))
+    return start_unix_seconds, end_unix_seconds
+
</code_context>
<issue_to_address>
**issue (bug_risk):** Using time.mktime with UTC datetimes can skew timestamps on systems with non-UTC local time.

`datetime.now(UTC)` returns UTC-aware datetimes, but `time.mktime()` treats the timetuple as local time. On non-UTC hosts this will offset the epoch value by the local timezone. To get correct UTC epoch seconds, use `calendar.timegm(dt.utctimetuple())` or `int(dt.timestamp())` on the aware UTC datetime instead.
</issue_to_address>

### Comment 4
<location path="generate_config.py" line_range="183-162" />
<code_context>
+    return return_path
+
+
+def load_metadata():
+    try:
+        with open(MAPPING_JSON, 'r') as f:
+            metadata = json.load(f)
+        return metadata
+    except FileNotFoundError:
+        print(f"Error: File '{MAPPING_JSON}' not found.")
+        return None
+    except json.JSONDecodeError as e:
+        print(f"Error decoding JSON from '{MAPPING_JSON}': {str(e)}")
</code_context>
<issue_to_address>
**issue (bug_risk):** Returning None from load_metadata leads to runtime errors in generate_meta_json when the mapping file is missing or invalid.

`main()` passes the result of `load_metadata` directly as `mapping_data` into `generate_meta_json`, which immediately calls `mapping_data.values()`. When the mapping file is missing or invalid, `load_metadata` returns `None`, causing an exception in `generate_meta_json` with an unhelpful error. Either raise a clear exception from `load_metadata` instead of returning `None`, or have `main()` validate the result and fail fast with a clearer message before calling `generate_meta_json`.
</issue_to_address>

### Comment 5
<location path="generate_config.py" line_range="332-340" />
<code_context>
+                    pod = generate_pod_name(deployment_name=workload, is_deployment=is_deployment, replica_index=i)
+                    pod_id = str(uuid.uuid4())
+                    pods.append(pod)
+                    node_name, available_gpu_type = get_node(is_gpu_workload, max_cpu, max_memory)
+                    pod_states[pod] = {
+                        "id": pod_id,
+                        "node": node_name
+                    }
+                    if is_gpu_workload:
+                        pod_states[pod]['gpu_uuid'] = str(uuid.uuid4())
+                        pod_states[pod]['gpu_device'] = "nvidia0"
+                        pod_states[pod]['gpu_model'] = available_gpu_type
+                    for container_name in container_list:
+                        pod_states[pod][container_name] = {
</code_context>
<issue_to_address>
**issue:** No handling for failure to allocate a node leads to pods with `node=None` and GPU workloads with `gpu_model=None`.

`get_node` may return `(None, None)` when no node is available, but the code still records `"node": None` and, for GPU workloads, sets `gpu_uuid`, `gpu_device`, and `gpu_model=None`. This creates pods with GPU metadata that are not actually scheduled on any node. Please handle the `node_name is None` case explicitly (e.g., skip pod creation, mark as unschedulable, or raise) and only populate GPU metadata when a node is successfully allocated.
</issue_to_address>

### Comment 6
<location path="generate_config.py" line_range="399-403" />
<code_context>
+    interval_secs = get_num_secs(interval)
+    config_name = args.config_name
+
+    if not re.match(r"^[a-zA-Z0-9_-]+$", config_name):
+        raise ValueError(
+            "Invalid characters in config_name. Only alphanumeric, hyphen (-), and underscore (_) are allowed.")
+
+    if '/' in config_name:
+        raise ValueError("config_name cannot contain '/' character.")
+
</code_context>
<issue_to_address>
**nitpick:** The explicit '/' check for config_name is redundant with the regex and could be folded into one validation step.

Because `^[a-zA-Z0-9_-]+$` already disallows `/`, the `if '/' in config_name` branch is effectively unreachable when the regex check runs first. To keep distinct error messages, either check for `/` before applying the regex, or remove the second check and rely on the regex error. This simplifies the validation logic and avoids dead code paths.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread generate_config.py
Comment on lines +70 to +79
def get_node(is_gpu_workload, max_cpu, max_memory):
for node_name, node_info in NODE_MAP.items():
if not node_info["is_assigned"]:
if (node_info["availability"]["cpu"] >= max_cpu and
node_info["availability"]["memory"] >= max_memory):
if is_gpu_workload:
if node_info["availability"]["gpu"] and node_info["availability"]["gpu"]["count"] > 0:
node_info["availability"]["cpu"] -= max_cpu
node_info["availability"]["memory"] -= max_memory
node_info["availability"]["gpu"]["count"] -= 1
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (bug_risk): Node assignment logic only allows a single workload per node due to is_assigned, effectively ignoring remaining CPU/memory capacity.

get_node sets node_info["is_assigned"] = True on the first scheduled workload, so later calls skip that node even if availability["cpu"] and availability["memory"] still allow more workloads. If nodes are meant to host multiple workloads until capacity is exhausted, either remove is_assigned and rely solely on the availability checks, or reserve is_assigned for a true terminal state (e.g., drained node). Otherwise, the availability tracking becomes misleading and nodes are systematically underutilized.

Comment thread generate_config.py
Comment on lines +131 to +140
def get_num_secs(secs: str):
if secs is None:
return 30
if secs == "":
return 30
secs = secs.strip()
if secs not in Constants.INTERVAL_CHOICES:
return 30
if secs == "1s":
return 1
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (bug_risk): get_num_secs can fall through without returning a value if new interval choices are added in the future.

Because the function only handles a fixed set of values (1s, 5s, 15s, 30s, 60s), any future value added to Constants.INTERVAL_CHOICES (e.g. "300s") could pass the membership check but not match any branch, causing an implicit None return and later issues in get_timestamps. Please either switch to a mapping dict (e.g. {'1s': 1, '5s': 5, ...}) or add an explicit default return at the end to guarantee a numeric result for all valid choices.

Comment thread generate_config.py
Comment on lines +151 to +156
def get_start_end_unix_seconds(pre_days: int, post_days: int):
current_time = datetime.now(UTC).replace(second=0, microsecond=0)
start_time = current_time - timedelta(days=pre_days)
end_time = current_time + timedelta(days=post_days)
start_unix_seconds = int(time.mktime(start_time.timetuple()))
end_unix_seconds = int(time.mktime(end_time.timetuple()))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (bug_risk): Using time.mktime with UTC datetimes can skew timestamps on systems with non-UTC local time.

datetime.now(UTC) returns UTC-aware datetimes, but time.mktime() treats the timetuple as local time. On non-UTC hosts this will offset the epoch value by the local timezone. To get correct UTC epoch seconds, use calendar.timegm(dt.utctimetuple()) or int(dt.timestamp()) on the aware UTC datetime instead.

Comment thread generate_config.py

def get_timestamps(from_ts: int, to_ts: int, interval_secs: int):
if from_ts <= 0 or to_ts <= 0 or interval_secs <= 0 or from_ts >= to_ts:
return None
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (bug_risk): Returning None from load_metadata leads to runtime errors in generate_meta_json when the mapping file is missing or invalid.

main() passes the result of load_metadata directly as mapping_data into generate_meta_json, which immediately calls mapping_data.values(). When the mapping file is missing or invalid, load_metadata returns None, causing an exception in generate_meta_json with an unhelpful error. Either raise a clear exception from load_metadata instead of returning None, or have main() validate the result and fail fast with a clearer message before calling generate_meta_json.

Comment thread generate_config.py
Comment on lines +332 to +340
node_name, available_gpu_type = get_node(is_gpu_workload, max_cpu, max_memory)
pod_states[pod] = {
"id": pod_id,
"node": node_name
}
if is_gpu_workload:
pod_states[pod]['gpu_uuid'] = str(uuid.uuid4())
pod_states[pod]['gpu_device'] = "nvidia0"
pod_states[pod]['gpu_model'] = available_gpu_type
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: No handling for failure to allocate a node leads to pods with node=None and GPU workloads with gpu_model=None.

get_node may return (None, None) when no node is available, but the code still records "node": None and, for GPU workloads, sets gpu_uuid, gpu_device, and gpu_model=None. This creates pods with GPU metadata that are not actually scheduled on any node. Please handle the node_name is None case explicitly (e.g., skip pod creation, mark as unschedulable, or raise) and only populate GPU metadata when a node is successfully allocated.

Comment thread generate_config.py
Comment on lines +399 to +403
if not re.match(r"^[a-zA-Z0-9_-]+$", config_name):
raise ValueError(
"Invalid characters in config_name. Only alphanumeric, hyphen (-), and underscore (_) are allowed.")

if '/' in config_name:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: The explicit '/' check for config_name is redundant with the regex and could be folded into one validation step.

Because ^[a-zA-Z0-9_-]+$ already disallows /, the if '/' in config_name branch is effectively unreachable when the regex check runs first. To keep distinct error messages, either check for / before applying the regex, or remove the second check and rely on the regex error. This simplifies the validation logic and avoids dead code paths.

Comment thread generate_config.py

NODE_MAP = {}

for node_index in range(NUM_NODES):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add description into this file. Also rebase the PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants