Summary
Interpreter.run() (treesearch/interpreter.py) limits the execution time of generated
node code ([exec] timeout in config.toml), but not its memory consumption. A timeout is
caught, the node is scored as buggy, and the search continues. Running out of memory is
different: a single generated node can use up all host RAM and freeze the whole machine.
Observed behavior
During a run on MovieLens20M (Windows 11), the generated code in iteration 3/5 trained
ItemKNN and BPR on about 10M interactions. The debug log ends right as the interpreter
launches the subprocess, with no error or timeout logged:
[2026/06/11 10:52:46] [INFO] isgsa.treesearch: Type checking passed!
[2026/06/11 10:52:46] [DEBUG] isgsa.interpreter: Writing code to agent file: ...\out\workspace\runfile.py
[2026/06/11 10:52:46] [DEBUG] isgsa.interpreter: Done.
Shortly after, the machine froze and had to be hard reset. The whole run was lost,
including the LLM API costs spent up to that point. This is most likely to happen with the
larger built-in datasets, where naive generated code can quickly need more RAM than a
desktop machine has.
Expected behavior
A node that exceeds a memory budget should be handled like a node that exceeds the time
limit: terminate the subprocess, mark the node as buggy, and continue the search. Generated
code should not be able to take down the host machine.
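One possible shape for this, as a rough sketch rather than a patch against the actual Interpreter.run(): a watchdog loop that polls the subprocess and kills it when either the time limit or a memory budget is exceeded. All names here (ExecResult, run_with_budget, psutil_rss, max_rss_bytes) are hypothetical and not part of the project. The default memory probe assumes the third-party psutil package is installed, since the standard library's resource.setrlimit is not available on Windows.

```python
import subprocess
import time
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class ExecResult:
    """Hypothetical result type: why the subprocess stopped."""
    returncode: Optional[int]
    reason: str  # "ok", "timeout", or "memory"


def psutil_rss(pid: int) -> int:
    """Resident set size of `pid` in bytes (assumes psutil is installed)."""
    import psutil  # imported lazily so the sketch loads without it
    return psutil.Process(pid).memory_info().rss


def run_with_budget(
    cmd: Sequence[str],
    timeout_s: float,
    max_rss_bytes: int,
    rss_probe: Callable[[int], int] = psutil_rss,
    poll_s: float = 0.1,
) -> ExecResult:
    """Run `cmd`, killing it if it exceeds the time limit or the memory budget."""
    proc = subprocess.Popen(cmd)
    deadline = time.monotonic() + timeout_s
    try:
        while True:
            rc = proc.poll()
            if rc is not None:
                return ExecResult(rc, "ok")
            if time.monotonic() > deadline:
                return ExecResult(None, "timeout")
            try:
                rss = rss_probe(proc.pid)
            except Exception:
                if proc.poll() is None:
                    raise  # the probe itself is broken; do not hide that
                continue  # the process exited between poll() and the probe
            if rss > max_rss_bytes:
                return ExecResult(None, "memory")
            time.sleep(poll_s)
    finally:
        # Runs on every exit path; only kills if the child is still alive.
        if proc.poll() is None:
            proc.kill()
            proc.wait()
```

A caller could treat reason == "memory" the same way it treats "timeout" today and mark the node as buggy. Two caveats: polling can miss a very fast allocation spike, and this probe only measures the direct child, not processes it spawns. A hard OS-level limit (for example a Windows Job Object) would be stricter but is more involved.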