Don't serialize dcp objects#43

Merged
YarnSaw merged 5 commits into main from bugfix/dont-serialize-dcp-objs
Apr 21, 2026

Conversation

@JosephAcernese
Contributor

There are some minor errors in the compute_for and job wrappers. In particular, self.js_ref.jobInputData.js_ref always returns None, yet it is being used in the conditional statements for serialization.

Nested objects should never need to be unwrapped more than once in the call chain, because the underlying JS objects are not themselves wrapped:

  • job is the bifrost2 wrapper for job
  • job.jobInputData is the bifrost2 wrapper for jobInputData
  • job.js_ref is the JS proxy for job
  • job.js_ref.jobInputData AND job.jobInputData.js_ref are the same JS proxy for jobInputData
  • job.js_ref.jobInputData.js_ref technically exists, but accessing an undefined attribute in JS returns undefined, which pythonmonkey converts to None
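
The relationships in the list above can be modelled with plain-Python stand-ins. This is only a sketch: `FakeJSProxy` and `Wrapper` are hypothetical classes, not bifrost2 or pythonmonkey APIs, and the proxy simply mimics the behaviour described here by returning None for any attribute that was never set.

```python
class FakeJSProxy:
    """Hypothetical stand-in for a JS proxy: unknown attributes read as None."""

    def __getattr__(self, name):
        # __getattr__ only runs when normal lookup fails, which models
        # reading an undefined property on the JS side.
        return None


class Wrapper:
    """Hypothetical stand-in for a bifrost2 wrapper holding one JS proxy."""

    def __init__(self, js_ref):
        self.js_ref = js_ref


# JS side: a job proxy whose jobInputData is another raw (unwrapped) proxy.
js_input_data = FakeJSProxy()
js_job = FakeJSProxy()
js_job.jobInputData = js_input_data

# Python side: each wrapper holds exactly one proxy.
job = Wrapper(js_job)
job.jobInputData = Wrapper(js_input_data)

# Both access paths reach the same proxy object.
same_proxy = job.js_ref.jobInputData is job.jobInputData.js_ref
# Unwrapping a second time reads an undefined attribute on a raw proxy.
double_unwrap = job.js_ref.jobInputData.js_ref

print(same_proxy)     # True
print(double_unwrap)  # None
```

In this model, any conditional that tests the doubly unwrapped value is always falsy, which is the failure mode the description points at.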

This PR also fixes the compute_for function not unwrapping potential DCP objects, and adds a remote data job example, since this PR fixes that case.
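
As a rough illustration of what unwrapping a potential DCP object could look like, here is a minimal sketch. The helper name `unwrap` and the class `RemoteDataSetWrapper` are assumptions made for this example; the actual code in dcp/api/compute_for.py is not shown in this thread and may differ.

```python
class RemoteDataSetWrapper:
    """Hypothetical wrapper; the string stands in for a real JS proxy."""

    def __init__(self, js_ref):
        self.js_ref = js_ref


def unwrap(value):
    """Return value.js_ref for wrapper objects, else the value unchanged.

    Only one level is unwrapped, matching the point that the underlying
    JS objects are never wrapped themselves.
    """
    js_ref = getattr(value, "js_ref", None)
    return value if js_ref is None else js_ref


wrapped = RemoteDataSetWrapper("js-proxy-for-remote-data-set")
print(unwrap(wrapped))    # js-proxy-for-remote-data-set
print(unwrap([1, 2, 3]))  # [1, 2, 3]
```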

@wiwichips
Collaborator

It would be really really really valuable to have CI tests which ensure job deploy succeeds, the work function executed inside a worker gets the correct slice data / arguments, and results are processed correctly.

However, that's out of scope for this PR, so I'm just leaving it as a general comment #44

Comment thread dcp/api/job.py
Comment thread dcp/api/compute_for.py
@wiwichips
Collaborator

LGTM @JosephAcernese -- please get a review+approval from one of the Distributive folks before merging

Comment thread examples/remote-data-job-deploy.py
@YarnSaw
Contributor

YarnSaw commented Apr 21, 2026

LGTM!

@YarnSaw YarnSaw merged commit f776f1f into main Apr 21, 2026
5 checks passed
Comment thread dcp/api/job.py
      serialized_input_data = self.js_ref.jobInputData
  if hasattr(self.jobArguments, 'js_ref') and dry.class_manager.reg.find_from_js_instance(self.jobArguments.js_ref):
-     serialized_arguments = self.jobArguments.js_ref
+     serialized_arguments = [self.jobArguments.js_ref]
Collaborator

This looks incorrect to me, doesn't this make serialized_arguments an array of array-likes when it should just be an array-like?

Contributor Author

This is necessary because line 143 of this file expects serialized_arguments to be concatenated with other Python lists:

self.js_ref.jobArguments = [offset_to_argument_vector] + ["gzImage", job_fs] + env_args + serialized_arguments + [meta_arguments]

This solution did seem a little fishy to me, but I was getting the correct behaviour: single objects (such as RemoteDataSet) were properly converted to single and/or multiple job args.

There might be cleaner, more intuitive solutions than just slapping it in an array. My justification was that the other branch of the job args logic always leaves serialized_arguments as an array.
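
To make the concatenation constraint concrete, here is a small sketch of why a bare object cannot be added to a Python list while a one-element list can. `FakeJSRef` is a hypothetical stand-in for a single JS proxy, and the list contents are placeholders rather than the real argument vector.

```python
class FakeJSRef:
    """Hypothetical stand-in for one JS proxy object (not a Python list)."""


prefix = ["offset", "gzImage"]
js_ref = FakeJSRef()

# list + non-list raises TypeError, so the bare proxy cannot be appended
# with the + operator.
try:
    prefix + js_ref
    concat_failed = False
except TypeError:
    concat_failed = True

# Wrapping the proxy in a list makes the concatenation valid and adds
# exactly one element.
combined = prefix + [js_ref]

print(concat_failed)  # True
print(len(combined))  # 3
```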

Collaborator

I don't understand why this would work. For example, [range(1,10)] is a list containing a range; it doesn't flatten the iterable.
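
The difference between nesting and flattening can be checked directly in plain Python; nothing DCP-specific is assumed here.

```python
nested = [range(1, 10)]        # a list holding one range object
flattened = list(range(1, 10))  # a list holding the range's nine values

print(len(nested))     # 1
print(len(flattened))  # 9
```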

Also, we shouldn't be using + to concatenate iterables, since + on a list only accepts another list. That's maybe out of scope for this PR, though, and we have to deal with the different iterator interfaces between JS and Python 🥴

Maybe we should instead use itertools.chain and write a small iterator wrapper that converts a JS iterator into a Python iterable when the value is a js_ref with a Symbol.iterator attribute.
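
One possible shape for that idea is sketched below. How pythonmonkey exposes Symbol.iterator on a proxy is not shown in this thread, so the sketch only models the JS iterator protocol (a `next()` call returning `done` and `value`) with a hypothetical `FakeJSIterator` class; `iterate_js` is likewise an illustrative name, not an existing helper.

```python
import itertools


class FakeJSIterator:
    """Hypothetical stand-in for a JS iterator: next() returns done/value."""

    def __init__(self, items):
        self._items = list(items)
        self._index = 0

    def next(self):
        if self._index >= len(self._items):
            return {"done": True, "value": None}
        value = self._items[self._index]
        self._index += 1
        return {"done": False, "value": value}


def iterate_js(js_iterator):
    """Yield values from an object that follows the JS iterator protocol."""
    while True:
        step = js_iterator.next()
        if step["done"]:
            return
        yield step["value"]


js_args = iterate_js(FakeJSIterator(["a", "b"]))

# itertools.chain accepts any mix of iterables: a list, a tuple,
# the adapted JS iterator, and a range.
all_args = list(itertools.chain([0], ("gzImage",), js_args, range(2)))

print(all_args)  # [0, 'gzImage', 'a', 'b', 0, 1]
```

Unlike +, chain does not require every operand to be a list, so the call site would not need to wrap single objects to make the types line up.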

3 participants