
Update fieldset ingestion to use convert modules #40

Open
VeckoTheGecko wants to merge 11 commits into main from vecko-update


@VeckoTheGecko
Contributor

@VeckoTheGecko VeckoTheGecko commented Mar 18, 2026

OK - I've gone ahead and updated the ingestion code here so that it's in line with Parcels-code/Parcels#2549. We are closer to having a working benchmark suite, but unfortunately we're not there yet. I propose that we merge this anyway, as it gets us closer to the end goal.

MOI error

Currently ingestion works, but we get an error during the execution itself (note this PR now closes #33, as ingestion works and we now have a different error).

pixi run setup-data
pixi run asv run --bench moi_curvilinear.MOICurvilinear.time_pset_execute_3d
· Creating environments
· Discovering benchmarks
· Running 1 total benchmarks (1 commits * 1 environments * 1 benchmarks)
[ 0.00%] · For parcels commit be625b01 <update-convert>:
[ 0.00%] ·· Benchmarking rattler-py3.12-intake-xarray
[50.00%] ··· Running (moi_curvilinear.MOICurvilinear.time_pset_execute_3d--).
[100.00%] ··· ...vilinear.MOICurvilinear.time_pset_execute_3d             failed
[100.00%] ··· ============== =============
              --             chunk / npart
              -------------- -------------
               interpolator   256 / 10000 
              ============== =============
                 XLinear         failed   
              ============== =============
              For parameters: 'XLinear', 256, 10000
              /Users/Hodgs004/coding/repos/parcels-benchmarks/benchmarks/moi_curvilinear.py:2: UserWarning: This is an alpha version of Parcels v4. The API is not stable and may change without deprecation warnings.
                import parcels
              /Users/Hodgs004/coding/repos/parcels-benchmarks/.asv/env/44a3d831dbbb05d5c17212900f2e92b0/lib/python3.12/site-packages/parcels/convert.py:126: UserWarning: No depth dimension found in your dataset. Assuming no depth (i.e., surface data).
                warnings.warn("No depth dimension found in your dataset. Assuming no depth (i.e., surface data).", stacklevel=1)
              Traceback (most recent call last):
                File "/Users/Hodgs004/coding/repos/parcels-benchmarks/.pixi/envs/default/lib/python3.12/site-packages/asv/benchmark.py", line 99, in <module>
                  main()
                File "/Users/Hodgs004/coding/repos/parcels-benchmarks/.pixi/envs/default/lib/python3.12/site-packages/asv/benchmark.py", line 91, in main
                  commands[mode](args)
                File "/Users/Hodgs004/coding/repos/parcels-benchmarks/.asv/env/44a3d831dbbb05d5c17212900f2e92b0/lib/python3.12/site-packages/asv_runner/run.py", line 72, in _run
                  result = benchmark.do_run()
                           ^^^^^^^^^^^^^^^^^^
                File "/Users/Hodgs004/coding/repos/parcels-benchmarks/.asv/env/44a3d831dbbb05d5c17212900f2e92b0/lib/python3.12/site-packages/asv_runner/benchmarks/_base.py", line 661, in do_run
                  return self.run(*self._current_params)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                File "/Users/Hodgs004/coding/repos/parcels-benchmarks/.asv/env/44a3d831dbbb05d5c17212900f2e92b0/lib/python3.12/site-packages/asv_runner/benchmarks/time.py", line 165, in run
                  samples, number = self.benchmark_timing(
                                    ^^^^^^^^^^^^^^^^^^^^^^
                File "/Users/Hodgs004/coding/repos/parcels-benchmarks/.asv/env/44a3d831dbbb05d5c17212900f2e92b0/lib/python3.12/site-packages/asv_runner/benchmarks/time.py", line 258, in benchmark_timing
                  timing = timer.timeit(number)
                           ^^^^^^^^^^^^^^^^^^^^
                File "/Users/Hodgs004/coding/repos/parcels-benchmarks/.asv/env/44a3d831dbbb05d5c17212900f2e92b0/lib/python3.12/timeit.py", line 180, in timeit
                  timing = self.inner(it, self.timer)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^
                File "<timeit-src>", line 6, in inner
                File "/Users/Hodgs004/coding/repos/parcels-benchmarks/.asv/env/44a3d831dbbb05d5c17212900f2e92b0/lib/python3.12/site-packages/asv_runner/benchmarks/time.py", line 90, in func
                  self.func(*param)
                File "/Users/Hodgs004/coding/repos/parcels-benchmarks/benchmarks/moi_curvilinear.py", line 76, in time_pset_execute_3d
                  self.pset_execute_3d(interpolator, chunk, npart)
                File "/Users/Hodgs004/coding/repos/parcels-benchmarks/benchmarks/moi_curvilinear.py", line 71, in pset_execute_3d
                  pset.execute(
                File "/Users/Hodgs004/coding/repos/parcels-benchmarks/.asv/env/44a3d831dbbb05d5c17212900f2e92b0/lib/python3.12/site-packages/parcels/_core/particleset.py", line 435, in execute
                  self._kernel.execute(self, endtime=next_time, dt=dt)
                File "/Users/Hodgs004/coding/repos/parcels-benchmarks/.asv/env/44a3d831dbbb05d5c17212900f2e92b0/lib/python3.12/site-packages/parcels/_core/kernel.py", line 245, in execute
                  error_func(pset[inds].z, pset[inds].lat, pset[inds].lon)
                File "/Users/Hodgs004/coding/repos/parcels-benchmarks/.asv/env/44a3d831dbbb05d5c17212900f2e92b0/lib/python3.12/site-packages/parcels/_core/statuscodes.py", line 44, in _raise_field_interpolation_error
                  raise FieldInterpolationError(f"Field interpolation returned NaN at (z={z}, lat={y}, lon={x})")
              parcels._core.statuscodes.FieldInterpolationError: Field interpolation returned NaN at (z=array([0., 0., 0., ..., 0., 0., 0.], shape=(10000,), dtype=float32), lat=array([-30.   , -29.999, -29.998, ..., -20.002, -20.001, -20.   ],
                    shape=(10000,), dtype=float32), lon=array([-10.      ,  -9.998   ,  -9.995999, ...,   9.995999,   9.998   ,
                      10.      ], shape=(10000,), dtype=float32))

FESOM error

Here we get an error on the selection of the interpolator. This is a bug upstream in Parcels: this dataset has dims ('time', 'nz1', 'elem', 'nod2', 'nz'), but _select_uxinterpolator doesn't expect these dimension names and so isn't able to determine the right interpolator. AFAICT this problem was always here for this dataset. Let me know what you think @fluidnumerics-joe.

pixi run setup-data
pixi run asv run --bench 'fesom2.*'
· Creating environments
· Discovering benchmarks
· Running 3 total benchmarks (1 commits * 1 environments * 3 benchmarks)
[ 0.00%] · For parcels commit be625b01 <update-convert>:
[ 0.00%] ·· Benchmarking rattler-py3.12-intake-xarray
[33.33%] ··· Running (fesom2.FESOM2.time_load_data--)..
[66.67%] ··· fesom2.FESOM2.peakmem_pset_execute                          failed
[66.67%] ··· ======= ============================
             --               integrator         
             ------- ----------------------------
              npart   <function AdvectionRK2_3D> 
             ======= ============================
              10000             failed           
             ======= ============================
             For parameters: 10000, <function AdvectionRK2_3D>
             /Users/Hodgs004/coding/repos/parcels-benchmarks/benchmarks/fesom2.py:4: UserWarning: This is an alpha version of Parcels v4. The API is not stable and may change without deprecation warnings.
               from parcels import (
             INFO: Using known vertical dimension mapping: 'nz' (interfaces) and 'nz1' (centers).
             INFO: Renaming vertical dimensions: {'nz': 'zf', 'nz1': 'zc'}
             INFO: cf_xarray found variable 'w' with CF standard name 'w' in dataset, renamed it to 'W' for Parcels simulation.
             INFO: cf_xarray found variable 'unod' with CF standard name 'unod' in dataset, renamed it to 'U' for Parcels simulation.
             INFO: cf_xarray found variable 'vnod' with CF standard name 'vnod' in dataset, renamed it to 'V' for Parcels simulation.
             Traceback (most recent call last):
               File "/Users/Hodgs004/coding/repos/parcels-benchmarks/.pixi/envs/default/lib/python3.12/site-packages/asv/benchmark.py", line 99, in <module>
                 main()
               File "/Users/Hodgs004/coding/repos/parcels-benchmarks/.pixi/envs/default/lib/python3.12/site-packages/asv/benchmark.py", line 91, in main
                 commands[mode](args)
               File "/Users/Hodgs004/coding/repos/parcels-benchmarks/.asv/env/44a3d831dbbb05d5c17212900f2e92b0/lib/python3.12/site-packages/asv_runner/run.py", line 72, in _run
                 result = benchmark.do_run()
                          ^^^^^^^^^^^^^^^^^^
               File "/Users/Hodgs004/coding/repos/parcels-benchmarks/.asv/env/44a3d831dbbb05d5c17212900f2e92b0/lib/python3.12/site-packages/asv_runner/benchmarks/_base.py", line 661, in do_run
                 return self.run(*self._current_params)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
               File "/Users/Hodgs004/coding/repos/parcels-benchmarks/.asv/env/44a3d831dbbb05d5c17212900f2e92b0/lib/python3.12/site-packages/asv_runner/benchmarks/peakmem.py", line 66, in run
                 self.func(*param)
               File "/Users/Hodgs004/coding/repos/parcels-benchmarks/benchmarks/fesom2.py", line 57, in peakmem_pset_execute
                 self.pset_execute(npart, integrator)
               File "/Users/Hodgs004/coding/repos/parcels-benchmarks/benchmarks/fesom2.py", line 45, in pset_execute
                 fieldset = FieldSet.from_ugrid_conventions(ds)
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
               File "/Users/Hodgs004/coding/repos/parcels-benchmarks/.asv/env/44a3d831dbbb05d5c17212900f2e92b0/lib/python3.12/site-packages/parcels/_core/fieldset.py", line 215, in from_ugrid_conventions
                 fields["U"] = Field("U", ds["U"], grid, _select_uxinterpolator(ds["U"]))
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
               File "/Users/Hodgs004/coding/repos/parcels-benchmarks/.asv/env/44a3d831dbbb05d5c17212900f2e92b0/lib/python3.12/site-packages/parcels/_core/field.py", line 132, in __init__
                 assert_same_function_signature(interp_method, ref=ZeroInterpolator, context="Interpolation")
               File "/Users/Hodgs004/coding/repos/parcels-benchmarks/.asv/env/44a3d831dbbb05d5c17212900f2e92b0/lib/python3.12/site-packages/parcels/_python.py", line 26, in assert_same_function_signature
                 sig = inspect.signature(f)
                       ^^^^^^^^^^^^^^^^^^^^
               File "/Users/Hodgs004/coding/repos/parcels-benchmarks/.asv/env/44a3d831dbbb05d5c17212900f2e92b0/lib/python3.12/inspect.py", line 3348, in signature
                 return Signature.from_callable(obj, follow_wrapped=follow_wrapped,
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
               File "/Users/Hodgs004/coding/repos/parcels-benchmarks/.asv/env/44a3d831dbbb05d5c17212900f2e92b0/lib/python3.12/inspect.py", line 3085, in from_callable
                 return _signature_from_callable(obj, sigcls=cls,
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
               File "/Users/Hodgs004/coding/repos/parcels-benchmarks/.asv/env/44a3d831dbbb05d5c17212900f2e92b0/lib/python3.12/inspect.py", line 2522, in _signature_from_callable
                 raise TypeError('{!r} is not a callable object'.format(obj))
             TypeError: None is not a callable object

[83.33%] ··· fesom2.FESOM2.time_load_data                                    ok
[83.33%] ··· ======= ============================
             --               integrator         
             ------- ----------------------------
              npart   <function AdvectionRK2_3D> 
             ======= ============================
              10000           96.6±0.9ms         
             ======= ============================

[100.00%] ··· fesom2.FESOM2.time_pset_execute                             failed
[100.00%] ··· ======= ============================
              --               integrator         
              ------- ----------------------------
               npart   <function AdvectionRK2_3D> 
              ======= ============================
               10000             failed           
              ======= ============================
              For parameters: 10000, <function AdvectionRK2_3D>
              /Users/Hodgs004/coding/repos/parcels-benchmarks/benchmarks/fesom2.py:4: UserWarning: This is an alpha version of Parcels v4. The API is not stable and may change without deprecation warnings.
                from parcels import (
              INFO: Using known vertical dimension mapping: 'nz' (interfaces) and 'nz1' (centers).
              INFO: Renaming vertical dimensions: {'nz': 'zf', 'nz1': 'zc'}
              INFO: cf_xarray found variable 'w' with CF standard name 'w' in dataset, renamed it to 'W' for Parcels simulation.
              INFO: cf_xarray found variable 'unod' with CF standard name 'unod' in dataset, renamed it to 'U' for Parcels simulation.
              INFO: cf_xarray found variable 'vnod' with CF standard name 'vnod' in dataset, renamed it to 'V' for Parcels simulation.
              Traceback (most recent call last):
                File "/Users/Hodgs004/coding/repos/parcels-benchmarks/.pixi/envs/default/lib/python3.12/site-packages/asv/benchmark.py", line 99, in <module>
                  main()
                File "/Users/Hodgs004/coding/repos/parcels-benchmarks/.pixi/envs/default/lib/python3.12/site-packages/asv/benchmark.py", line 91, in main
                  commands[mode](args)
                File "/Users/Hodgs004/coding/repos/parcels-benchmarks/.asv/env/44a3d831dbbb05d5c17212900f2e92b0/lib/python3.12/site-packages/asv_runner/run.py", line 72, in _run
                  result = benchmark.do_run()
                           ^^^^^^^^^^^^^^^^^^
                File "/Users/Hodgs004/coding/repos/parcels-benchmarks/.asv/env/44a3d831dbbb05d5c17212900f2e92b0/lib/python3.12/site-packages/asv_runner/benchmarks/_base.py", line 661, in do_run
                  return self.run(*self._current_params)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                File "/Users/Hodgs004/coding/repos/parcels-benchmarks/.asv/env/44a3d831dbbb05d5c17212900f2e92b0/lib/python3.12/site-packages/asv_runner/benchmarks/time.py", line 165, in run
                  samples, number = self.benchmark_timing(
                                    ^^^^^^^^^^^^^^^^^^^^^^
                File "/Users/Hodgs004/coding/repos/parcels-benchmarks/.asv/env/44a3d831dbbb05d5c17212900f2e92b0/lib/python3.12/site-packages/asv_runner/benchmarks/time.py", line 258, in benchmark_timing
                  timing = timer.timeit(number)
                           ^^^^^^^^^^^^^^^^^^^^
                File "/Users/Hodgs004/coding/repos/parcels-benchmarks/.asv/env/44a3d831dbbb05d5c17212900f2e92b0/lib/python3.12/timeit.py", line 180, in timeit
                  timing = self.inner(it, self.timer)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^
                File "<timeit-src>", line 6, in inner
                File "/Users/Hodgs004/coding/repos/parcels-benchmarks/.asv/env/44a3d831dbbb05d5c17212900f2e92b0/lib/python3.12/site-packages/asv_runner/benchmarks/time.py", line 90, in func
                  self.func(*param)
                File "/Users/Hodgs004/coding/repos/parcels-benchmarks/benchmarks/fesom2.py", line 54, in time_pset_execute
                  self.pset_execute(npart, integrator)
                File "/Users/Hodgs004/coding/repos/parcels-benchmarks/benchmarks/fesom2.py", line 45, in pset_execute
                  fieldset = FieldSet.from_ugrid_conventions(ds)
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                File "/Users/Hodgs004/coding/repos/parcels-benchmarks/.asv/env/44a3d831dbbb05d5c17212900f2e92b0/lib/python3.12/site-packages/parcels/_core/fieldset.py", line 215, in from_ugrid_conventions
                  fields["U"] = Field("U", ds["U"], grid, _select_uxinterpolator(ds["U"]))
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                File "/Users/Hodgs004/coding/repos/parcels-benchmarks/.asv/env/44a3d831dbbb05d5c17212900f2e92b0/lib/python3.12/site-packages/parcels/_core/field.py", line 132, in __init__
                  assert_same_function_signature(interp_method, ref=ZeroInterpolator, context="Interpolation")
                File "/Users/Hodgs004/coding/repos/parcels-benchmarks/.asv/env/44a3d831dbbb05d5c17212900f2e92b0/lib/python3.12/site-packages/parcels/_python.py", line 26, in assert_same_function_signature
                  sig = inspect.signature(f)
                        ^^^^^^^^^^^^^^^^^^^^
                File "/Users/Hodgs004/coding/repos/parcels-benchmarks/.asv/env/44a3d831dbbb05d5c17212900f2e92b0/lib/python3.12/inspect.py", line 3348, in signature
                  return Signature.from_callable(obj, follow_wrapped=follow_wrapped,
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                File "/Users/Hodgs004/coding/repos/parcels-benchmarks/.asv/env/44a3d831dbbb05d5c17212900f2e92b0/lib/python3.12/inspect.py", line 3085, in from_callable
                  return _signature_from_callable(obj, sigcls=cls,
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                File "/Users/Hodgs004/coding/repos/parcels-benchmarks/.asv/env/44a3d831dbbb05d5c17212900f2e92b0/lib/python3.12/inspect.py", line 2522, in _signature_from_callable
                  raise TypeError('{!r} is not a callable object'.format(obj))
              TypeError: None is not a callable object


Future work

Better testing

I'm finding it quite difficult to debug all of this since it relies on heavy datasets, and the iteration loop using asv run is very frustrating: run the benchmark, find an error (which I can't easily inspect using pdb, since pdb has poor integration with asv), recreate the error using a normal Python script, realise the bug is in Parcels, etc.
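
As a rough sketch of the "normal Python script" step, a small helper can drive an asv-style benchmark class directly, so that pdb is available when it fails. The helper name `run_directly` and the `ToyBenchmark` class are hypothetical. The sketch only assumes the asv convention that `setup`, the benchmark method, and `teardown` all receive the benchmark parameters, and it does not handle `setup_cache`.

```python
import pdb


class ToyBenchmark:
    """Toy stand-in for an asv benchmark class (hypothetical, not from this repo)."""

    def setup(self, npart):
        self.values = list(range(npart))

    def time_sum(self, npart):
        self.total = sum(self.values)


def run_directly(bench_cls, method_name, params, debug=False):
    """Run one benchmark method outside asv, optionally dropping into pdb on error."""
    bench = bench_cls()
    setup = getattr(bench, "setup", None)
    teardown = getattr(bench, "teardown", None)
    if setup is not None:
        setup(*params)
    try:
        getattr(bench, method_name)(*params)
    except Exception:
        if debug:
            # Open the debugger at the frame that raised.
            pdb.post_mortem()
        raise
    finally:
        if teardown is not None:
            teardown(*params)
    return bench


bench = run_directly(ToyBenchmark, "time_sum", (4,))
```

For the failing MOI benchmark, the call might look like `run_directly(MOICurvilinear, "time_pset_execute_3d", ("XLinear", 256, 10000), debug=True)`, assuming the class can be imported from benchmarks/moi_curvilinear.py and the data has already been set up.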

At the moment we have the following (which can be thought of as a pyramid - from the top to the most foundational):

  • This benchmarks repo
  • The datasets generated and used in Parcels

There is, however, the possibility of a layer in between:

  • This benchmarks repo
  • Datasets generated from real coordinates and metadata, but with fake array data.
    • Almost akin to generating datasets from CDL (i.e., ncdump output). We need to build a small amount of custom tooling around this, since xarray doesn't provide it (see "Adding CDL Parser/open_cdl?", pydata/xarray#6269). ds.to_dict(data=False) is close to what we need, but excludes the coordinate values.
  • The datasets generated and used in Parcels
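
To make the middle layer more concrete, here is a minimal sketch of the "real coordinates and metadata, fake array data" idea using plain dictionaries. It assumes a schema shaped like the output of ds.to_dict(data=False), with real coordinate values supplied separately. The helper names and the example schema are hypothetical, and filling with zeros is just one possible choice of fake data.

```python
import copy


def zeros(shape):
    """Build nested lists of 0.0 with the given shape."""
    if not shape:
        return 0.0
    return [zeros(shape[1:]) for _ in range(shape[0])]


def fill_schema(schema, coord_values):
    """Return a copy of `schema` with real coordinate values and fake variable data.

    `schema` is assumed to follow the layout of ds.to_dict(data=False):
    "coords" and "data_vars" entries carrying "dims", "attrs", "dtype", "shape".
    """
    out = copy.deepcopy(schema)
    for name, var in out.get("coords", {}).items():
        var["data"] = coord_values[name]  # real coordinate values are kept
    for var in out.get("data_vars", {}).values():
        var["data"] = zeros(tuple(var["shape"]))  # array data is faked
    return out


# Hypothetical schema for illustration; a real one would come from a real dataset.
schema = {
    "coords": {
        "lon": {
            "dims": ("lon",),
            "attrs": {"units": "degrees_east"},
            "dtype": "float64",
            "shape": (3,),
        },
    },
    "attrs": {},
    "dims": {"time": 2, "lon": 3},
    "data_vars": {
        "U": {
            "dims": ("time", "lon"),
            "attrs": {"standard_name": "eastward_sea_water_velocity"},
            "dtype": "float32",
            "shape": (2, 3),
        },
    },
}

filled = fill_schema(schema, {"lon": [-10.0, 0.0, 10.0]})
```

The result is shaped so that, assuming xarray is available, it could be handed to xarray.Dataset.from_dict to get a small dataset with realistic metadata.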

This intermediate layer is hinted at in our "Participating in the issue tracker: 'Parcels doesn't work with my data'" doc page section, but I think it can be formalised and also extended to the coordinates (as those are also quite important). Regarding implementation, we can create a separate repo to host these small files (similar to https://github.com/Parcels-code/parcels-data ).

This intermediate layer has the following benefits:

  • Improved testing of the "convert" module using realistic metadata
  • Lightweight (can be integrated into our main test suite)
  • Easy to debug

Keen to hear your thoughts @erikvansebille .

Better cataloguing

From https://discourse.pangeo.io/t/data-pipelining-and-cataloging-best-practices-using-intake-xarray-to-transform-and-combine-data-metadata/5550/6 , I think we can streamline how we ingest data (by using Intake 2 in combination with the convert module or in combination with uxarray). Honestly, this is a low priority - I'm happy with what we have at the moment.

The important thing from my POV is the "Better testing" above, as that will flag any errors with our convert module.

@VeckoTheGecko
Contributor Author

Future work: Better testing

I'm going to get started setting the groundwork on this - keen to discuss if either of you have ideas so this can be further refined :)

@VeckoTheGecko
Contributor Author

Here we get an error on the selection of the interpolator - this is a bug upstream in Parcels (this dataset has dims ('time', 'nz1', 'elem', 'nod2', 'nz'), but _select_uxinterpolator doesn't expect these dimension names and so isn't able to determine the right interpolator).

Joe mentions that we have a convert function for this.


Development

Successfully merging this pull request may close these issues.

moi curvilinear pset_execute_2d and pset_execute_3d benchmark failure

1 participant