GDAL problems w use_conda() and reticulate #302

@eeholmes

The conda and R environments have different GDAL builds. The R geospatial packages are installed with Rocker's install_geospatial(), because the Rocker team maintains it, getting all the GDAL linkages right is work, and I want to rely on their work for that. Conversely, I want the Python side to take care of its own GDAL linkages.

TL;DR: using py_require() with reticulate is by far the more stable choice, as it links to the GDAL in the R environment, not the one in the conda notebook environment.

If you are using use_condaenv(), you need a clean R session after doing anything that would link GDAL, e.g. terra::rast(, vsi=TRUE), or after any reticulate command, e.g. use_condaenv(), that would link to conda's GDAL. Using py_require() is OK because it uses R's GDAL environment.
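A rough way to tell whether the current session is still "clean" in the sense above is to check what has already been loaded. The sketch below is only a heuristic: `session_is_clean()` is a hypothetical helper, and the list of GDAL-linked packages (`terra`, `sf`, `stars`) is an assumption you would adjust for your own setup.

```r
# Hypothetical helper: reports whether this R session has already loaded
# something that is likely to have linked a GDAL build.
session_is_clean <- function(gdal_pkgs = c("terra", "sf", "stars")) {
  # R packages whose namespaces (and shared libraries) are already loaded
  r_gdal_loaded <- any(gdal_pkgs %in% loadedNamespaces())

  # Has reticulate already started Python? Only ask if reticulate is loaded,
  # so that the check itself does not load anything new.
  py_started <- "reticulate" %in% loadedNamespaces() &&
    reticulate::py_available(initialize = FALSE)

  !(r_gdal_loaded || py_started)
}

session_is_clean()
```

If this returns FALSE before a call to use_condaenv(), that is a hint to restart R or to move the conda work into a separate process.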

How to get a clean R session?

  1. Don't use use_condaenv(); only use py_require() and install all the packages you need.
  2. If you need to use use_condaenv(), then in RStudio restart R, do your stuff, and restart R again (you lose all variables). In JupyterLab, same idea, but restart the kernel. Obviously not something to do in a script that runs top to bottom.
  3. In a terminal, start R (by running R), do the conda/Python stuff, save the output, and close. Again, not so useful for scripts that run top to bottom.
  4. In your R script (or Quarto file), use callr::r() to start a clean R session, do your conda stuff there, save to a file or convert to an R object, and use the output.

Examples of these workflows below.


Staying within the R environment's GDAL ecosystem works.

library(terra)
url <- "https://storage.googleapis.com/nmfs_odp_nwfsc/CB/fish-pace-datasets/chla-z/netcdf/chla_z_20240305_v2.nc"
r <- rast(url, vsi = TRUE)

reticulate with py_require() also works, because it uses the same GDAL libraries as R does. You can mix this with the terra commands.

library(reticulate)

py_require(c(
  "xarray",
  "h5netcdf",
  "h5py",
  "fsspec",
  "requests",
  "aiohttp"
))

xr <- import("xarray")

url <- "https://storage.googleapis.com/nmfs_odp_nwfsc/CB/fish-pace-datasets/chla-z/netcdf/chla_z_20240305_v2.nc"
ds <- xr$open_dataset(url, engine = "h5netcdf")
ds

Mixing conda GDAL and R GDAL system libraries does not work

The problems start when you try to use the conda env: reticulate now uses (and wants) the GDAL libraries in the conda env. RStudio sets LD_LIBRARY_PATH as soon as it starts up. This does not happen if you start R via a JupyterLab kernel or via R in the terminal.
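To see which libraries a session has actually picked up, it can help to look at LD_LIBRARY_PATH and at the shared objects mapped into the R process. This is a minimal sketch, assuming Linux (it reads /proc/self/maps); `loaded_libs()` and its default pattern are hypothetical names chosen for illustration.

```r
# Hypothetical helper: list GDAL-related shared libraries mapped into the
# current R process. Assumes Linux, where /proc/self/maps exists.
loaded_libs <- function(pattern = "libgdal|libproj|libssl|libcurl") {
  maps_file <- "/proc/self/maps"
  if (!file.exists(maps_file)) return(character(0))
  maps <- readLines(maps_file, warn = FALSE)

  # Keep lines that end in a file path; the path is the last field
  has_path <- grepl("\\s/\\S+$", maps, perl = TRUE)
  paths <- sub("^.*\\s(/\\S+)$", "\\1", maps[has_path], perl = TRUE)

  sort(unique(paths[grepl(pattern, basename(paths))]))
}

Sys.getenv("LD_LIBRARY_PATH")  # compare RStudio vs. terminal vs. JupyterLab
loaded_libs()                  # conda env paths vs. system library paths
```

Running this before and after library(terra) or use_condaenv() shows whether the libraries come from the conda env or from the system locations used by the R install.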

This will work in a plain R session (R started from a terminal or JupyterLab) but not in one started in RStudio.

library(reticulate)
use_condaenv("notebook", required = TRUE)
xr <- import("xarray")
url <- "https://storage.googleapis.com/nmfs_odp_nwfsc/CB/fish-pace-datasets/chla-z/netcdf/chla_z_20240305_v2.nc"
ds <- xr$open_dataset(url, engine = "h5netcdf")
ds

Even in R started from a terminal, problems happen if we mix functions that link to GDAL through R with Python functions that link to GDAL in our conda env.

In R in a terminal: run terra first, and the Python side does not work, since the R GDAL libraries are already linked.

url = "https://storage.googleapis.com/nmfs_odp_nwfsc/CB/fish-pace-datasets/chla-z/netcdf/chla_z_20240305_v2.nc" 

library(terra) 
r <- rast(url, vsi = TRUE)

# now you get an SSL error; it is not finding the dyn libs that it needs
library(reticulate)
use_condaenv("notebook", required = TRUE)
xr <- import("xarray")
ds <- xr$open_dataset(url, engine = "h5netcdf")

Run terra after loading conda, and terra may or may not work, depending on how it is feeling.

url = "https://storage.googleapis.com/nmfs_odp_nwfsc/CB/fish-pace-datasets/chla-z/netcdf/chla_z_20240305_v2.nc" 

library(reticulate)
use_condaenv("notebook", required = TRUE)
xr <- import("xarray")
ds <- xr$open_dataset(url, engine = "h5netcdf")

# may or may not work, but the dynamic libraries are definitely wrong
library(terra) 
r <- rast(url, vsi = TRUE)

How to keep conda GDAL and R GDAL completely separate in a script, assuming you do not want to use py_require(), which is probably the better way to go: use callr::r() to start a clean R session within a running R session. callr only returns R objects, though, so you will need to do something with ds inside the child session, like saving it to netCDF.

url <- "https://storage.googleapis.com/nmfs_odp_nwfsc/CB/fish-pace-datasets/chla-z/netcdf/chla_z_20240305_v2.nc"

out <- file.path(Sys.getenv("HOME"), "chla_subset.nc")

callr::r(function(url, out) {
  library(reticulate)
  use_condaenv("notebook", required = TRUE)

  xr <- import("xarray", convert = FALSE)
  py <- import("builtins", convert = FALSE)

  ds <- xr$open_dataset(url, engine = "h5netcdf")

  da <- ds[["CHLA"]]$
    isel(time = 0L, z = 0L)$
    sel(lat = py$slice(10, 5),
        lon = py$slice(-140, -135))

  da$to_netcdf(out)
  out
}, args = list(url = url, out = out))

out
file.exists(out)
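Back in the parent session, which has not touched conda, the saved file can then be read with terra using R's own GDAL. This is a minimal sketch; `read_subset()` is a hypothetical helper, and it assumes terra is installed and that the file was written to the same path as above.

```r
# Hypothetical helper: read the file written by the callr child process
# back into the parent session with terra.
read_subset <- function(path) {
  if (!file.exists(path)) {
    stop("Expected output file not found: ", path)
  }
  terra::rast(path)
}

out <- file.path(Sys.getenv("HOME"), "chla_subset.nc")
if (file.exists(out)) {
  r <- read_subset(out)
  print(r)
}
```

The explicit file check is there so that a failure in the child process shows up as a clear error rather than as a GDAL read error.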
