Overhead of future_lapply relative to parLapply/mclapply, etc. #68

@traversc

I have a number of tasks that look like: lapply(long_list, fast_function) and I'd like to get away from using mclapply (for reasons you've talked about before).

However, in my benchmarks I see that future_lapply has a larger overhead compared to parLapply/mclapply.

Are there parameters I can tune to improve the performance on these types of tasks?

An example:

library(dplyr)
library(parallel)
library(future.apply)
library(microbenchmark)
plan(multisession, workers = 4)
cl <- parallel::makeCluster(4)

v <- paste0(paste0("gene", 1:100), "*", 1:3)
v <- sample(v, 10000, replace = TRUE)

parL <- function(v) {
  parallel::clusterExport(cl, varlist = "%>%")
  v <- parallel::parLapply(cl, v, function(.x) {
    gsub("\\*$", "", .x) %>% gsub("\\*.+$", "", .) %>% unique %>% 
      paste0(collapse = ",")
  })
}

serial <- function(v) {
  v <- lapply(v, function(.x) {
    gsub("\\*$", "", .x) %>% gsub("\\*.+$", "", .) %>% unique %>% 
      paste0(collapse = ",")
  })
}

mcl <- function(v) {
  v <- mclapply(v, function(.x) {
    gsub("\\*$", "", .x) %>% gsub("\\*.+$", "", .) %>% unique %>% 
      paste0(collapse = ",")
  }, mc.cores=4)
}


fut <- function(v) {
  v <- future_lapply(v, function(.x) {
    gsub("\\*$", "", .x) %>% gsub("\\*.+$", "", .) %>% unique %>% 
      paste0(collapse = ",")
  })
}

microbenchmark(parL = parL(v), mcl = mcl(v), serial = serial(v), fut = fut(v), times = 5, setup=gc())

Unit: milliseconds
   expr       min        lq      mean    median        uq       max neval cld
   parL  529.5245  534.1097  677.4822  640.9563  746.9266  935.8941     5  a 
    mcl  445.8535  451.9500  464.9154  459.4391  474.9048  492.4295     5  a 
 serial 1339.9738 1451.7585 1467.4781 1461.9080 1517.0687 1566.6813     5   b
    fut 1059.6930 1060.1854 1342.6222 1064.8015 1456.4210 2072.0099     5   b
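
For reference, the load-balancing arguments that future_lapply exposes are `future.scheduling` and `future.chunk.size`. The sketch below shows how they could be passed for this kind of workload. It is only an illustration under assumptions: the helper name `strip_suffix`, the worker count of 2, and the chunk size of 1000 are made up for the example, and nothing here establishes that these settings change the timings above.

```r
library(future.apply)

# Assumption: 2 workers, chosen only to keep the sketch small.
plan(multisession, workers = 2)

v <- paste0(paste0("gene", 1:100), "*", 1:3)
v <- sample(v, 10000, replace = TRUE)

# Hypothetical helper: same transformation as in the benchmark,
# written without the pipe so no extra package is needed on the workers.
strip_suffix <- function(.x) {
  paste0(unique(gsub("\\*.+$", "", gsub("\\*$", "", .x))), collapse = ",")
}

# Default scheduling (future.scheduling = 1.0): one chunk per worker.
res_default <- future_lapply(v, strip_suffix)

# Explicit chunk size: each future processes about 1000 elements.
res_chunked <- future_lapply(v, strip_suffix, future.chunk.size = 1000L)

# Twice as many chunks as workers, trading overhead for load balancing.
res_sched <- future_lapply(v, strip_suffix, future.scheduling = 2.0)

# Shut the background workers down again.
plan(sequential)
```

All three calls return the same list as `lapply(v, strip_suffix)`; only the way elements are grouped into futures differs, so any difference would show up in timing rather than in results.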
