diff --git a/docs/tutorials/reticulate.md b/docs/tutorials/reticulate.md index 791299990..6940a5e06 100644 --- a/docs/tutorials/reticulate.md +++ b/docs/tutorials/reticulate.md @@ -6,6 +6,23 @@ While the separate [synapser](https://github.com/Sage-Bionetworks/synapser) R pa ## Installation +### Setting up a fresh RStudio environment with Python + +The following Docker command starts an RStudio instance with Python pre-installed, which you can use to follow this guide: + +```bash +docker run --rm -it -p 8787:8787 \ + -e PASSWORD=rstudio \ + rocker/rstudio:latest \ + bash -lc " + apt-get update && + apt-get install -y python3 python3-venv python3-pip python3-dev build-essential libcurl4-openssl-dev libssl-dev libxml2-dev && + /init + " +``` + +Then open `http://localhost:8787` in your browser (username: `rstudio`, password: `rstudio`). + ### Installing reticulate This article assumes that reticulate is installed and available in your R environment. If not it can be installed as follows: @@ -18,14 +35,12 @@ install.packages("reticulate") The Python synapseclient can be installed either directly into the Python installation you intend to use with reticulate or from within R using the reticulate library. -synapseclient has the same requirements and dependencies when installed for use with reticulate as it does in other usage. In particular note that synapseclient requires a Python version of 3.6 or greater. +synapseclient has the same requirements and dependencies when installed for use with reticulate as it does in other usage. In particular note that synapseclient requires a Python version of 3.10 or greater. #### Installing into Python The Python synapseclient is available on the [PyPi package repository](https://pypi.org/project/synapseclient/) and can be installed through Python tools that interface with the repository, such as **pip**. To install synapseclient for use with reticulate directly into a Python environment, first ensure that the current Python interpreter is the one you intend to use with reticulate. This may be a particular installation of Python, or a loaded [virtual environment](https://docs.python.org/3/tutorial/venv.html). See reticulate's [Python version configuration documentation](https://rstudio.github.io/reticulate/articles/versions.html) for more information on how reticulate can be configured to use particular Python environments. -For help installing a reticulate compatible Python, see the reticulate version of the [SynapseShinyApp](https://github.com/Sage-Bionetworks/SynapseShinyApp/tree/reticulate). - Once you have ensured you are interacting with your intended Python interpreter, follow the standard synapseclient [installation instructions](./installation.md) to install synapseclient. #### Installing from R/Reticulate @@ -38,101 +53,55 @@ library(reticulate) Once loaded, ensure that reticulate will use the Python installation you intend. You may need to provide reticulate a hint or otherwise [point it at the proper Python installation](https://rstudio.github.io/reticulate/articles/versions.html). -Next install the synapseclient using reticulate's [py_install](https://rstudio.github.io/reticulate/reference/py_install.html) command, e.g. - -```r -py_install("synapseclient") -``` - -You may also want to install some of synapseclient's optional dependencies, such as [Pandas](https://pandas.pydata.org/) for table support. +Next install the synapseclient using reticulate's [py_install](https://rstudio.github.io/reticulate/reference/py_install.html) command. We recommend installing with the `pandas` and `curator` optional dependencies: ```r -py_install("pandas") +py_install("synapseclient[pandas,curator]") ``` See synapseclient's [installation instructions](./installation.md) for more information on optional dependencies. ## Usage -Once synapseclient is installed it can be used once it is imported through R's [import](https://rstudio.github.io/reticulate/reference/import.html) command: +Once synapseclient is installed, import the top-level module and the `models` and `operations` submodules through R's [import](https://rstudio.github.io/reticulate/reference/import.html) command: ```r synapseclient <- import("synapseclient") +models <- import("synapseclient.models") +operations <- import("synapseclient.operations") ``` -If you are using synapseclient with reticulate when writing an R package, you will want to wrap the import in an onLoad and use the delay_load option, .e.g. - -```r -synapseclient <- NULL +The `models` module contains dataclass entity types (e.g. `File`, `Project`, `Folder`). Each instance exposes methods like `store()`, `get()`, and `delete()` directly. The `operations` module provides top-level functions — most usefully `operations$get()` for retrieving an entity by Synapse ID when the entity type is not known in advance. -.onLoad <- function(libname, pkgname) { - synapseclient <<- reticulate::import("synapseclient", delay_load = TRUE) -} -``` - -This will allow users of your package to configure their reticulate usage properly regardless of when they load your package. More information on this technique can be found [here](https://rstudio.github.io/reticulate/articles/package.html). - -If you are familiar with the **synapser** R package, many of the commands will be similar, but unlike in synapser where package functions and classes are made available in the global namespace through the search path, when using synapseclient through reticulate, classes are accessed through the imported synapseclient module and functionality is provided through an instantiated Synapse instance. - -For example classes that were globally available are now available through the imported synapseclient module. - -```r -# File from synapser -synapseclient$File - -# Table from synapser -synapseclient$Table -``` +## Credentials -And various syn functions are now methods on the Synapse object: +synapseclient accessed through reticulate supports the same authentication options as it does when accessed directly from Python. Log in once per session using the `Synapse` class and your auth token: ```r -# using synapseclient with reticulate we must instantiate a Synapse instance syn <- synapseclient$Synapse() - -# synLogin from synapser syn$login() - -# synGet from synapser -syn$get(identifier) - -# synStore from syanpser -syn$store(entity) -``` - -Each synapse object has its own state, such as configuration and login credentials. - -## Credentials - -synapseclient accessed through reticulate supports the same authentication options as it does when accessed directly from Python, for example: - -```r -syn <- synapseclient$synapse() - -# one time login -syn$login('') ``` See [Managing Synapse Credentials](./authentication.md) for complete documentation on how synapseclient handles credentials and authentication. ## Accessing Data -The following illustrates some examples of storing and retrieving data in Synapse using synapseclient through reticulate. +The following illustrates storing and retrieving data in Synapse using the new OOP models and operations API. See [here](./python_client.md#accessing-data) for more details on available data access APIs. -Create a project with a unique name +### Create a project ```r -# use hex_digits to generate random string and use it to name a project +# use hex_digits to generate a random string for the project name hex_digits <- c(as.character(0:9), letters[1:6]) projectName <- sprintf("My unique project %s", paste0(sample(hex_digits, 32, replace = TRUE), collapse = "")) -project <- synapseclient$Project(projectName) -project <- syn$store(project) +project <- models$Project(name = projectName) +project <- project$store() ``` -Create, store, and retrieve a file +### Create, store, and retrieve a file ```r filePath <- tempfile() @@ -140,41 +109,50 @@ connection <- file(filePath) writeChar("a \t b \t c \n d \t e \t f \n", connection, eos = NULL) close(connection) -file <- synapseclient$File(path = filePath, parent = project) -file <- syn$store(file) -synId <- file$properties$id +# store a file inside the project +fileEntity <- models$File(path = filePath, parent_id = project$id) +fileEntity <- fileEntity$store() +synId <- fileEntity$id -# download the file using its identifier to specific path -fileEntity <- syn$get(synId, downloadLocation="/path/to/folder") +# retrieve the file by its Synapse ID +# use operations$get when you don't know the entity type in advance +fileEntity <- operations$get(synId) -# view the file meta data in the console +# view the file metadata in the console print(fileEntity) -# view the file on the web -syn$onweb(synId) +# open the file on the web +operations$onweb(synId) ``` -Create folder and add files to the folder: +### Create a folder and add files to it ```r -dataFolder <- synapseclient$Folder("Data", parent = project) -dataFolder <- syn$store(dataFolder) +dataFolder <- models$Folder(name = "Data", parent_id = project$id) +dataFolder <- dataFolder$store() filePath <- tempfile() connection <- file(filePath) writeChar("this is the content of the file", connection, eos = NULL) close(connection) -file <- synapseclient$File(path = filePath, parent = dataFolder) -file <- syn$store(file) + +fileEntity <- models$File(path = filePath, parent_id = dataFolder$id) +fileEntity <- fileEntity$store() ``` ## Annotating Synapse Entities -This illustrates adding annotations to a Synapse entity. +Annotations can be stored directly on model objects via the `annotations` attribute and then stored to Synapse: ```r -# first retrieve the existing annotations object -annotations <- syn$get_annotations(project) +project$annotations <- list(foo = "bar", fooList = list("bar", "baz")) +project <- project$store() +``` + +Alternatively, retrieve and update annotations directly: + +```r +annotations <- syn$get_annotations(project$id) annotations$foo <- "bar" annotations$fooList <- list("bar", "baz") @@ -191,21 +169,26 @@ This example illustrates creating an entity with associated provenance. See [here][synapseclient.activity] for more information on Activity/Provenance related APIs. ```r -act <- synapseclient$Activity( +act <- models$Activity( name = "clustering", description = "whizzy clustering", - used = c("syn1234", "syn1235"), - executed = "syn4567") -``` + used = list( + models$UsedEntity(target_id = "syn1234"), + models$UsedEntity(target_id = "syn1235") + ), + executed = list( + models$UsedURL(url = "https://github.com/my-org/my-repo") + ) +) -```r filePath <- tempfile() connection <- file(filePath) writeChar("some test", connection, eos = NULL) close(connection) -file = synapseclient$File(filePath, name="provenance_file.txt", parent=project) -file <- syn$store(file, activity = act) +fileEntity <- models$File(path = filePath, name = "provenance_file.txt", parent_id = project$id) +fileEntity$activity <- act +fileEntity <- fileEntity$store() ``` ## Tables @@ -214,10 +197,10 @@ These examples illustrate manipulating Synapse Tables. Note that you must have installed the Pandas dependency into the Python environment as described above in order to use this feature. -See [here][synapseclient.table] for more information on tables. +See [here][synapseclient.models.Table] for more information on tables. -The following illustrates building a table from an R data frame. The schema will be generated -from the data types of the values within the data frame. +The following illustrates building a table from an R data frame with the schema automatically +inferred from the data types of the columns. ```r # start with an R data frame @@ -229,17 +212,21 @@ genes <- data.frame( Strand = c("+", "+", "-", "-", "+", "+"), TranscriptionFactor = c(F, F, F, F, T, F)) -# build a Synapse table from the data frame. -# a schema is automatically generated -# note that reticulate will automatically convert from an R data frame to Pandas -table <- synapseclient$build_table("My Favorite Genes", project, genes) +# create the table schema in Synapse +table <- models$Table(name = "My Favorite Genes", parent_id = project$id) +table <- table$store() -table <- syn$store(table) +# upload rows — reticulate auto-converts the R data frame to a pandas DataFrame. +# INFER_FROM_DATA automatically creates columns from the data frame's schema. +table$store_rows( + values = genes, + schema_storage_strategy = models$SchemaStorageStrategy$INFER_FROM_DATA +) ``` -Alternately the schema can be specified. At this time when using date values it is necessary -to use a date string formatted in "YYYY-MM-dd HH:mm:ss.mmm" format or integer unix epoch millisecond -value and explicitly specify the type in the schema due to how dates are translated to the Python client. +Alternately the schema can be specified explicitly using `Column` objects. When using date +values it is necessary to use a date string formatted in "YYYY-MM-dd HH:mm:ss.mmm" format +or an integer unix epoch millisecond value and explicitly specify the column type. ```r prez_birthdays <- data.frame( @@ -247,152 +234,31 @@ prez_birthdays <- data.frame( Time = c("1732-02-22 11:23:11.024", "1743-04-13 00:00:00.000", "1809-02-12 01:02:03.456")) cols <- list( - synapseclient$Column(name = "Name", columnType = "STRING", maximumSize = 20), - synapseclient$Column(name = "Time", columnType = "DATE")) - -schema <- synapseclient$Schema(name = "President Birthdays", columns = cols, parent = project) -table <- synapseclient$Table(schema, prez_birthdays) - -# store the table in Synapse -table <- syn$store(table) -``` - -We can query a table as in the following: - -```r -tableId <- table$tableId - -results <- syn$tableQuery(sprintf("select * from %s where Name='George Washington'", tableId)) -results$asDataFrame() -``` - -## Wikis - -This example illustrates creating a wiki. - -See [here][synapseclient.wiki] for more information on wiki APIs. - -```r -content <- " -# My Wiki Page -Here is a description of my **fantastic** project! -" - -# attachment -filePath <- tempfile() -connection <- file(filePath) -writeChar("this is the content of the file", connection, eos = NULL) -close(connection) -wiki <- synapseclient$Wiki( - owner = project, - title = "My Wiki Page", - markdown = content, - attachments = list(filePath) + models$Column(name = "Name", column_type = models$ColumnType$STRING, maximum_size = 20L), + models$Column(name = "Time", column_type = models$ColumnType$DATE) ) -wiki <- syn$store(wiki) -``` - -An existing wiki can be updated as follows. - -```r -wiki <- syn$getWiki(project) -wiki$markdown <- " -# My Wiki Page -Here is a description of my **fantastic** project! Let's -*emphasize* the important stuff. -" -wiki <- syn$store(wiki) -``` - -## Evaluations - -An Evaluation is a Synapse construct useful for building processing pipelines and -for scoring predictive modeling and data analysis challenges. -See [here][synapseclient.evaluation] for more information on Evaluations. - -Creating an Evaluation: - -```r -eval <- synapseclient$Evaluation( - name = sprintf("My unique evaluation created on %s", format(Sys.time(), "%a %b %d %H%M%OS4 %Y")), - description = "testing", - contentSource = project, - submissionReceiptMessage = "Thank you for your submission!", - submissionInstructionsMessage = "This evaluation only accepts files.") - -eval <- syn$store(eval) - -eval <- syn$getEvaluation(eval$id) +table <- models$Table(name = "President Birthdays", parent_id = project$id, columns = cols) +table <- table$store() +table$store_rows(values = prez_birthdays) ``` -Submitting a file to an existing Evaluation: +We can query a table using `models$query`, which returns a pandas DataFrame that reticulate +automatically converts to an R data frame: ```r -# first create a file to submit -filePath <- tempfile() -connection <- file(filePath) -writeChar("this is my first submission", connection, eos = NULL) -close(connection) -file <- synapseclient$File(path = filePath, parent = project) -file <- syn$store(file) -# submit the created file -submission <- syn$submit(eval, file) -``` - -List submissions: - -```r -submissions <- syn$getSubmissionBundles(eval) - -# submissions are returned as a generator -list(iterate(submissions)) -``` - -Retrieving submission by id: - -```r -submission <- syn$getSubmission(submission$id) -``` - -Retrieving the submission status: - -```r -submissionStatus <- syn$getSubmissionStatus(submission) -submissionStatus -``` - -Query an evaluation: - -```r -queryString <- sprintf("query=select * from evaluation_%s LIMIT %s OFFSET %s'", eval$id, 10, 0) -syn$restGET(paste("/evaluation/submission/query?", URLencode(queryString), sep = "")) -``` - -## Sharing Access to Content - -The following illustrates sharing access to a Synapse Entity. - -See [here](../explanations/access_control.md) for more information on Access Control including all available permissions. - -```r -# get permissions on an entity -# to get permissions for a user/group pass a principalId identifier, -# otherwise the assumed permission will apply to the public - -# make the project publicly accessible -acl <- syn$setPermissions(project, accessType = list("READ")) - -perms = syn$getPermissions(project) +results <- models$query( + sprintf("select * from %s where Name='George Washington'", table$id) +) +results ``` ## Views -A view is a view of all entities (File, Folder, Project, Table, Docker Repository, View) within one or more Projects or Folders. Views can: +An EntityView provides a table-like interface over entities (Files, Folders, Projects, Tables, etc.) spread across one or more Projects or Folders. You can query a view with SQL, making it easy to filter, sort, and inspect entity metadata at scale. The following examples illustrate some view operations. -See [here](../guides/views.md) for more information on Views. A view is implemented as a Table, -see [here][synapseclient.table] for more information on Tables. +See [here](../guides/views.md) for more information on Views. First create some files we can use in a view: @@ -401,77 +267,68 @@ filePath1 <- tempfile() connection <- file(filePath1) writeChar("this is the content of the first file", connection, eos = NULL) close(connection) -file1 <- synapseclient$File(path = filePath1, parent = project) -file1 <- syn$store(file1) +fileEntity1 <- models$File(path = filePath1, parent_id = project$id) +fileEntity1 <- fileEntity1$store() + filePath2 <- tempfile() connection2 <- file(filePath2) -writeChar("this is the content of the second file", connection, eos = NULL) +writeChar("this is the content of the second file", connection2, eos = NULL) close(connection2) -file2 <- synapseclient$File(path = filePath2, parent = project) -file2 <- syn$store(file2) +fileEntity2 <- models$File(path = filePath2, parent_id = project$id) +fileEntity2 <- fileEntity2$store() -# add some annotations -fileAnnotations1 <- syn$get_annotations(file1) -fileAnnotations2 <- syn$get_annotations(file2) +# add some annotations and re-store +fileEntity1$annotations <- list(contributor = "Sage", class = "V") +fileEntity1 <- fileEntity1$store() -fileAnnotations1$contributor <- "Sage" -fileAnnotations1$class <- "V" -syn$set_annotations(fileAnnotations1) - -fileAnnotations2$contributor = "UW" -fileAnnotations2$rank = "X" -syn$set_annotations(fileAnnotations2) +fileEntity2$annotations <- list(contributor = "UW", rank = "X") +fileEntity2 <- fileEntity2$store() ``` -Now create a view: +Now create an `EntityView` scoped to the project. Use `bitwOr` to combine `ViewTypeMask` +values for multiple entity types: ```r -columns = c( - synapseclient$Column(name = "contributor", columnType = "STRING"), - synapseclient$Column(name = "class", columnType = "STRING"), - synapseclient$Column(name = "rank", columnType = "STRING") -) - -view <- synapseclient$EntityViewSchema( - name = "my first file view", - columns = columns, - parent = project, - scopes = project, - includeEntityTypes = c(synapseclient$EntityViewType$FILE, synapseclient$EntityViewType$FOLDER), - addDefaultViewColumns = TRUE +view <- models$EntityView( + name = "my first file view", + parent_id = project$id, + scope_ids = list(project$id), + view_type_mask = bitwOr( + as.integer(models$ViewTypeMask$FILE), + as.integer(models$ViewTypeMask$FOLDER) + ) ) -view <- syn$store(view) +view <- view$store() ``` -We can now see content of our view (note that views are not created synchronously it may take a few seconds for the view table to be queryable). +We can now query the view (note that views are not created synchronously; it may take a few seconds for the view table to be queryable): ```r -queryResults <- syn$tableQuery(sprintf("select * from %s", view$properties$id)) -data <- queryResults$asDataFrame() -data +results <- models$query(sprintf("select * from %s", view$id)) +results ``` -We can update annotations using a view as follows: +## Using with a Shiny App -```r -data["class"] <- c("V", "VI") -syn$store(synapseclient$Table(view$properties$id, data)) +Reticulate and the Python synapseclient can be used to workaround an issue that exists when using synapser with a Shiny App. Since synapser shares a Synapse client instance within the R process, multiple users of a synapser integrated Shiny App may end up sharing a login if precautions aren't taken. When using reticulate with synapseclient, session scoped Synapse client objects can be created that avoid this issue. -# the change in annotations is reflected in get_annotations(): -syn$get_annotations(file2$properties$id) -``` +See [SynapseShinyApp](https://github.com/Sage-Bionetworks/SynapseShinyApp) for a sample application and a discussion of the issue, and the [reticulate](https://github.com/Sage-Bionetworks/SynapseShinyApp/tree/reticulate) branch for an alternative implementation using reticulate with synapseclient. -## Update View's Content +## Developing an R Package with synapseclient and reticulate -```r -# A view can contain different types of entity. To change the types of entity that will show up in a view: -view <- syn$get(view$properties$id) -view$set_entity_types(list(synapseclient$EntityViewType$FILE)) -``` +If you are building an R package that wraps synapseclient, wrap all imports in `.onLoad` and use the `delay_load` option. This lets users configure their Python environment before any import occurs, regardless of when they load your package: -## Using with a Shiny App +```r +synapseclient <- NULL +models <- NULL +operations <- NULL -Reticulate and the Python synapseclient can be used to workaround an issue that exists when using synapser with a Shiny App. Since synapser shares a Synapse client instance within the R process, multiple users of a synapser integrated Shiny App may end up sharing a login if precautions aren't taken. When using reticulate with synapseclient, session scoped Synapse client objects can be created that avoid this issue. +.onLoad <- function(libname, pkgname) { + synapseclient <<- reticulate::import("synapseclient", delay_load = TRUE) + models <<- reticulate::import("synapseclient.models", delay_load = TRUE) + operations <<- reticulate::import("synapseclient.operations", delay_load = TRUE) +} +``` -See [SynapseShinyApp](https://github.com/Sage-Bionetworks/SynapseShinyApp) for a sample application and a discussion of the issue, and the [reticulate](https://github.com/Sage-Bionetworks/SynapseShinyApp/tree/reticulate) branch for an alternative implementation using reticulate with synapseclient. +More information on this technique can be found [here](https://rstudio.github.io/reticulate/articles/package.html).