
Commit 931c7b1

Merge pull request #3 from LCSB-BioCore/docs
Add reasonable docs
2 parents bd34c68 + 0c3671d commit 931c7b1

8 files changed

Lines changed: 774 additions & 4 deletions

File tree

.github/workflows/docs.yml

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
```yaml
# ref: https://juliadocs.github.io/Documenter.jl/stable/man/hosting/#GitHub-Actions-1
name: Documentation

on:
  push:
    branches:
      - develop
    tags: '*'
  pull_request:
  release:
    types: [published, created]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: julia-actions/setup-julia@latest
        with:
          version: 1.5
      - name: Install dependencies
        run: julia --project=docs/ -e 'using Pkg; Pkg.develop(PackageSpec(path=pwd())); Pkg.instantiate()'
      - name: Build and deploy
        env:
          DOCUMENTER_KEY: ${{ secrets.DOCUMENTER_KEY }} # For authentication with SSH deploy key
        run: julia --project=docs/ docs/make.jl
```

README.md

Lines changed: 130 additions & 4 deletions
@@ -1,9 +1,135 @@

# DiDa.jl

Simple distributed data manipulation and processing routines for Julia.

This was originally developed for
[`GigaSOM.jl`](https://github.com/LCSB-BioCore/GigaSOM.jl); the DiDa.jl package
contains the separated-out lightweight distributed-processing framework that
was used in `GigaSOM.jl`.

## Why?

DiDa.jl provides a very simple, imperative and straightforward way to move your
data around a cluster of Julia processes created by the
[`Distributed`](https://docs.julialang.org/en/v1/stdlib/Distributed/) package,
and to run computations on the distributed pieces of data. The main aim of the
package is to avoid anything complicated: the first version used in
[GigaSOM](https://github.com/LCSB-BioCore/GigaSOM.jl) had just under 500 lines
of relatively straightforward code (including the doc-comments).

Compared to the plain `Distributed` API, you get more straightforward data
manipulation primitives, some extra control over the precise place where code
is executed, and a few high-level functions. These include a distributed
version of `mapreduce`, a simpler work-alike of the
[DistributedArrays](https://github.com/JuliaParallel/DistributedArrays.jl)
functionality, and easy-to-use distributed dataset saving and loading.

Most importantly, the main motivation behind the package is that distributed
processing should be simple and accessible.

## Brief how-to

The package provides a few very basic primitives that lightly wrap the
`Distributed` package functions `remotecall` and `fetch`. The most basic one is
`save_at`, which takes a worker ID, a variable name and the variable content, and
saves the content into the variable on the selected worker. `get_from` works the
same way, but takes the data back from the worker.

You can thus send some random array to a few distributed workers:

```julia
julia> using Distributed, DiDa

julia> addprocs(2)
2-element Array{Int64,1}:
 2
 3

julia> @everywhere using DiDa

julia> save_at(2, :x, randn(10,10))
Future(2, 1, 4, nothing)
```

The `Future` returned from `save_at` is a normal Julia future from
`Distributed`; you can even `fetch` it to wait until the operation has really
finished on the other side. Fetching the data back works the same way:

```julia
julia> get_from(2,:x)
Future(2, 1, 15, nothing)

julia> get_val_from(2,:x) # auto-fetch()ing variant
10×10 Array{Float64,2}:
 -0.850788   0.946637   1.78006  …
 -0.49596    0.497829  -2.03013  …
 ⋮
```

All commands support full quoting, which allows you to easily distinguish
between code parts that are executed locally and remotely:

```julia
julia> save_at(3, :x, randn(1000,1000)) # generates a matrix locally and sends it to the remote worker

julia> save_at(3, :x, :(randn(1000,1000))) # generates a matrix right on the remote worker and saves it there

julia> get_val_from(3, :x) # retrieves the generated matrix and fetches it

julia> get_val_from(3, :(randn(1000,1000))) # generates the matrix on the worker and fetches the data
```

Notably, this is different from the approach taken by `DistributedArrays` and
similar packages: all data manipulation is explicit, and any data type is
supported as long as it can be moved among workers by the `Distributed`
package. This helps with various highly non-array-ish data, such as large text
corpora and graphs.
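Since any serializable value can be moved this way, the same primitives work for non-array data too. A minimal sketch (the `Dict` payload and the remote `length` call are illustrative assumptions, not taken from the package docs):

```julia
using Distributed, DiDa

addprocs(1)
@everywhere using DiDa

w = workers()[1]

# store a dictionary (not an array!) under the name `d` on the worker...
fetch(save_at(w, :d, Dict("name" => "DiDa", "kind" => "package")))

# ...and run a quoted computation on it remotely, fetching only the small result
n = get_val_from(w, :(length(d)))
```

Only the two-element result travels back over the wire; the dictionary itself stays on the worker until explicitly fetched.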
There are various goodies for easy work with matrix-style data, namely
scattering, gathering and running distributed algorithms:

```julia
julia> x = randn(1000,3)
1000×3 Array{Float64,2}:
 -0.992481   0.551064   1.67424
 -0.751304  -0.845055   0.105311
 -0.712687   0.165619  -0.469055
 ⋮

julia> dataset = scatter_array(:myDataset, x, workers()) # sends slices of the array to workers
Dinfo(:myDataset, [2, 3]) # a helper for holding the variable name and the used workers together

julia> get_val_from(3, :(size(myDataset)))
(500, 3) # there's really only half of the data

julia> dmapreduce(dataset, sum, +) # MapReduce-style sum of all data
-51.64369103751014

julia> dstat(dataset, [1,2,3]) # get means and sdevs in individual columns
([-0.030724038974465212, 0.007300925745200863, -0.028220577808245786],
 [0.9917470012495775, 0.9975120525455358, 1.000243845434252])

julia> dmedian(dataset, [1,2,3]) # distributed iterative median in columns
3-element Array{Float64,1}:
  0.004742259615849834
  0.039043266340824986
 -0.05367799062404967

julia> dtransform(dataset, x -> 2 .^ x) # exponentiate all data (medians should now be around 1)
Dinfo(:myDataset, [2, 3])

julia> gather_array(dataset) # download the data from workers into a single local array
1000×3 Array{Float64,2}:
 0.502613  1.46517   3.1915
 0.594066  0.55669   1.07573
 0.610183  1.12165   0.722438
 ⋮
```

## What does the name `DiDa` mean?

**Di**stributed **Da**ta.

There is no consensus on how to pronounce the shortcut.

docs/Project.toml

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
```toml
[deps]
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
DocumenterTools = "35a29f4d-8980-5a13-9543-d66fff28ecb8"
```

docs/make.jl

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
```julia
using Documenter, DiDa

makedocs(modules = [DiDa],
    clean = false,
    format = Documenter.HTML(prettyurls = !("local" in ARGS)),
    sitename = "DiDa.jl",
    authors = "The developers of DiDa.jl",
    linkcheck = !("skiplinks" in ARGS),
    pages = [
        "Home" => "index.md",
        "Tutorial" => "tutorial.md",
        "Functions" => "functions.md",
    ],
)

deploydocs(
    repo = "github.com/LCSB-BioCore/DiDa.jl.git",
    target = "build",
    branch = "gh-pages",
    devbranch = "develop",
    versions = "stable" => "v^",
)
```

docs/src/assets/logo.svg

Lines changed: 126 additions & 0 deletions

docs/src/functions.md

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@

# Functions

## Data structures

```@autodocs
Modules = [DiDa]
Pages = ["structs.jl"]
```

## Base functions

```@autodocs
Modules = [DiDa]
Pages = ["base.jl"]
```

## Higher-level array operations

```@autodocs
Modules = [DiDa]
Pages = ["tools.jl"]
```

## Input/Output

```@autodocs
Modules = [DiDa]
Pages = ["io.jl"]
```

docs/src/index.md

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@

# DiDa.jl — simple work with distributed data

This package provides relatively simple distributed data manipulation and
processing routines for Julia.

The design of the package and its data manipulation approach is deliberately
"imperative" and "hands-on", to allow as much user influence on the actual way
the data are moved and stored in the cluster as possible. It uses the
`Distributed` package and its infrastructure of workers, and provides a few
very basic primitives that lightly wrap the `Distributed` package functions
`remotecall` and `fetch`.

There are also various extra functions to easily run distributed data
transformations and MapReduce-style algorithms, to store and load the data on
worker-local storage (e.g. to prevent memory exhaustion), and others.
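Such a workflow can be sketched end to end. This is a minimal illustration that only uses functions shown in the package README (`scatter_array`, `dtransform`, `dmapreduce`, `gather_array`); the data and variable name are made up:

```julia
using Distributed, DiDa

addprocs(2)
@everywhere using DiDa

# scatter a local matrix across the workers, slice by slice
d = scatter_array(:vals, randn(1000, 3), workers())

# transform the distributed pieces in place, without moving them
dtransform(d, x -> abs.(x))

# MapReduce-style aggregation: per-worker sums combined with +
total = dmapreduce(d, sum, +)

# finally pull the whole (transformed) matrix back to the local process
y = gather_array(d)
```

Note that only `total` and the final `gather_array` result cross the network; the transformation itself runs entirely on the workers.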
To start quickly, you can read the tutorial:

```@contents
Pages=["tutorial.md"]
```

### Functions

A full reference to all functions is given here:

```@contents
Pages = ["functions.md"]
```
