Commit ccba3dd

Documentation for notebooks (#71)
1 parent 1c075bb commit ccba3dd

7 files changed

Lines changed: 1038 additions & 2 deletions

doc/conf.py

Lines changed: 5 additions & 0 deletions
@@ -34,6 +34,7 @@
     "sphinx.ext.intersphinx",
     "sphinx.ext.extlinks",
     "sphinx.ext.mathjax",
+    "myst_nb",
 ]
 
 extlinks = {
@@ -302,3 +303,7 @@
     "numpy": ("https://docs.scipy.org/doc/numpy/", None),
     "xarray": ("https://docs.xarray.dev/en/stable/", None),
 }
+
+# Myst-NB configuration
+nb_execution_mode = "force"
+nb_execution_raise_on_error = True

doc/dask.ipynb

Lines changed: 97 additions & 0 deletions
@@ -0,0 +1,97 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "9b9a1d2c-7664-4fd9-b5cb-3a766d907fe7",
+   "metadata": {},
+   "source": [
+    "# Dask integration\n",
+    "\n",
+    "recursive-diff supports {class}`xarray.DataArray` and {class}`xarray.Dataset` objects backed by [Dask](https://dask.org). When it compares two such objects, the comparison is optimized to maximize parallelism and minimize memory usage.\n",
+    "\n",
+    "In this example, we're going to compare two arrays worth a total of 3 GiB.\n",
+    "However, because they're lazily defined, the whole comparison will use only a few MiB of RAM and will run on all available threads:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "6e52dd2a-b565-4aee-8aa8-52c4a41e8914",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import sys\n",
+    "\n",
+    "sys.path.insert(0, \"..\")\n",
+    "\n",
+    "import dask.array as da\n",
+    "import xarray\n",
+    "\n",
+    "from recursive_diff import display_diffs\n",
+    "\n",
+    "a = xarray.DataArray(da.ones((200_000, 1_000)), name=\"ones\")\n",
+    "b = xarray.DataArray(da.ones((200_000, 1_000)), name=\"ones\")\n",
+    "a[123_456, 789] = 1.01\n",
+    "b[133_700, 333] = 1.0000000001  # Below tolerance\n",
+    "\n",
+    "display_diffs(a, b)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "bf4417c1-3989-4512-8ea0-d9b5ecf31ab8",
+   "metadata": {},
+   "source": [
+    "## Dask clusters\n",
+    "If you have a Dask client active and compare chunked Xarray objects, the comparison will run on the Dask cluster.\n",
+    "\n",
+    "In this example we're using a ``LocalCluster``, but this works with remote clusters as well as [Coiled](https://coiled.io) clusters!\n",
+    "\n",
+    "You may use {func}`xarray.open_zarr` or {func}`xarray.open_dataset` to open Zarr or NetCDF files on S3. If the cluster runs on AWS, the workers read the data directly, so even when your client is outside of AWS the data won't transfer over the internet and you won't pay egress charges.\n",
+    "S3 access is not yet supported by {func}`~recursive_diff.recursive_open`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "9205439e-3c2b-43d7-a512-b8a4e986ea27",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import dask.distributed\n",
+    "\n",
+    "with dask.distributed.LocalCluster() as cluster, dask.distributed.Client(cluster):\n",
+    "    display_diffs(a, b)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "5822b326-f3e0-4be0-a015-a734ddbc816d",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.14.3"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
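
The `# Below tolerance` comment in the notebook above can be illustrated with a minimal sketch. It assumes a relative tolerance of 1e-9, which is the default of Python's `math.isclose`; this commit does not state which tolerance recursive-diff actually applies, and `within_tolerance` is a hypothetical helper, not part of the library.

```python
import math

# Assumed tolerance, for illustration only; not taken from this commit.
REL_TOL = 1e-9


def within_tolerance(lhs: float, rhs: float, rel_tol: float = REL_TOL) -> bool:
    """Return True if the two values are equal within the relative tolerance."""
    return math.isclose(lhs, rhs, rel_tol=rel_tol)


# The two values written into the arrays in the notebook, each compared to 1.0
print(within_tolerance(1.0, 1.0000000001))  # difference of about 1e-10
print(within_tolerance(1.0, 1.01))  # difference of 1e-2
```

Under that assumption the first comparison is within tolerance and the second is not, so only the element set to `1.01` would be expected to show up as a difference.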

doc/index.rst

Lines changed: 3 additions & 1 deletion
@@ -40,8 +40,10 @@ Index
 
    quickstart
    installing
-   api
+   notebooks
+   dask
    extend
+   api
    cli
    develop
    whats-new

doc/notebooks.ipynb

Lines changed: 145 additions & 0 deletions
@@ -0,0 +1,145 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "9b9a1d2c-7664-4fd9-b5cb-3a766d907fe7",
+   "metadata": {},
+   "source": [
+    "# Working with Jupyter notebooks\n",
+    "\n",
+    "{func}`~recursive_diff.display_diffs` can be used to compare two NumPy, Pandas, or Xarray objects in a Jupyter notebook:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "737df1de-16b9-40d2-a704-29022846bbc0",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import sys\n",
+    "\n",
+    "sys.path.insert(0, \"..\")\n",
+    "\n",
+    "import xarray\n",
+    "\n",
+    "from recursive_diff import display_diffs\n",
+    "\n",
+    "a = xarray.Dataset(\n",
+    "    {\n",
+    "        \"v1\": ((\"r\", \"c\"), [[1, 2], [3, 4]]),\n",
+    "        \"v2\": (\"r\", [\"foo\", \"bar\"]),\n",
+    "        \"r\": [\"r1\", \"r2\"],\n",
+    "        \"extra\": [5],\n",
+    "    },\n",
+    "    attrs={\"some_tag\": \"Hello\"},\n",
+    ")\n",
+    "\n",
+    "b = xarray.Dataset(\n",
+    "    {\n",
+    "        \"v1\": ((\"r\", \"c\"), [[1, 5], [3.1, 4]]),\n",
+    "        \"v2\": (\"r\", [\"bar\", \"bar\"]),\n",
+    "        \"r\": [\"r1\", \"r2\"],\n",
+    "    },\n",
+    "    attrs={\"some_tag\": \"World\"},\n",
+    ")\n",
+    "\n",
+    "\n",
+    "display_diffs(a, b)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cc84e5ad-4b39-4602-9da7-7aa63bbe6cb9",
+   "metadata": {},
+   "source": [
+    "Just like {func}`recursive_diff.recursive_diff`, {func}`~recursive_diff.display_diffs` can visualize differences in nested structures too:\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "7565dc16-7dc3-4ade-b8dd-0abfef712677",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "c = {\"foo\": [1, 2, [3, 4]]}\n",
+    "d = {\"foo\": [1.0000000001, 5, [3]], \"bar\": 6}\n",
+    "\n",
+    "display_diffs(c, d)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b85aa876-ad19-4393-bcf3-4c0dd866cbec",
+   "metadata": {},
+   "source": [
+    "## Comparing directories\n",
+    "\n",
+    "If you have two directories full of data, you can compare them in one go with {func}`~recursive_diff.recursive_open`:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "a6324359-3d59-4250-b70d-f9cd0d0bbde0",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import json\n",
+    "import tempfile\n",
+    "\n",
+    "lhs = tempfile.TemporaryDirectory()\n",
+    "rhs = tempfile.TemporaryDirectory()\n",
+    "\n",
+    "a.to_zarr(f\"{lhs.name}/array.zarr\", mode=\"w\", zarr_format=2)\n",
+    "b.to_zarr(f\"{rhs.name}/array.zarr\", mode=\"w\", zarr_format=2)\n",
+    "with open(f\"{lhs.name}/nested.json\", \"w\") as fh:\n",
+    "    json.dump(c, fh)\n",
+    "with open(f\"{rhs.name}/nested.json\", \"w\") as fh:\n",
+    "    json.dump(d, fh)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "1dc6c4ba-ba4a-4baf-8f48-933a0fa83717",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from recursive_diff import recursive_open\n",
+    "\n",
+    "display_diffs(recursive_open(lhs.name), recursive_open(rhs.name))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "42ac0b27-178c-4c59-89c8-dd7c37464656",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.14.3"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
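
To make the idea of comparing nested structures more concrete, here is a minimal standard-library sketch run on the same `c` and `d` values used in the notebook. `walk_diffs`, its message wording, and the 1e-9 relative tolerance are all assumptions made for illustration; this is not recursive-diff's actual algorithm or output format.

```python
import math
from collections.abc import Iterator


def walk_diffs(lhs: object, rhs: object, path: str = "") -> Iterator[str]:
    """Yield one message per difference between two nested structures.

    Hypothetical illustration only; not recursive-diff's implementation.
    """
    if isinstance(lhs, dict) and isinstance(rhs, dict):
        # Visit the union of keys so that keys missing on either side are reported
        for key in sorted(lhs.keys() | rhs.keys()):
            sub = f"{path}[{key!r}]"
            if key not in lhs:
                yield f"{sub}: only in rhs"
            elif key not in rhs:
                yield f"{sub}: only in lhs"
            else:
                yield from walk_diffs(lhs[key], rhs[key], sub)
    elif isinstance(lhs, list) and isinstance(rhs, list):
        if len(lhs) != len(rhs):
            yield f"{path}: length {len(lhs)} != {len(rhs)}"
        # Compare the elements that exist on both sides
        for i, (x, y) in enumerate(zip(lhs, rhs)):
            yield from walk_diffs(x, y, f"{path}[{i}]")
    elif isinstance(lhs, (int, float)) and isinstance(rhs, (int, float)):
        # Assumed relative tolerance of 1e-9 for numeric comparisons
        if not math.isclose(lhs, rhs, rel_tol=1e-9):
            yield f"{path}: {lhs} != {rhs}"
    elif lhs != rhs:
        yield f"{path}: {lhs!r} != {rhs!r}"


c = {"foo": [1, 2, [3, 4]]}
d = {"foo": [1.0000000001, 5, [3]], "bar": 6}
for message in walk_diffs(c, d):
    print(message)
```

The sketch reports the extra `bar` key, the changed element at `['foo'][1]`, and the length mismatch of the inner list, while `1` versus `1.0000000001` is treated as equal under the assumed tolerance.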

doc/requirements.yml

Lines changed: 9 additions & 0 deletions
@@ -6,9 +6,18 @@ channels:
 dependencies:
   - python 3.14.*
   - python *
+  - dask-core *
+  - distributed *
+  - msgpack-python *
+  - pyyaml *
+  - netcdf4 *
+  - scipy *
+  - h5netcdf *
+  - zarr *
   - pip *
   - sphinx *
   - sphinx_rtd_theme *
+  - myst-nb *
   - numpy *
   - pandas *
   - xarray *
