Commit e136360

bbengfort authored and looselycoupled committed
Add btrdb explained to the docs (#58)
1 parent 04bfc74 commit e136360

6 files changed

Lines changed: 145 additions & 7 deletions

File tree

docs/source/working/images/multiprocessing_architecture.png renamed to docs/source/_static/figures/multiprocessing_architecture.png

File renamed without changes.
5.58 MB

docs/source/conf.py

Lines changed: 2 additions & 0 deletions
@@ -209,3 +209,5 @@

 # If true, `todo` and `todoList` produce output, else they produce nothing.
 todo_include_todos = True
+
+numfig = True

docs/source/explained.rst

Lines changed: 131 additions & 0 deletions
@@ -0,0 +1,131 @@
BTrDB Explained
===============

*The Berkeley Tree DataBase (BTrDB) is pronounced* "**Better DB**".

**A next-gen timeseries database for dense, streaming telemetry.**

**Problem**: Existing timeseries databases are poorly equipped for a new generation of ultra-fast sensor telemetry. Specifically, millions of high-precision power meters are to be deployed throughout the power grid to help analyze and prevent blackouts. New software must therefore be built to facilitate the storage and analysis of their data.

**Baseline**: We need 1.4M inserts/second and 5x that in reads if we are to support 1000 `micro-synchrophasors`_ per server node. No existing timeseries database can do this.

.. _micro-synchrophasors: https://arxiv.org/abs/1605.02813

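As a quick sanity check on those targets, the arithmetic can be sketched as follows. The per-sensor rate here is an assumption back-derived from the stated 1.4M figure, not a number from the paper:

```python
# Back-of-envelope check of the stated baseline (illustrative only).
SENSORS_PER_NODE = 1000
POINTS_PER_SENSOR_PER_SEC = 1400  # assumed rate consistent with the 1.4M total

writes_per_sec = SENSORS_PER_NODE * POINTS_PER_SENSOR_PER_SEC
reads_per_sec = 5 * writes_per_sec  # reads are specified at 5x writes

print(writes_per_sec)  # 1400000
print(reads_per_sec)   # 7000000
```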
Summary
-------

**Goals**: Develop a multi-resolution storage and query engine for many 100+ Hz streams at nanosecond precision, operating at the full line rate of the underlying network or storage infrastructure for affordable cluster sizes (fewer than six nodes).

Developed at the University of California, Berkeley, BTrDB offers new ways to meet these high throughput demands while allowing efficient queries over large time ranges.

**Fast writes/reads**

Measured on a 4-node cluster (large EC2 nodes):

- 53 million inserted values per second
- 119 million queried values per second

**Fast analysis**

In under *200ms*, BTrDB can query a year of data at nanosecond precision (2.1 trillion points) over any desired window, returning statistical summary points at any desired resolution (each containing a min/max/mean).

.. _zoom:
.. figure:: /_static/figures/ui_zoom.gif
   :alt: Rapid zoom into timeseries data via plotter UI

   BTrDB enables rapid timeseries queries that support analyses zooming smoothly from years of data down to nanosecond granularity, similar to zooming to a street-level view on Google Maps.

**High compression**

Data is compressed by 2.93x, a significant improvement for high-precision nanosecond streams. To achieve this, a modified version of *run-length encoding* encodes the *jitter* of delta values rather than the delta values themselves. Incidentally, this outperforms the popular audio codec `FLAC`_, which was the original inspiration for the technique.

.. _FLAC: https://xiph.org/flac/

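To make the idea concrete, here is a toy sketch of jitter (delta-of-delta) encoding. This illustrates the general technique only, not BTrDB's actual codec: for data arriving at a near-constant rate, the deltas are nearly constant, so their differences cluster around zero and encode cheaply:

```python
# Toy delta-of-delta ("jitter") encoding sketch -- not BTrDB's real codec.

def delta_of_delta(values):
    """Return the first value, the first delta, and the jitter of later deltas."""
    deltas = [b - a for a, b in zip(values, values[1:])]
    jitter = [b - a for a, b in zip(deltas, deltas[1:])]
    return values[0], deltas[0], jitter

def reconstruct(first, first_delta, jitter):
    """Invert delta_of_delta exactly (the transform is lossless)."""
    deltas = [first_delta]
    for j in jitter:
        deltas.append(deltas[-1] + j)
    values = [first]
    for d in deltas:
        values.append(values[-1] + d)
    return values

# Timestamps at ~1000 ns spacing with small jitter:
ts = [0, 1000, 2001, 2999, 4000]
first, d0, jit = delta_of_delta(ts)
assert reconstruct(first, d0, jit) == ts
print(jit)  # [1, -3, 3] -- small values near zero, cheap to encode
```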
**Efficient versioning**

Data is version-annotated, allowing queries of the data as it existed at a given time. This makes query results reproducible even as newer realtime data arrives. Structural sharing of data between versions keeps this process as efficient as possible.

The Tree Structure
------------------

BTrDB stores its data in a time-partitioned tree.

Each node represents a given time slot and can describe all points within that slot at a resolution corresponding to its depth in the tree.

The root node covers ~146 years. With a branching factor of 64, the bottom nodes ten levels down cover 4ns each.

===== ================ =================
level node width       time granularity
===== ================ =================
1     2\ :sup:`62` ns  ~146 years
2     2\ :sup:`56` ns  ~2.28 years
3     2\ :sup:`50` ns  ~13.03 days
4     2\ :sup:`44` ns  ~4.88 hours
5     2\ :sup:`38` ns  ~4.58 minutes
6     2\ :sup:`32` ns  ~4.29 seconds
7     2\ :sup:`26` ns  ~67.11 ms
8     2\ :sup:`20` ns  ~1.05 ms
9     2\ :sup:`14` ns  ~16.38 µs
10    2\ :sup:`8` ns   256 ns
11    2\ :sup:`2` ns   4 ns
===== ================ =================

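The granularities in the table above follow directly from the branching factor; a small sketch reproduces them:

```python
# Sketch: with a branching factor of 64 (2**6), a node at level L
# spans 2**(62 - 6*(L-1)) nanoseconds.

BRANCH_BITS = 6  # 64 children per core node

def node_width_ns(level):
    return 2 ** (62 - BRANCH_BITS * (level - 1))

assert node_width_ns(1) == 2**62   # root: ~146 years
assert node_width_ns(10) == 256    # 256 ns
assert node_width_ns(11) == 4      # 4 ns

years = node_width_ns(1) / (365.25 * 24 * 3600 * 1e9)
print(round(years, 1))  # ~146.1 years
```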
A node starts as a **vector node**, storing raw points in a vector of size 1024. This is considered a leaf node, since it does not point to any child nodes::

    ┌─────────────────────────────────────────────────────────────────┐
    │                                                                 │
    │                           VECTOR NODE                           │
    │                     (holds 1024 raw points)                     │
    │                                                                 │
    ├─────────────────────────────────────────────────────────────────┤
    │ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . │ <- raw points
    └─────────────────────────────────────────────────────────────────┘

Once this vector is full and more points need to be inserted into its time slot, the node is converted to a **core node** by time-partitioning itself into 64 "statistical" points::

    ┌─────────────────────────────────────────────────────────────────┐
    │                                                                 │
    │                            CORE NODE                            │
    │                  (holds 64 statistical points)                  │
    │                                                                 │
    ├─────────────────────────────────────────────────────────────────┤
    │ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ │ <- stat points
    └─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┘
      ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ <- child node pointers

A **statistical point** represents a 1/64 slice of its parent's time slot. It holds the min/max/mean/count of all points inside its time slot and points to a new node holding further detail. When a vector node is first converted to a core node, the raw points are pushed into new vector nodes pointed to by the new statistical points.

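The shape of a statistical point can be sketched as follows. The types and names here are hypothetical illustrations of the concept, not btrdb-python's API:

```python
# Hypothetical sketch of a statistical point: a summary of all raw
# points in a 1/64 slice of its parent's time slot, plus a link to a
# child node holding further detail.
from dataclasses import dataclass
from typing import Optional

@dataclass
class StatPoint:
    min: float
    max: float
    mean: float
    count: int
    child: Optional[object] = None  # vector or core node with details

def summarize(values):
    """Aggregate raw values into one statistical point."""
    return StatPoint(
        min=min(values),
        max=max(values),
        mean=sum(values) / len(values),
        count=len(values),
    )

sp = summarize([1.0, 4.0, 2.0, 5.0])
print(sp.min, sp.max, sp.mean, sp.count)  # 1.0 5.0 3.0 4
```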
===== ============================== ============================== =================== ===================
level node width                     stat point width               total nodes         total stat points
===== ============================== ============================== =================== ===================
1     2\ :sup:`62` ns (~146 years)   2\ :sup:`56` ns (~2.28 years)  2\ :sup:`0` nodes   2\ :sup:`6` points
2     2\ :sup:`56` ns (~2.28 years)  2\ :sup:`50` ns (~13.03 days)  2\ :sup:`6` nodes   2\ :sup:`12` points
3     2\ :sup:`50` ns (~13.03 days)  2\ :sup:`44` ns (~4.88 hours)  2\ :sup:`12` nodes  2\ :sup:`18` points
4     2\ :sup:`44` ns (~4.88 hours)  2\ :sup:`38` ns (~4.58 min)    2\ :sup:`18` nodes  2\ :sup:`24` points
5     2\ :sup:`38` ns (~4.58 min)    2\ :sup:`32` ns (~4.29 s)      2\ :sup:`24` nodes  2\ :sup:`30` points
6     2\ :sup:`32` ns (~4.29 s)      2\ :sup:`26` ns (~67.11 ms)    2\ :sup:`30` nodes  2\ :sup:`36` points
7     2\ :sup:`26` ns (~67.11 ms)    2\ :sup:`20` ns (~1.05 ms)     2\ :sup:`36` nodes  2\ :sup:`42` points
8     2\ :sup:`20` ns (~1.05 ms)     2\ :sup:`14` ns (~16.38 µs)    2\ :sup:`42` nodes  2\ :sup:`48` points
9     2\ :sup:`14` ns (~16.38 µs)    2\ :sup:`8` ns (256 ns)        2\ :sup:`48` nodes  2\ :sup:`54` points
10    2\ :sup:`8` ns (256 ns)        2\ :sup:`2` ns (4 ns)          2\ :sup:`54` nodes  2\ :sup:`60` points
11    2\ :sup:`2` ns (4 ns)                                         2\ :sup:`60` nodes
===== ============================== ============================== =================== ===================

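The node and stat-point counts in the table above also follow directly from the branching factor, as a short sketch shows:

```python
# Sketch: level L can hold 64**(L-1) nodes, each holding 64
# statistical points, so counts grow by a factor of 64 (2**6) per level.

def total_nodes(level):
    return 64 ** (level - 1)

def total_stat_points(level):
    return total_nodes(level) * 64

assert total_nodes(1) == 2**0
assert total_stat_points(1) == 2**6
assert total_nodes(10) == 2**54
assert total_stat_points(10) == 2**60
```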
The sampling rate of the data at different moments determines how deep the tree is during those slices of time. Regardless of the depth of the actual data, the time spent querying at some higher level (lower resolution) remains fixed (and fast) thanks to the summaries provided by parent nodes.

...

Appendix
--------

The original version of this page can be found at:

- `github.com/PingThingsIO/btrdb-explained <https://github.com/PingThingsIO/btrdb-explained>`_

This page was written based on the following sources:

- `Homepage <http://btrdb.io/>`_
- `Whitepaper <https://www.usenix.org/system/files/conference/fast16/fast16-papers-andersen.pdf>`_
- `Code <https://github.com/BTrDB/btrdb-server>`_

docs/source/index.rst

Lines changed: 5 additions & 4 deletions
@@ -1,5 +1,5 @@
 Welcome to btrdb docs!
-========================================
+======================

 .. image:: https://img.shields.io/travis/BTrDB/btrdb-python/master.svg
    :target: https://travis-ci.org/BTrDB/btrdb-python
@@ -79,10 +79,11 @@ in Github!
    installing
    concepts
    working
+   explained


 API Reference
---------------------
+-------------

 .. toctree::
    :maxdepth: 2
@@ -97,7 +98,7 @@ API Reference


 Maintainers
---------------------
+-----------

 * :doc:`maintainers/anaconda`

@@ -106,7 +107,7 @@ Maintainers


 Indices and tables
---------------------
+------------------

 * :ref:`genindex`
 * :ref:`modindex`

docs/source/working/multiprocessing.rst

Lines changed: 7 additions & 3 deletions
@@ -13,11 +13,15 @@ Complex analytics in Python may require additional speedups that can be gained b

 The first and most critical thing to note is that ``btrdb.Connection`` objects *are not thread- or multiprocess-safe*. This means that your code should either use a lock or semaphore to share a single connection object, or have each process or thread create its own connection object and clean up after itself when done using the connection. Moreover, because of the forking issue described in the warning above, you must also take care about when connections are created in worker processes.

-Let's take the following simple example: we want to perform a data quality analysis on 12 hour chunks of data for all the streams in our ``staging/sensors`` collection. If we have hundreds of sensor streams across many months, this job can be sped up dramatically by using multiprocessing. To do this, let's consider the following process architecture:
+Let's take the following simple example: we want to perform a data quality analysis on 12 hour chunks of data for all the streams in our ``staging/sensors`` collection. If we have hundreds of sensor streams across many months, this job can be sped up dramatically by using multiprocessing. Instead of a single process churning through each chunk of data one at a time, several workers can process multiple data chunks simultaneously, using multiple CPU cores and taking advantage of other CPU scheduling optimizations.

-.. image:: images/multiprocessing_architecture.png
+.. _architecture:
+.. figure:: /_static/figures/multiprocessing_architecture.png
+   :alt: a multiprocessing architecture

-At first glance, this architecture looks similar to the one used by ``multiprocessing.Pool``, which is true. However, consider the following code:
+   A two-queue multiprocessing architecture for data parallel processing.
+
+Consider the processing architecture shown in :numref:`architecture`. At first glance, this architecture looks similar to the one used by ``multiprocessing.Pool``, which is true. However, consider the following code:

 .. code-block:: python