| 1 | +TODO: Introduce the document |
| 2 | + |
| 3 | +### Structure |
| 4 | +The data structure of the sortition pool is a root node, which is a single |
| 5 | +`uint256`, followed by 6 layers of branch nodes, followed by a leaf layer. Each |
| 6 | +branch node is a `uint256`, and the branch structure is implemented as a |
| 7 | +`mapping(uint256 => mapping(uint256 => uint256))`. The first index represents |
| 8 | +the layer, and the second index represents the position within the layer. |
| 9 | + |
| 10 | +The first branch layer has 8 branches, and every subsequent layer has 8 times |
| 11 | +as many nodes as the previous layer, so each layer consists of the |
| 12 | +following number of nodes: |
| 13 | + |
| 14 | +1) root layer: 1 |
| 15 | +2) branch layer 1: 8 |
| 16 | +3) branch layer 2: 64 |
| 17 | +4) branch layer 3: 512 |
| 18 | +5) branch layer 4: 4,096 |
| 19 | +6) branch layer 5: 32,768 |
| 20 | +7) branch layer 6: 262,144 |
| 21 | +8) leaf layer: 2,097,152 |
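
The counts above are successive powers of 8. As a quick sanity check, here is a minimal Python sketch (not part of the contract; `SLOT_COUNT` and `layer_sizes` are names assumed here for illustration, and `LEVELS = 7` is taken from the selection code later in this document):

```python
SLOT_COUNT = 8  # assumed name: children per root/branch node
LEVELS = 7      # root plus 6 branch layers sit above the leaf layer


def layer_sizes(slot_count=SLOT_COUNT, levels=LEVELS):
    """Return the node count of each layer, from the root down to the leaves."""
    return [slot_count ** depth for depth in range(levels + 1)]


sizes = layer_sizes()
```

The last entry, `sizes[-1]`, is the size of the leaf layer.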
| 22 | + |
| 23 | +### Leaf Serialization and Deserialization |
| 24 | + |
| 25 | +Here's the leaf constructor with annotations to reference where sections are |
| 26 | +discussed. |
| 27 | + |
| 28 | +``` |
| 29 | +function make( |
| 30 | +  address _operator, |
| 31 | +  uint256 _creationBlock, |
| 32 | +  uint256 _id |
| 33 | +) internal pure returns (uint256) { |
| 34 | +  assert(_creationBlock <= type(uint64).max); |
| 35 | +  assert(_id <= type(uint32).max); |
| 36 | +  uint256 op = uint256(bytes32(bytes20(_operator))); // (1) |
| 37 | +  uint256 uid = _id & ID_MAX; // (2) |
| 38 | +  uint256 cb = (_creationBlock & BLOCKHEIGHT_MAX) << ID_WIDTH; // (3) |
| 39 | +  return (op | cb | uid); // (4) |
| 40 | +} |
| 41 | +``` |
| 42 | + |
| 43 | +`Leaf.make` takes an operator address, the block it was created in, and |
| 44 | +a monotonically increasing id that is unique to each operator. |
| 45 | + |
| 46 | +(1) We convert the operator address to a `bytes20`, and then convert it to a |
| 47 | +`bytes32` (which pads the extra 12 bytes with 0's on the *right*), and then |
| 48 | +convert the whole thing into a `uint256`. This means that we have 160 bits that |
| 49 | +matter on the left, and 96 0's on the right stored in `op`. |
| 50 | + |
| 51 | +(2) `ID_MAX = 2^32 - 1` is represented as 32 1's, which as a `uint256` is 224 0's |
| 52 | +followed by 32 1's. The bitwise `&` clears out everything but the last 32 |
| 53 | +significant bits of `_id`, leaving us with 224 0's and 32 significant bits |
| 54 | +stored in `uid`. |
| 55 | + |
| 56 | +Note: Since we assert that `_id` fits within `type(uint32).max`, the above |
| 57 | +bitwise `&` *shouldn't* ever do anything, but we do it anyway out of an |
| 58 | +abundance of caution. |
| 59 | + |
| 60 | +(3) `BLOCKHEIGHT_MAX = 2^64 - 1` is represented as 64 1's, which as a `uint256` is |
| 61 | +192 0's followed by 64 1's. The bitwise `&` clears out everything but the last |
| 62 | +64 bits of `_creationBlock`, and then we shift those bits to the |
| 63 | +left by `ID_WIDTH = 32` bits. This gives us 160 0's, 64 significant bits, and |
| 64 | +32 0's stored in `cb`. |
| 65 | + |
| 66 | +(4) Then, we run `op | cb`. `op` has 160 significant bits on the left, and 96 0s on |
| 67 | +the right, and `cb` has 160 0s on the left, 64 significant bits, and then 32 |
| 68 | +zeros. The intermediate result has 224 significant bits followed by 32 0's with |
| 69 | +no collisions so far. We combine that result with `| uid`, which has 224 0s, |
| 70 | +and 32 significant bits, leaving us with 256 significant bits and no |
| 71 | +collisions. |
| 72 | + |
| 73 | +The resulting number is meaningless as a quantity! Rather, it is only useful as |
| 74 | +a storage mechanism. If we want to know the operator's address, we can look at |
| 75 | +the first 160 bits, and its id is in the last 32 bits. If we want to know the |
| 76 | +creation block, we should look at bits `[160, 224)` (which we can do by |
| 77 | +right-shifting by 32 bits and then applying a bitwise `&`). |
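
To make the layout concrete, here is a minimal Python model of the packing and unpacking described above. It is a sketch, not the contract code: the function names (`make_leaf`, `leaf_operator`, `leaf_creation_block`, `leaf_id`) and `ADDRESS_WIDTH` are assumptions for illustration, and Python integers stand in for `uint256`.

```python
ID_WIDTH = 32
ID_MAX = (1 << ID_WIDTH) - 1
BLOCKHEIGHT_WIDTH = 64
BLOCKHEIGHT_MAX = (1 << BLOCKHEIGHT_WIDTH) - 1
ADDRESS_WIDTH = 160  # assumed name: width of an address in bits


def make_leaf(operator, creation_block, op_id):
    """Pack address (160 bits) | creation block (64 bits) | id (32 bits)."""
    assert 0 <= operator < (1 << ADDRESS_WIDTH)
    assert 0 <= creation_block <= BLOCKHEIGHT_MAX
    assert 0 <= op_id <= ID_MAX
    # (1) the address occupies the leftmost 160 bits, so shift it left by 96
    op = operator << (256 - ADDRESS_WIDTH)
    # (2) the id occupies the rightmost 32 bits
    uid = op_id & ID_MAX
    # (3) the creation block sits between them, 32 bits from the right
    cb = (creation_block & BLOCKHEIGHT_MAX) << ID_WIDTH
    # (4) the three fields do not overlap, so OR-ing them loses nothing
    return op | cb | uid


def leaf_operator(leaf):
    """Read the leftmost 160 bits."""
    return leaf >> (256 - ADDRESS_WIDTH)


def leaf_creation_block(leaf):
    """Read bits [160, 224): shift right by 32, then mask 64 bits."""
    return (leaf >> ID_WIDTH) & BLOCKHEIGHT_MAX


def leaf_id(leaf):
    """Read the rightmost 32 bits."""
    return leaf & ID_MAX


example_operator = 0x1234567890ABCDEF1234567890ABCDEF12345678  # made-up address
example_leaf = make_leaf(example_operator, 15_000_000, 42)
```

Reading any field back out of `example_leaf` returns exactly the value that was packed in, which is the "no collisions" property from step (4).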
| 78 | + |
| 79 | +### Branch And Root Deserialization And Serialization |
| 80 | + |
| 81 | +The branch and root nodes are a `uint256` divided into 8 virtual "slots", where |
| 82 | +each slot is given 32 sequential bits. The leftmost 32 bits is slot 8, and |
| 83 | +represents the 8th child, and the rightmost 32 bits is slot 1, and represents |
| 84 | +the 1st child. |
| 85 | + |
| 86 | +Say that we have 8 leaf nodes with the following names, weights, and 32-bit |
| 87 | +binary representations of those weights: |
| 88 | +``` |
| 89 | +Alice, 58123, 00000000000000001110001100001011 |
| 90 | +Bob, 19234, 00000000000000000100101100100010 |
| 91 | +Carol, 374974, 00000000000001011011100010111110 |
| 92 | +David, 55766, 00000000000000001101100111010110 |
| 93 | +Erin, 611237, 00000000000010010101001110100101 |
| 94 | +Frank, 38663, 00000000000000001001011100000111 |
| 95 | +Gretta, 427810, 00000000000001101000011100100010 |
| 96 | +Harold, 232974, 00000000000000111000111000001110 |
| 97 | +``` |
| 98 | + |
| 99 | +Then the tree looks like: |
| 100 | +``` |
| 101 | + ┌─────────────────────────────┐ |
| 102 | + │0000000000000011100011100000 │ |
| 103 | + │1110000000000000011010000111 │ |
| 104 | + │0010001000000000000000001001 │ |
| 105 | + │0111000001110000000000001001 │ |
| 106 | + │0101001110100101000000000000 │ |
| 107 | + ┌──────┌──────┌───────┌─┤0000110110011101011000000000 ├┐ |
| 108 | + │ │ │ │ │0000010110111000101111100000 ││ |
| 109 | + │ │ │ │ │0000000000000100101100100010 ││ |
| 110 | + │ │ │ │ │0000000000000000111000110000 ││ |
| 111 | + │ │ │ │ │1011 ││ |
| 112 | + │ │ │ │ └─────┬──────┬────────┬───────┘│ |
| 113 | +┌──┴───┐┌─┴──┐┌──┴───┐┌──┴───┐┌──┴──┐┌──┴───┐┌───┴───┐┌───┴───┐ |
| 114 | +│Alice ││Bob ││Carol ││David ││Erin ││Frank ││Gretta ││Harold │ |
| 115 | +└──────┘└────┘└──────┘└──────┘└─────┘└──────┘└───────┘└───────┘ |
| 116 | +``` |
| 117 | + |
| 118 | +The weight of that branch node is the *sum* of all of the slots, e.g. `58123 + |
| 119 | +19234 + 374974 + 55766 + 611237 + 38663 + 427810 + 232974 = 1818781`. |
| 120 | +Represented in 32-bit binary, that's |
| 121 | + |
| 122 | +``` |
| 123 | +00000000000110111100000010011101 |
| 124 | +``` |
| 125 | +which is what goes in the associated slot of this node's parent. This |
| 126 | +pattern recurses until we reach the root node. |
| 127 | + |
| 128 | +In order to read a particular slot, we right-shift until the 32 bits are the |
| 129 | +right-most 32 bits, and then do a bitwise `&` with `2^32 - 1`, which is 224 0's |
| 130 | +and 32 1s. This has the effect of erasing everything but the 32 relevant bits, |
| 131 | +allowing us to read *only* the slot. |
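
The slot arithmetic can be checked off-chain with a short Python model, using the eight example weights from this section. This is a sketch under assumptions: `pack_slots`, `get_slot`, and `sum_weight` are hypothetical helper names, and positions are counted from 0 at the rightmost slot (so position 0 is "slot 1" in the prose above).

```python
SLOT_WIDTH = 32
SLOT_COUNT = 8
SLOT_MAX = (1 << SLOT_WIDTH) - 1


def pack_slots(weights):
    """Pack 8 weights into one 256-bit node; weights[0] is the rightmost slot."""
    assert len(weights) == SLOT_COUNT
    node = 0
    for position, weight in enumerate(weights):
        assert 0 <= weight <= SLOT_MAX
        node |= weight << (position * SLOT_WIDTH)
    return node


def get_slot(node, position):
    """Right-shift the wanted slot to the bottom, then mask off 32 bits."""
    return (node >> (position * SLOT_WIDTH)) & SLOT_MAX


def sum_weight(node):
    """Total weight of a node: the sum of its 8 slots."""
    return sum(get_slot(node, position) for position in range(SLOT_COUNT))


# Alice, Bob, Carol, David, Erin, Frank, Gretta, Harold
weights = [58123, 19234, 374974, 55766, 611237, 38663, 427810, 232974]
node = pack_slots(weights)
total = sum_weight(node)
```

`format(node, "0256b")` reproduces the bit string shown in the tree diagram, and `total` is the value stored in this node's slot in its parent.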
| 132 | + |
| 133 | +### Joining and Leaving The Pool |
| 134 | + |
| 135 | +Operators join the pool according to two pieces of state: `rightmostLeaf` and |
| 136 | +`emptyLeaves`. `rightmostLeaf` is an always-increasing counter that starts at 0 |
| 137 | +and increases each time a joining operator takes a fresh leaf position. Eventually, |
| 138 | +`rightmostLeaf` will reach `2,097,152`, which is the size of the leaf layer, |
| 139 | +and we'll ignore it forever, and instead rely on `emptyLeaves`. |
| 140 | + |
| 141 | +`emptyLeaves` is an array to which we append the position of each operator |
| 142 | +who *leaves* the pool. Once `rightmostLeaf` is useless, we `pop` `emptyLeaves`, |
| 143 | +insert the new operator in that position, and repeat. We should never run out |
| 144 | +of positions with this strategy because the number of leaves far exceeds the |
| 145 | +total token supply divided by the minimum stake. |
| 146 | + |
| 147 | +Here's a sample event log with state, using a leaf layer of size `4` instead of |
| 148 | +`2097152` for brevity. |
| 149 | + |
| 150 | +``` |
| 151 | +state 1) slots: [-, -, -, -], rightmostLeaf: 0, emptyLeaves: [] |
| 152 | +event: A joins |
| 153 | +state 2) slots: [A, -, -, -], rightmostLeaf: 1, emptyLeaves: [] |
| 154 | +event: B joins |
| 155 | +state 3) slots: [A, B, -, -], rightmostLeaf: 2, emptyLeaves: [] |
| 156 | +event: C joins |
| 157 | +state 4) slots: [A, B, C, -], rightmostLeaf: 3, emptyLeaves: [] |
| 158 | +event: B leaves |
| 159 | +state 5) slots: [A, -, C, -], rightmostLeaf: 3, emptyLeaves: [1] |
| 160 | +event: D joins |
| 161 | +state 6) slots: [A, -, C, D], rightmostLeaf: 4, emptyLeaves: [1] // rightmostLeaf is forever useless now |
| 162 | +event: A leaves |
| 163 | +state 7) slots: [-, -, C, D], rightmostLeaf: 4, emptyLeaves: [1, 0] |
| 164 | +event: E joins |
| 165 | +state 8) slots: [E, -, C, D], rightmostLeaf: 4, emptyLeaves: [1] |
| 166 | +``` |
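
The log above can be reproduced with a small Python model of the two pieces of state. This is a sketch of the allocation rule as described in this section, not the contract's code; the class name `LeafAllocator` and its methods are assumptions for illustration.

```python
class LeafAllocator:
    """Hypothetical model of leaf allocation with rightmostLeaf and emptyLeaves."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = [None] * capacity
        self.rightmost_leaf = 0
        self.empty_leaves = []

    def join(self, operator):
        if self.rightmost_leaf < self.capacity:
            # fresh positions are still available on the right
            position = self.rightmost_leaf
            self.rightmost_leaf += 1
        else:
            # rightmost_leaf is exhausted: reuse a vacated position
            # (raises IndexError if the pool is completely full)
            position = self.empty_leaves.pop()
        self.slots[position] = operator
        return position

    def leave(self, operator):
        position = self.slots.index(operator)
        self.slots[position] = None
        self.empty_leaves.append(position)
        return position


pool = LeafAllocator(capacity=4)
for name in ["A", "B", "C"]:
    pool.join(name)
pool.leave("B")
pool.join("D")
pool.leave("A")
pool.join("E")
```

After these events the model is in state 8 of the log: `pool.slots` is `['E', None, 'C', 'D']`, `pool.rightmost_leaf` is `4`, and `pool.empty_leaves` is `[1]`.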
| 167 | + |
| 168 | +Each time an operator joins or leaves the pool, we need to update all of the |
| 169 | +branches on the path from the operator to the root, as well as the root. The |
| 170 | +branch with the operator as a child will have its slot updated with the child's |
| 171 | +weight directly. That branch's total weight will change, which will update its |
| 172 | +slot in that branch's parent, and so on, all the way up to the root. |
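
The upward propagation can be sketched in Python as follows. This is an illustrative model, not the contract's implementation: `set_slot`, `sum_weight`, and `update_path` are hypothetical names, the tree is held in a plain dict keyed by level and position (mirroring the nested mapping), and level 1 at position 0 is assumed to be the root.

```python
SLOT_WIDTH = 32
SLOT_COUNT = 8
SLOT_MAX = (1 << SLOT_WIDTH) - 1


def set_slot(node, slot, weight):
    """Overwrite one 32-bit slot of a node, leaving the others untouched."""
    assert 0 <= weight <= SLOT_MAX
    shift = slot * SLOT_WIDTH
    return (node & ~(SLOT_MAX << shift)) | (weight << shift)


def sum_weight(node):
    """Total weight of a node: the sum of its 8 slots."""
    return sum((node >> (i * SLOT_WIDTH)) & SLOT_MAX for i in range(SLOT_COUNT))


def update_path(branches, levels, leaf_position, weight):
    """Write a leaf's weight into its parent and propagate sums up to the root.

    branches[level][position] holds a node; level 1, position 0 is the root.
    Returns the new total weight of the root.
    """
    position = leaf_position
    for level in range(levels, 0, -1):
        slot = position % SLOT_COUNT   # which child of the parent we are
        position //= SLOT_COUNT        # the parent's position in its layer
        node = set_slot(branches[level].get(position, 0), slot, weight)
        branches[level][position] = node
        weight = sum_weight(node)      # this becomes the parent's slot value
    return weight


# A tiny two-level tree (64 leaves) instead of the real seven levels.
tree = {1: {}, 2: {}}
after_first_join = update_path(tree, 2, leaf_position=0, weight=10)
after_second_join = update_path(tree, 2, leaf_position=9, weight=5)
after_leave = update_path(tree, 2, leaf_position=0, weight=0)
```

A join writes the operator's weight and a leave writes 0. Either way, only one node per level is touched.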
| 173 | + |
| 174 | +For an in-depth explanation of how this information is structured, refer to the |
| 175 | +[Branch and Root Deserialization and |
| 176 | +Serialization](#branch-and-root-deserialization-and-serialization) section. |
| 177 | + |
| 178 | +TODO: Joining and Leaving The Pool |
| 179 | + |
| 180 | +### Selecting A Random Group |
| 181 | + |
| 182 | +To select a random group of `N` operators from the pool, we construct |
| 183 | +an array of size `N` to house the result, and then populate it, one random |
| 184 | +operator at a time, with replacement. |
| 185 | + |
| 186 | +We start with a seed provided by the [random |
| 187 | +beacon](https://github.com/keep-network/keep-core/tree/main/solidity/random-beacon) |
| 188 | +and use that seed in combination with the total weight of the root node to |
| 189 | +generate a uniform random integer in `[0, root.totalWeight)`, as well as the |
| 190 | +seed for the next random operator. The total weight is calculated by summing |
| 191 | +the weights of the root's slots (see [Branch And Root Deserialization And |
| 192 | +Serialization](#branch-and-root-deserialization-and-serialization)). To |
| 193 | +generate the next seed, |
| 194 | +we're using |
| 195 | +``` |
| 196 | +newState = keccak256(abi.encodePacked(newState, address(this))); |
| 197 | +``` |
| 198 | +which, according to [A Pseudorandom Number Generator with KECCAK Hash Function |
| 199 | +by A. Gholipour and S. Mirzakuchak](http://www.ijcee.org/papers/439-JE503.pdf), |
| 200 | +has "excellent pseudo randomness". |
| 201 | + |
| 202 | +Once we have our random integer, we are able to descend the tree according |
| 203 | +to the algorithm outlined in [building intuition](building-intuition.md). |
| 204 | + |
| 205 | +``` |
| 206 | +function pickWeightedLeaf(uint256 index, uint256 _root) |
| 207 | +  internal |
| 208 | +  view |
| 209 | +  returns (uint256 leafPosition) |
| 210 | +{ |
| 211 | +  uint256 currentIndex = index; |
| 212 | +  uint256 currentNode = _root; |
| 213 | +  uint256 currentPosition = 0; |
| 214 | +  uint256 currentSlot; |
| 215 | + |
| 216 | +  require(index < currentNode.sumWeight(), "Index exceeds weight"); |
| 217 | + |
| 218 | +  // get root slot |
| 219 | +  (currentSlot, currentIndex) = currentNode.pickWeightedSlot(currentIndex); |
| 220 | + |
| 221 | +  // get slots from levels 2 to 7 |
| 222 | +  for (uint256 level = 2; level <= LEVELS; level++) { |
| 223 | +    currentPosition = currentPosition.child(currentSlot); |
| 224 | +    currentNode = branches[level][currentPosition]; |
| 225 | +    (currentSlot, currentIndex) = currentNode.pickWeightedSlot(currentIndex); |
| 226 | +  } |
| 227 | + |
| 228 | +  // get leaf position |
| 229 | +  leafPosition = currentPosition.child(currentSlot); |
| 230 | +} |
| 231 | + |
| 232 | +function pickWeightedSlot(uint256 node, uint256 index) |
| 233 | +  internal |
| 234 | +  pure |
| 235 | +  returns (uint256 slot, uint256 newIndex) |
| 236 | +{ |
| 237 | +  unchecked { |
| 238 | +    newIndex = index; |
| 239 | +    uint256 newNode = node; |
| 240 | +    uint256 currentSlotWeight = newNode & SLOT_MAX; |
| 241 | +    while (newIndex >= currentSlotWeight) { |
| 242 | +      newIndex -= currentSlotWeight; |
| 243 | +      slot++; |
| 244 | +      newNode = newNode >> SLOT_WIDTH; |
| 245 | +      currentSlotWeight = newNode & SLOT_MAX; |
| 246 | +    } |
| 247 | +    return (slot, newIndex); |
| 248 | +  } |
| 249 | +} |
| 250 | +``` |
| 251 | + |
| 252 | +At a particular root/branch node, we inspect the right-most 32 bits by applying |
| 253 | +a bitwise `&` against `2^32 - 1`, which leaves only the last 32 bits as |
| 254 | +potentially non-zero. |
| 255 | + |
| 256 | +If this quantity is greater than our random number, we have found our path of |
| 257 | +descent and repeat the process at the next layer. If it isn't, then we increase |
| 258 | +our slot counter, decrease our random number by the quantity, shift the node |
| 259 | +32 bits to the right, and repeat. |
| 260 | + |
| 261 | +Eventually we will find a slot whose weight exceeds our random number and be |
| 262 | +able to descend to the next layer, where the process repeats until we get to |
| 263 | +the leaf layer, where the task is finished. |
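
The slot walk described above can be traced off-chain with a Python model of `pickWeightedSlot`, applied to the example node from the branch section (Alice through Harold). This is a sketch: `pack_slots` is a hypothetical helper, and the explicit bounds assertion stands in for the `require` that the Solidity caller performs.

```python
SLOT_WIDTH = 32
SLOT_COUNT = 8
SLOT_MAX = (1 << SLOT_WIDTH) - 1


def pack_slots(weights):
    """Pack 8 weights into one node; weights[0] is the rightmost slot."""
    node = 0
    for position, weight in enumerate(weights):
        node |= weight << (position * SLOT_WIDTH)
    return node


def pick_weighted_slot(node, index):
    """Return (slot, remaining index) for an index below the node's total weight."""
    total = sum((node >> (i * SLOT_WIDTH)) & SLOT_MAX for i in range(SLOT_COUNT))
    assert 0 <= index < total, "Index exceeds weight"
    slot = 0
    current_weight = node & SLOT_MAX
    while index >= current_weight:
        index -= current_weight          # skip past this slot's weight
        slot += 1
        node >>= SLOT_WIDTH              # bring the next slot to the bottom
        current_weight = node & SLOT_MAX
    return slot, index


# Alice, Bob, Carol, David, Erin, Frank, Gretta, Harold
node = pack_slots([58123, 19234, 374974, 55766, 611237, 38663, 427810, 232974])

first = pick_weighted_slot(node, 0)          # lands in Alice's slot
boundary = pick_weighted_slot(node, 58123)   # first index past Alice: Bob
last = pick_weighted_slot(node, 1818780)     # largest valid index: Harold
```

The second element of each result is the index carried down into the chosen child, which is how the descent continues at the next layer.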
| 264 | + |
| 265 | +We record our chosen operator in the result list, use the fresh seed from |
| 266 | +`keccak256` to generate a new random number and new seed, and repeat until we |
| 267 | +have a full group. |