|
23 | 23 | ;; sufficiently low entropy data is not uniquely represented by fusing hashes |
24 | 24 | ;; together. |
25 | 25 |
|
26 | | -;; The good news is that low entropy data is compressible and a practical |
27 | | -;; fusing operator can throw an exception when fusing produces 'bad' hash |
28 | | -;; values. |
| 26 | +;; The good news is that low entropy data may always be converted into high |
| 27 | +;; entropy data either via compression or injecting noise. A practical fusing |
| 28 | +;; operator can easily detect fuses of low entropy data and throw an exception |
| 29 | +;; when fusing produces those 'bad' hash values. |
29 | 30 |
|
30 | 31 | ;; ## Introduction |
31 | 32 |
|
32 | 33 | ;; I have begun working on implementing distributed immutable data structures |
33 | 34 | ;; as a way to efficiently share structured data between multiple clients. One |
34 | 35 | ;; of the key challenges in this area is being able to efficiently reference |
35 | | -;; ordered collections of data. The usual approach is to use Merkle Trees. But, |
36 | | -;; since I plan on using Finger Trees to represent ordered collections of data, |
| 36 | +;; ordered collections of data. The usual approach is to use Merkle Trees, which |
| 37 | +;; have the limitation that they are non-associative and the shape of the tree |
| 38 | +;; determines the final hash value at the root of the tree. But, since I plan |
| 39 | +;; on using Finger Trees to efficiently represent ordered collections of data, |
37 | 40 | ;; I need computed hashes that are insensitive to the tree shape. This means |
38 | 41 | ;; that I need to be able to fuse hashes together in a way that is associative |
39 | | -;; and not commutative. As a starting point, I have been reading the HP paper |
40 | | -;; describing hash fusing via matrix multiplication: |
| 42 | +;; and also non-commutative. As a starting point, I have been reading the HP |
| 43 | +;; paper describing hash fusing via matrix multiplication: |
41 | 44 | ;; [https://www.labs.hpe.com/techreports/2017/HPE-2017-08.pdf](https://www.labs.hpe.com/techreports/2017/HPE-2017-08.pdf) |
42 | 45 |
|
43 | 46 | ;; ## Hash Fusing via Upper Triangular Matrix Multiplication |
|
70 | 73 | (def c-hex (random-hex 32)) |
71 | 74 | ^:kindly/hide-code c-hex |
72 | 75 |
|
73 | | -;; To fuse two hashes, we convert each hash to its corresponding upper |
74 | | -;; triangular matrix and then multiply the two matrices together. The result is |
75 | | -;; another upper triangular matrix which can be converted back to a hash by |
76 | | -;; taking the elements above the main diagonal. The following several sections |
77 | | -;; defines this mapping between hashes and upper triangular matrices. For the |
| 76 | +;; To fuse two hashes, we convert each hash to an upper triangular matrix and |
| 77 | +;; then multiply the two matrices together. The result is another upper |
| 78 | +;; triangular matrix which can be converted back to a hash by taking the |
| 79 | +;; elements above the main diagonal. The following several sections define |
| 80 | +;; this mapping between hashes and upper triangular matrices. For the |
78 | 81 | ;; experiments below four different bit sizes of cells and four corresponding |
79 | 82 | ;; matrices are defined. |
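
;; The fusing step described above can be sketched on a toy scale. This is a
;; minimal illustration under stated assumptions, not the notebook's actual
;; mapping: it assumes 8-bit cells, ones on the main diagonal, row-major
;; filling of the cells above the diagonal, and arithmetic modulo 256. All
;; `sketch-` names are hypothetical and exist only for this example.

(defn sketch-cells->utm
  "Builds an n x n upper triangular matrix with ones on the main diagonal
  and `cells` placed above it in row-major order (assumed layout)."
  [n cells]
  (let [positions (for [i (range n) j (range (inc i) n)] [i j])
        unit      (vec (for [i (range n)]
                         (vec (for [j (range n)] (if (= i j) 1 0)))))]
    (reduce (fn [m [pos v]] (assoc-in m pos v))
            unit
            (map vector positions cells))))

(defn sketch-utm->cells
  "Reads the cells above the main diagonal back out in row-major order."
  [m]
  (let [n (count m)]
    (vec (for [i (range n) j (range (inc i) n)] (get-in m [i j])))))

(defn sketch-utm-multiply
  "Multiplies two square matrices, reducing every cell modulo 256."
  [a b]
  (let [n (count a)]
    (vec (for [i (range n)]
           (vec (for [j (range n)]
                  (mod (reduce + (for [k (range n)]
                                   (* (get-in a [i k]) (get-in b [k j]))))
                       256)))))))

(defn sketch-fuse
  "Fuses two cell vectors: cells -> matrices -> product -> cells."
  [n x y]
  (sketch-utm->cells
   (sketch-utm-multiply (sketch-cells->utm n x) (sketch-cells->utm n y))))

;; With 3 x 3 matrices (three cells per value), swapping the operands changes
;; the result, while regrouping a chain of fuses does not:
(sketch-fuse 3 [1 2 3] [4 5 6]) ;; => [5 13 9]
(sketch-fuse 3 [4 5 6] [1 2 3]) ;; => [5 19 9]
(= (sketch-fuse 3 (sketch-fuse 3 [1 2 3] [4 5 6]) [7 8 9])
   (sketch-fuse 3 [1 2 3] (sketch-fuse 3 [4 5 6] [7 8 9]))) ;; => true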
80 | 83 |
|
|
499 | 502 | ab*c (utm-multiply ab c) |
500 | 503 | a*bc (utm-multiply a bc)] |
501 | 504 | {:cell-size cell-size |
502 | | - :commutative? (= ab ba) |
503 | | - :associative? (= ab*c a*bc)})) |
| 505 | + :associative? (= ab*c a*bc) |
| 506 | + :commutative? (= ab ba)})) |
504 | 507 | (kind/table)) |
505 | 508 |
|
506 | 509 | ;; ## Experiment 1: Random Fuses |
|
693 | 696 | {:a a-hex :b b-hex :fused (utm64->hex ab)})) |
694 | 697 | (utm64->hex ab)))) |
695 | 698 |
|
| 699 | +;; Example of high entropy data fusing successfully: |
696 | 700 | (high-entropy-fuse a-hex b-hex) |
697 | 701 |
|
| 702 | +;; Example of low entropy data causing an error: |
698 | 703 | (try |
699 | 704 | (high-entropy-fuse zero-hex zero-hex) |
700 | 705 | (catch Exception e |
701 | 706 | (.getMessage e))) |
702 | 707 |
|
703 | 708 | ;; ## Appendix: Why Low Entropy Data Fails (AI Explanation) |
704 | 709 |
|
| 710 | +;; _The following explanation was generated by Copilot in response to the |
| 711 | +;; question: "Why does low entropy data fail to be represented by hash fusing |
| 712 | +;; via upper triangular matrix multiplication?"_ |
| 713 | + |
705 | 714 | ;; The reason low entropy data fails to be represented by hash fusing is that |
706 | 715 | ;; the multiplication of upper triangular matrices causes the lower bits of the |
707 | 716 | ;; resulting matrix to rapidly approach zero when the same matrix is repeatedly |
|