Commit f726a43

Revise index section to concisely explain the rethink
- Rename 'Design Philosophy' to 'Why No Index?'
- Explain original index approach and why it seemed necessary
- Concisely cover both performance and convenience rationales
- Explain why neither held up with immutable datasets
- Link to humanscodes post for full discussion
1 parent 5c58485 commit f726a43

1 file changed

Lines changed: 27 additions & 16 deletions

src/ezmiller/relaunching_tablecloth_time.clj

@@ -29,22 +29,33 @@
 ;; rethink and the projects core core primitives today using the
 ;; Victoria electricity demand dataset.
 
-;; ## Design Philosophy
-;;
-;; If you've used Pandas for time series work, you're familiar with the index-based
-;; approach: set a DateTimeIndex, and operations like slicing and resampling work
-;; implicitly on it. It's convenient, but it's also hidden state threaded through
-;; your data.
-;;
-;; tablecloth.time takes a different path. Following tablecloth's design — and
-;; Clojure's preference for explicit, composable operations — you always specify
-;; which column you're working with. Each function takes the data and the columns
-;; it operates on. The pipeline reads like what it does.
-;;
-;; This isn't a compromise. As Chris Nuernberger (author of tech.ml.dataset) noted,
-;; with immutable datasets that get rebuilt on each transformation, a tree-based
-;; index offers no performance advantage over binary search on sorted data. The
-;; simplicity is the feature.
+;; ## Why No Index?
+;;
+;; The original tablecloth.time was built around an index — set a time column
+;; as your dataset's index, and operations like `slice` and `resample` would
+;; work implicitly on it, just like Pandas. This seemed necessary for two reasons:
+;; performance (tree-based indexes offer O(log n) lookups) and convenience
+;; (you don't have to keep specifying which column is the time column).
+;;
+;; But when tech.ml.dataset removed its indexing mechanism in v7, it forced a
+;; rethink. And the rethink revealed that neither rationale held up.
+;;
+;; **On performance:** Unlike Python DataFrames, Clojure's datasets are immutable.
+;; They're rebuilt on each transformation. Under these conditions, maintaining a
+;; tree-based index is pure overhead — you'd rebuild it constantly. As Chris
+;; Nuernberger (author of tech.ml.dataset) put it: "Just sorting the dataset and
+;; using binary search will outperform most/all tree structures in this scenario."
+;;
+;; **On convenience:** The index adds implicit state threaded through your data.
+;; Tablecloth's API avoids this — you always say which columns you're operating on.
+;; The pipeline reads like what it does. This aligns with Clojure's broader preference
+;; for explicit, composable operations over hidden magic.
+;;
+;; The simplicity isn't a compromise. It's the feature.
+;;
+;; For the full discussion of this design shift, see
+;; [Composability Over Abstraction](https://humanscodes.com/tablecloth-time-relaunch)
+;; on humanscodes.
 
 ;; ## Loading the Data
 ;;
