Commit 636c1fb

Saving revisions
- Added new section about time manipulation tools
1 parent bd69814 commit 636c1fb

1 file changed

Lines changed: 77 additions & 31 deletions

src/ezmiller/relaunching_tablecloth_time.clj

@@ -22,7 +22,9 @@
2222

2323
;; I recently relaunched an old Scicloj project called [tablecloth.time](https://github.com/scicloj/tablecloth.time). The goal of this project was to build a composable
2424
;; extension for time series analysis built on top of
25-
;; [tablecloth](https://scicloj.github.io/tablecloth/). Originally, we
25+
;; [tablecloth](https://scicloj.github.io/tablecloth/). Throughout this post,
26+
;; `tct` refers to `tablecloth.time.api` and `tc` refers to `tablecloth.api`.
27+
;; Originally, we
2628
;; had built this project around a dataset index mechanism that was
2729
;; built into tech.ml.dataset, but after that feature was removed in
2830
;; v7, the project required a rethink. This post walks through that
@@ -31,11 +33,12 @@
3133

3234
;; ## Why No Index?
3335
;;
34-
;; The original tablecloth.time was built around an index — set a time column
35-
;; as your dataset's index, and operations like `slice` and `resample` would
36-
;; work implicitly on it, just like Pandas. This seemed necessary for two reasons:
37-
;; performance (tree-based indexes offer O(log n) lookups) and convenience
38-
;; (you don't have to keep specifying which column is the time column).
36+
;; The original tablecloth.time was built around an index for two reasons:
37+
;; performance (tree-based indexes offer O(log n) lookups) and
38+
;; convenience
39+
;; (you don't have to keep specifying which column is the time
40+
;; column). Anyone who has used the Python Pandas data processing
41+
;; library is likely familiar with this feature.
3942
;;
4043
;; But when tech.ml.dataset removed its indexing mechanism in v7, it forced a
4144
;; rethink. And the rethink revealed that neither rationale held up.
@@ -61,6 +64,8 @@
6164
;; [Composability Over Abstraction](https://humanscodes.com/tablecloth-time-relaunch)
6265
;; on humanscodes.
6366

67+
;; Now let's dig into this library's primitives and basic functionality.
68+
6469
;; ## Loading the Data
6570
;;
6671
;; We'll use the `vic_elec` dataset: half-hourly electricity demand from Victoria,
@@ -80,24 +85,29 @@
8085
;;
8186
;; The first primitive is `add-time-columns`. It extracts temporal fields from a
8287
;; datetime column — day-of-week, month, hour, etc. — as new columns you can
83-
;; group or filter on.
88+
;; group or filter on. Here's a quick look at what it produces:
8489

8590
(-> vic-elec
8691
(tct/add-time-columns :Time [:day-of-week :hour])
8792
(tc/head 10))
8893

89-
;; With these extracted fields, standard tablecloth operations give you resampling
90-
;; and aggregation patterns. Let's compute average demand by day of week:
94+
(def demand-by-day
95+
(-> vic-elec
96+
(tct/add-time-columns :Time [:day-of-week])
97+
(tc/group-by [:day-of-week])
98+
(tc/aggregate {:Demand #(dfn/mean (:Demand %))})
99+
(tc/order-by [:day-of-week])))
91100

92-
(-> vic-elec
93-
(tct/add-time-columns :Time [:day-of-week])
94-
(tc/group-by [:day-of-week])
95-
(tc/aggregate {:Demand #(dfn/mean (:Demand %))})
96-
(tc/order-by [:day-of-week])
97-
(plotly/layer-bar {:=x :day-of-week :=y :Demand}))
101+
;; Look at the aggregated data:
102+
(tc/head demand-by-day 7)
103+
104+
;; Visualize the result:
105+
(plotly/layer-bar demand-by-day
106+
{:=x :day-of-week :=y :Demand})
98107

99108
;; Weekends (days 6 and 7) clearly have lower demand. The `:day-of-week` field
100-
;; came from `add-time-columns`; the aggregation is pure tablecloth.
109+
;; came from `add-time-columns`; the group-by, aggregate, and order-by are pure
110+
;; tablecloth. The two libraries compose seamlessly.
101111

102112
;; ## Slicing Time Ranges
103113
;;
@@ -142,14 +152,23 @@
142152
;; The tight diagonal shows strong positive correlation — demand at any given
143153
;; time is highly predictive of demand at the same time the previous day.
144154

155+
;; `add-lead` shifts values forward — current Demand aligns with Demand 24 hours
156+
;; ahead (48 half-hourly steps). Let's see if today's demand predicts tomorrow's:
157+
158+
(-> vic-elec
159+
(tct/add-lead :Demand 48 :Demand_lead48)
160+
(tc/drop-missing)
161+
(plotly/layer-point {:=x :Demand
162+
:=y :Demand_lead48
163+
:=mark-opacity 0.3}))
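
;; For intuition, here is what a lead does, sketched on a plain vector.
;; The helper `lead` is hypothetical and only illustrates the idea; it is
;; not part of tablecloth.time, and it assumes the shifted-out tail is
;; filled with missing values (nil).
(defn lead [n xs]
  ;; Drop the first n values so each position sees the value n steps ahead,
  ;; then pad the tail with nil to keep the original length.
  (concat (drop n xs) (repeat (min n (count xs)) nil)))

(lead 2 [10 20 30 40 50])
;; => (30 40 50 nil nil)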
164+
145165
;; ## Resampling as a Pattern
146166
;;
147-
;; tablecloth.time doesn't have a `resample` function (yet). Instead, resampling
148-
;; is a composable pattern: extract the time component you want, group by it,
149-
;; and aggregate.
150-
;;
151-
;; Daily averages (grouping by year, month, and day):
167+
;; We showed the resampling pattern above: extract time fields, group, aggregate,
168+
;; order. The same pattern scales to different granularities. Here are daily and
169+
;; monthly averages using the same building blocks:
152170

171+
;; Daily averages:
153172
(-> vic-elec
154173
(tct/add-time-columns :Time [:year :month :day])
155174
(tc/group-by [:year :month :day])
@@ -159,15 +178,18 @@
159178
(tc/head 10))
160179

161180
;; Monthly averages:
162-
163181
(-> vic-elec
164182
(tct/add-time-columns :Time [:year :month])
165183
(tc/group-by [:year :month])
166184
(tc/aggregate {:Demand #(dfn/mean (:Demand %))})
167185
(tc/order-by [:year :month])
168186
(plotly/layer-bar {:=x :month :=y :Demand :=color :year}))
169187

170-
;; Each step is visible. Each step composes with the rest of your pipeline.
188+
;; Note that tablecloth.time is just a light layer in these
189+
;; expressions. You could do this with tablecloth alone by manually
190+
;; extracting datetime components. tablecloth.time's `add-time-columns`
191+
;; just adds concision and expressiveness — it composes naturally with
192+
;; the tablecloth operations.
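
;; A minimal sketch of that manual route, for comparison. It assumes `:Time`
;; holds java.time.LocalDateTime values, which this post does not confirm;
;; other temporal types would need different accessors.
(-> vic-elec
    (tc/add-column :month
                   (fn [ds]
                     ;; Pull the month number out of each timestamp by hand.
                     (map #(.getMonthValue ^java.time.LocalDateTime %)
                          (ds :Time))))
    (tc/group-by [:month])
    (tc/aggregate {:Demand #(dfn/mean (:Demand %))})
    (tc/order-by [:month]))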
171193

172194
;; ## Combining Primitives
173195
;;
@@ -187,17 +209,41 @@
187209
;; Weekday demand shows the classic two-peak pattern (morning and evening),
188210
;; while weekend demand is flatter and lower overall.
189211

190-
;; ## What's Next
212+
;; ## Time Utilities (Column API)
213+
;;
214+
;; tablecloth.time mirrors tablecloth's structure: a dataset API (`tct`)
215+
;; and a column API (`tablecloth.time.column.api`). The column API provides
216+
;; lower-level utilities for working with time data directly — parsing,
217+
;; conversion, flooring, extraction. These power the high-level functions
218+
;; and are available when you need finer control.
191219
;;
192-
;; tablecloth.time is experimental. The current release provides these focused
193-
;; primitives:
220+
;; **Parsing** — `tablecloth.time.parse/parse` handles ISO-8601 strings and
221+
;; custom formats with cached formatters for performance.
194222
;;
195-
;; - `add-time-columns` — extract temporal fields
196-
;; - `slice` — select time ranges efficiently
197-
;; - `add-lag` / `add-lead` — shift values for autocorrelation
223+
;; **Conversion** — `convert-time` moves between representations (Instants,
224+
;; LocalDateTimes, LocalDates, epoch milliseconds) with timezone awareness.
225+
;;
226+
;; **Flooring** — `down-to-nearest`, `floor-to-month`, `floor-to-quarter` bucket
227+
;; timestamps to intervals. Useful for aggregating sub-daily data:
228+
229+
(require '[tablecloth.time.column.api :as tctc])
230+
231+
(-> vic-elec
232+
(tc/add-column :HourBucket
233+
#(tctc/down-to-nearest (% :Time) 1 :hours {:zone "UTC"}))
234+
(tc/head 5))
235+
236+
;; The column API parallels `tablecloth.column.api` — work with columns
237+
;; directly, then add them to your dataset. The high-level dataset functions
238+
;; are convenience wrappers built from these pieces. Manipulating time data
239+
;; is notoriously fiddly; tablecloth.time tries to smooth the sharp edges
240+
;; without hiding the underlying java.time power.
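
;; For intuition, flooring a single timestamp to the hour can be sketched
;; with plain java.time interop. This is an illustration only and is not
;; necessarily how tablecloth.time implements `down-to-nearest`.
(import '(java.time LocalDateTime)
        '(java.time.temporal ChronoUnit))

(str (.truncatedTo (LocalDateTime/of 2012 1 1 13 30) ChronoUnit/HOURS))
;; => "2012-01-01T13:00"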
241+
242+
;; ## What's Next
198243
;;
199-
;; Planned additions include rolling windows, differencing, and higher-level
200-
;; patterns like `resample` that wrap the composable building blocks.
244+
;; tablecloth.time is experimental. Planned additions include rolling windows,
245+
;; differencing, and higher-level patterns like `resample` that wrap the
246+
;; composable building blocks.
201247
;;
202248
;; The [repository is on GitHub](https://github.com/scicloj/tablecloth.time).
203249
;; For more worked examples, see the
