|
22 | 22 |
|
23 | 23 | ;; I recently relaunched an old Scicloj project called [tablecloth.time](https://github.com/scicloj/tablecloth.time). The goal of this project was to build a composable |
24 | 24 | ;; extension for time series analysis built on top of |
25 | | -;; [tablecloth](https://scicloj.github.io/tablecloth/). Originally, we |
| 25 | +;; [tablecloth](https://scicloj.github.io/tablecloth/). Throughout this post, |
| 26 | +;; `tct` refers to `tablecloth.time.api` and `tc` refers to `tablecloth.api`. |
| 27 | +;; Originally, we |
26 | 28 | ;; had built this project around a dataset index mechanism that was |
27 | 29 | ;; built into tech.ml.dataset, but after that feature was removed in |
28 | 30 | ;; v7, the project required a rethink. This post walks through that |
|
31 | 33 |
|
32 | 34 | ;; ## Why No Index? |
33 | 35 | ;; |
34 | | -;; The original tablecloth.time was built around an index — set a time column |
35 | | -;; as your dataset's index, and operations like `slice` and `resample` would |
36 | | -;; work implicitly on it, just like Pandas. This seemed necessary for two reasons: |
37 | | -;; performance (tree-based indexes offer O(log n) lookups) and convenience |
38 | | -;; (you don't have to keep specifying which column is the time column). |
| 36 | +;; The original tablecloth.time was built around an index for two reasons: |
| 37 | +;; performance (tree-based indexes offer O(log n) lookups) and |
| 38 | +;; convenience |
| 39 | +;; (you don't have to keep specifying which column is the time |
| 40 | +;; column). Anyone who has used the Python Pandas data processing |
| 41 | +;; library is likely familiar with this feature. |
39 | 42 | ;; |
40 | 43 | ;; But when tech.ml.dataset removed its indexing mechanism in v7, it forced a |
41 | 44 | ;; rethink. And the rethink revealed that neither rationale held up. |
|
61 | 64 | ;; [Composability Over Abstraction](https://humanscodes.com/tablecloth-time-relaunch) |
62 | 65 | ;; on humanscodes. |
63 | 66 |
|
| 67 | +;; Now let's dig into this library's primitives and basic functionality. |
| 68 | + |
64 | 69 | ;; ## Loading the Data |
65 | 70 | ;; |
66 | 71 | ;; We'll use the `vic_elec` dataset: half-hourly electricity demand from Victoria, |
|
80 | 85 | ;; |
81 | 86 | ;; The first primitive is `add-time-columns`. It extracts temporal fields from a |
82 | 87 | ;; datetime column — day-of-week, month, hour, etc. — as new columns you can |
83 | | -;; group or filter on. |
| 88 | +;; group or filter on. Here's a quick look at what it produces: |
84 | 89 |
|
85 | 90 | (-> vic-elec |
86 | 91 | (tct/add-time-columns :Time [:day-of-week :hour]) |
87 | 92 | (tc/head 10)) |
88 | 93 |
|
89 | | -;; With these extracted fields, standard tablecloth operations give you resampling |
90 | | -;; and aggregation patterns. Let's compute average demand by day of week: |
| 94 | +(def demand-by-day |
| 95 | + (-> vic-elec |
| 96 | + (tct/add-time-columns :Time [:day-of-week]) |
| 97 | + (tc/group-by [:day-of-week]) |
| 98 | + (tc/aggregate {:Demand #(dfn/mean (:Demand %))}) |
| 99 | + (tc/order-by [:day-of-week]))) |
91 | 100 |
|
92 | | -(-> vic-elec |
93 | | - (tct/add-time-columns :Time [:day-of-week]) |
94 | | - (tc/group-by [:day-of-week]) |
95 | | - (tc/aggregate {:Demand #(dfn/mean (:Demand %))}) |
96 | | - (tc/order-by [:day-of-week]) |
97 | | - (plotly/layer-bar {:=x :day-of-week :=y :Demand})) |
| 101 | +;; Look at the aggregated data: |
| 102 | +(tc/head demand-by-day 7) |
| 103 | + |
| 104 | +;; Then visualize the result: |
| 105 | +(plotly/layer-bar demand-by-day |
| 106 | + {:=x :day-of-week :=y :Demand}) |
98 | 107 |
|
99 | 108 | ;; Weekends (days 6 and 7) clearly have lower demand. The `:day-of-week` field |
100 | | -;; came from `add-time-columns`; the aggregation is pure tablecloth. |
| 109 | +;; came from `add-time-columns`; the group-by, aggregate, and order-by are pure |
| 110 | +;; tablecloth. The two libraries compose seamlessly. |
101 | 111 |
|
102 | 112 | ;; ## Slicing Time Ranges |
103 | 113 | ;; |
|
142 | 152 | ;; The tight diagonal shows strong positive correlation — demand at any given |
143 | 153 | ;; time is highly predictive of demand at the same time the previous day. |
144 | 154 |
|
| 155 | +;; `add-lead` shifts values forward: each row's Demand is paired with Demand 48 |
| 156 | +;; half-hour steps (24 hours) ahead. Let's see if today's demand predicts tomorrow's: |
| 157 | + |
| 158 | +(-> vic-elec |
| 159 | + (tct/add-lead :Demand 48 :Demand_lead48) |
| 160 | + (tc/drop-missing) |
| 161 | + (plotly/layer-point {:=x :Demand |
| 162 | + :=y :Demand_lead48 |
| 163 | + :=mark-opacity 0.3})) |
| 164 | + |
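;; As a rough illustration of what a lead does (plain Clojure on a vector, not
;; tablecloth.time's implementation; `lead` is a hypothetical helper), shifting
;; by n pairs each position with the value n steps later and pads the tail with
;; nil. Those trailing gaps are presumably why the pipeline above drops missing
;; rows before plotting.

(defn lead
  "Return a vector where position i holds (nth xs (+ i n)), or nil past the end."
  [xs n]
  (let [n (min n (count xs))]
    (vec (concat (drop n xs) (repeat n nil)))))

(lead [10 20 30 40 50] 2)
;; => [30 40 50 nil nil]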
145 | 165 | ;; ## Resampling as a Pattern |
146 | 166 | ;; |
147 | | -;; tablecloth.time doesn't have a `resample` function (yet). Instead, resampling |
148 | | -;; is a composable pattern: extract the time component you want, group by it, |
149 | | -;; and aggregate. |
150 | | -;; |
151 | | -;; Daily averages (grouping by year, month, and day): |
| 167 | +;; We showed the resampling pattern above: extract time fields, group, aggregate, |
| 168 | +;; order. The same pattern scales to different granularities. Here are daily and |
| 169 | +;; monthly averages using the same building blocks: |
152 | 170 |
|
| 171 | +;; Daily averages: |
153 | 172 | (-> vic-elec |
154 | 173 | (tct/add-time-columns :Time [:year :month :day]) |
155 | 174 | (tc/group-by [:year :month :day]) |
|
159 | 178 | (tc/head 10)) |
160 | 179 |
|
161 | 180 | ;; Monthly averages: |
162 | | - |
163 | 181 | (-> vic-elec |
164 | 182 | (tct/add-time-columns :Time [:year :month]) |
165 | 183 | (tc/group-by [:year :month]) |
166 | 184 | (tc/aggregate {:Demand #(dfn/mean (:Demand %))}) |
167 | 185 | (tc/order-by [:year :month]) |
168 | 186 | (plotly/layer-bar {:=x :month :=y :Demand :=color :year})) |
169 | 187 |
|
170 | | -;; Each step is visible. Each step composes with the rest of your pipeline. |
| 188 | +;; Note that tablecloth.time is only a light layer in these |
| 189 | +;; expressions. You could do this with tablecloth alone by manually |
| 190 | +;; extracting datetime components. tablecloth.time's `add-time-columns` |
| 191 | +;; makes that step more concise and expressive, and it composes naturally |
| 192 | +;; with the tablecloth operations. |
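;; A minimal sketch of the manual route, using plain Clojure maps and java.time
;; rather than a dataset (the `readings` data and the `mean` helper are made up
;; for illustration): extract the month, group by it, aggregate, and order.

(import '[java.time LocalDateTime])

(def readings
  [{:time (LocalDateTime/parse "2012-01-01T00:00:00") :demand 4000.0}
   {:time (LocalDateTime/parse "2012-01-01T00:30:00") :demand 4200.0}
   {:time (LocalDateTime/parse "2012-02-01T00:00:00") :demand 5000.0}])

(defn mean [xs]
  (/ (reduce + xs) (count xs)))

(def monthly-means
  (->> readings
       ;; extract + group: key each reading by the month of its timestamp
       (group-by #(.getMonthValue ^LocalDateTime (:time %)))
       ;; aggregate: one map per month with the mean demand
       (map (fn [[month rows]]
              {:month month :demand (mean (map :demand rows))}))
       ;; order: sort the result by month
       (sort-by :month)))

;; monthly-means
;; => ({:month 1, :demand 4100.0} {:month 2, :demand 5000.0})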
171 | 193 |
|
172 | 194 | ;; ## Combining Primitives |
173 | 195 | ;; |
|
187 | 209 | ;; Weekday demand shows the classic two-peak pattern (morning and evening), |
188 | 210 | ;; while weekend demand is flatter and lower overall. |
189 | 211 |
|
190 | | -;; ## What's Next |
| 212 | +;; ## Time Utilities (Column API) |
| 213 | +;; |
| 214 | +;; tablecloth.time mirrors tablecloth's structure: a dataset API (`tct`) |
| 215 | +;; and a column API (`tablecloth.time.column.api`). The column API provides |
| 216 | +;; lower-level utilities for working with time data directly — parsing, |
| 217 | +;; conversion, flooring, extraction. These power the high-level functions |
| 218 | +;; and are available when you need finer control. |
191 | 219 | ;; |
192 | | -;; tablecloth.time is experimental. The current release provides these focused |
193 | | -;; primitives: |
| 220 | +;; **Parsing** — `tablecloth.time.parse/parse` handles ISO-8601 strings and |
| 221 | +;; custom formats with cached formatters for performance. |
194 | 222 | ;; |
195 | | -;; - `add-time-columns` — extract temporal fields |
196 | | -;; - `slice` — select time ranges efficiently |
197 | | -;; - `add-lag` / `add-lead` — shift values for autocorrelation |
| 223 | +;; **Conversion** — `convert-time` moves between representations (Instants, |
| 224 | +;; LocalDateTimes, LocalDates, epoch milliseconds) with timezone awareness. |
| 225 | +;; |
| 226 | +;; **Flooring** — `down-to-nearest`, `floor-to-month`, `floor-to-quarter` bucket |
| 227 | +;; timestamps to intervals. Useful for aggregating sub-daily data: |
| 228 | + |
| 229 | +(require '[tablecloth.time.column.api :as tctc]) |
| 230 | + |
| 231 | +(-> vic-elec |
| 232 | + (tc/add-column :HourBucket |
| 233 | + #(tctc/down-to-nearest (% :Time) 1 :hours {:zone "UTC"})) |
| 234 | + (tc/head 5)) |
| 235 | + |
| 236 | +;; The column API parallels `tablecloth.column.api` — work with columns |
| 237 | +;; directly, then add them to your dataset. The high-level dataset functions |
| 238 | +;; are convenience wrappers built from these pieces. Manipulating time data |
| 239 | +;; is notoriously fiddly; tablecloth.time tries to smooth the sharp edges |
| 240 | +;; without hiding the underlying java.time power. |
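;; To make the flooring idea concrete, here is a small java.time-only sketch
;; (not the column API itself; `floor-to-hour` and `floor-to-month-start` are
;; hypothetical helpers). Flooring maps every timestamp in an interval to the
;; interval's start, so the floored value can serve as a grouping key.

(import '[java.time LocalDateTime]
        '[java.time.temporal ChronoUnit])

(defn floor-to-hour [^LocalDateTime t]
  (.truncatedTo t ChronoUnit/HOURS))

(defn floor-to-month-start [^LocalDateTime t]
  (.truncatedTo (.withDayOfMonth t 1) ChronoUnit/DAYS))

(str (floor-to-hour (LocalDateTime/parse "2012-03-15T13:30:00")))
;; => "2012-03-15T13:00"
(str (floor-to-month-start (LocalDateTime/parse "2012-03-15T13:30:00")))
;; => "2012-03-01T00:00"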
| 241 | + |
| 242 | +;; ## What's Next |
198 | 243 | ;; |
199 | | -;; Planned additions include rolling windows, differencing, and higher-level |
200 | | -;; patterns like `resample` that wrap the composable building blocks. |
| 244 | +;; tablecloth.time is experimental. Planned additions include rolling windows, |
| 245 | +;; differencing, and higher-level patterns like `resample` that wrap the |
| 246 | +;; composable building blocks. |
201 | 247 | ;; |
202 | 248 | ;; The [repository is on GitHub](https://github.com/scicloj/tablecloth.time). |
203 | 249 | ;; For more worked examples, see the |
|