^{:kindly/hide-code true
  :clay {:title "Getting Started with tablecloth.time"
         :quarto {:author [:ezmiller]
                  :description "A composable approach to time series analysis in Clojure"
                  :draft false
                  :type :post
                  :date "2026-03-27"
                  :category :clojure
                  :tags [:time-series :tablecloth :data-science]}}}
(ns time-series.tablecloth-time
  (:require [tablecloth.api :as tc]
            [tablecloth.time.api :as tct]
            [scicloj.tableplot.v1.plotly :as plotly]
            [tech.v3.datatype.functional :as dfn]
            [scicloj.kindly.v4.kind :as kind]))

;; [tablecloth.time](https://github.com/scicloj/tablecloth.time) is a composable
;; extension for time series analysis built on top of
;; [tablecloth](https://scicloj.github.io/tablecloth/). This post walks through
;; its core primitives using the classic Victoria electricity demand dataset.

;; ## Design Philosophy
;;
;; If you've used Pandas for time series work, you're familiar with the index-based
;; approach: set a DateTimeIndex, and operations like slicing and resampling work
;; implicitly on it. It's convenient, but it's also hidden state threaded through
;; your data.
;;
;; tablecloth.time takes a different path. Following tablecloth's design — and
;; Clojure's preference for explicit, composable operations — you always specify
;; which column you're working with. Each function takes the data and the columns
;; it operates on. The pipeline reads like what it does.
;;
;; This isn't a compromise. As Chris Nuernberger (author of tech.ml.dataset) noted,
;; with immutable datasets that get rebuilt on each transformation, a tree-based
;; index offers no performance advantage over binary search on sorted data. The
;; simplicity is the feature.

;; ## Loading the Data
;;
;; We'll use the `vic_elec` dataset: half-hourly electricity demand from Victoria,
;; Australia, spanning 2012-2014. Let's load it and take a look.

(def vic-elec
  (-> (tc/dataset "https://raw.githubusercontent.com/scicloj/tablecloth.time/main/data/vic_elec.csv"
                  {:key-fn keyword})
      (tc/convert-types :Time :instant)))

(tc/head vic-elec)

;; The dataset has half-hourly readings with `:Time`, `:Demand` (in MW),
;; `:Temperature`, and other fields.

;; ## Extracting Time Components
;;
;; The first primitive is `add-time-columns`. It extracts temporal fields from a
;; datetime column — day-of-week, month, hour, etc. — as new columns you can
;; group or filter on.

(-> vic-elec
    (tct/add-time-columns :Time [:day-of-week :hour])
    (tc/head 10))

;; With these extracted fields, standard tablecloth operations give you resampling
;; and aggregation patterns. Let's compute average demand by day of week:

(-> vic-elec
    (tct/add-time-columns :Time [:day-of-week])
    (tc/group-by [:day-of-week])
    (tc/aggregate {:Demand #(dfn/mean (:Demand %))})
    (tc/order-by [:day-of-week])
    (plotly/layer-bar {:=x :day-of-week :=y :Demand}))

;; Weekends (days 6 and 7, i.e. Saturday and Sunday) clearly have lower demand.
;; The `:day-of-week` field came from `add-time-columns`; the aggregation is
;; pure tablecloth.

;; ## Slicing Time Ranges
;;
;; `slice` selects rows within a time range using binary search on sorted data.
;; It's fast even on large datasets.

(-> vic-elec
    (tct/slice :Time "2012-01-09" "2012-01-15")
    (tc/row-count))

;; One week of data — 336 half-hourly observations. Let's visualize it:

(-> vic-elec
    (tct/slice :Time "2012-01-09" "2012-01-15")
    (plotly/layer-line {:=x :Time :=y :Demand}))

;; The daily oscillation is clearly visible: demand peaks during the day and
;; drops at night.

;; ## Lag and Lead Columns
;;
;; `add-lag` shifts column values by a fixed number of rows — useful for
;; autocorrelation analysis. Note this is row-based, not time-aware: you need
;; to know your data's frequency and calculate the offset.
;;
;; Since this dataset has half-hourly readings, a lag of 48 rows equals 24 hours:

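;; Rather than hard-coding 48, the offset can be derived from the sampling
;; interval (plain Clojure arithmetic, not a tablecloth.time function):

(def rows-per-day
  ;; 24 hours of 60 minutes, at one observation every 30 minutes
  (quot (* 24 60) 30))

rows-per-day
;; => 48
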
(-> vic-elec
    (tct/add-lag :Demand 48 :Demand_lag48)
    (tc/drop-missing)
    (tc/head 10))

;; Let's see if demand correlates with the same time yesterday:

(-> vic-elec
    (tct/add-lag :Demand 48 :Demand_lag48)
    (tc/drop-missing)
    (plotly/layer-point {:=x :Demand_lag48
                         :=y :Demand
                         :=mark-opacity 0.3}))

;; The tight diagonal shows strong positive correlation — demand at any given
;; time is highly predictive of demand at the same time the previous day.
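
;; `add-lead` shifts values the other way, toward earlier rows. Assuming it
;; mirrors `add-lag`'s signature (column, row offset, new column name) — the
;; column name below is just illustrative — a one-step-ahead target for
;; forecasting would look like:

(-> vic-elec
    (tct/add-lead :Demand 1 :Demand_lead1)
    (tc/drop-missing)
    (tc/head 10))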

;; ## Resampling as a Pattern
;;
;; tablecloth.time doesn't have a `resample` function (yet). Instead, resampling
;; is a composable pattern: extract the time component you want, group by it,
;; and aggregate.
;;
;; Daily averages:

(-> vic-elec
    (tct/add-time-columns :Time [:date])
    (tc/group-by [:date])
    (tc/aggregate {:Demand #(dfn/mean (:Demand %))
                   :Temperature #(dfn/mean (:Temperature %))})
    (tc/order-by [:date])
    (tc/head 10))

;; Monthly averages:

(-> vic-elec
    (tct/add-time-columns :Time [:year :month])
    (tc/group-by [:year :month])
    (tc/aggregate {:Demand #(dfn/mean (:Demand %))})
    (tc/order-by [:year :month])
    (plotly/layer-bar {:=x :month :=y :Demand :=color :year}))

;; Each step is visible. Each step composes with the rest of your pipeline.
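
;; If you find yourself repeating the extract/group/aggregate steps, they can
;; be packaged as a small helper of your own (a sketch built from the steps
;; above, not part of the library):

(defn resample-mean
  "Groups `ds` by the temporal `fields` extracted from `time-col` and
  averages each column in `value-cols`. Plain composition of
  `add-time-columns`, `group-by`, and `aggregate`."
  [ds time-col fields value-cols]
  (-> ds
      (tct/add-time-columns time-col fields)
      (tc/group-by fields)
      (tc/aggregate (into {} (for [col value-cols]
                               [col #(dfn/mean (col %))])))
      (tc/order-by fields)))

;; Daily means, equivalent to the first pipeline above:
(resample-mean vic-elec :Time [:date] [:Demand :Temperature])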

;; ## Combining Primitives
;;
;; Let's do something more interesting: analyze the daily demand profile,
;; comparing weekdays to weekends.

(-> vic-elec
    (tct/add-time-columns :Time [:day-of-week :hour])
    (tc/map-columns :weekend? [:day-of-week] #(>= % 6))
    (tc/group-by [:weekend? :hour])
    (tc/aggregate {:Demand #(dfn/mean (:Demand %))})
    (tc/order-by [:hour])
    (plotly/layer-line {:=x :hour
                        :=y :Demand
                        :=color :weekend?}))

;; Weekday demand shows the classic two-peak pattern (morning and evening),
;; while weekend demand is flatter and lower overall.

;; ## What's Next
;;
;; tablecloth.time is experimental. The current release provides these focused
;; primitives:
;;
;; - `add-time-columns` — extract temporal fields
;; - `slice` — select time ranges efficiently
;; - `add-lag` / `add-lead` — shift values for autocorrelation
;;
;; Planned additions include rolling windows, differencing, and higher-level
;; patterns like `resample` that wrap the composable building blocks.
;;
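;; Differencing, meanwhile, already falls out of the existing primitives: lag
;; the column, then subtract. A sketch of a seasonal (daily) difference of
;; demand, reusing the `add-lag` call from earlier:

(-> vic-elec
    (tct/add-lag :Demand 48 :Demand_lag48)
    (tc/drop-missing)
    ;; day-over-day change at each half-hour
    (tc/map-columns :Demand-diff [:Demand :Demand_lag48] -)
    (tc/head 10))
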
;; The [repository is on GitHub](https://github.com/scicloj/tablecloth.time).
;; For more worked examples, see the
;; [fpp3 Chapter 2 notebook](https://kingkongbot.github.io/tablecloth.time/chapter_02_time_series_graphics.html).