Commit d6e4c53

docs: verify data types + storage docs against source
- Add missing RAY_F32=6 to type table (was skipped between I64=5 and F64=7)
- Fix type-of examples to use meta builtin (type-of does not exist)
- Fix BOOL vector output: [true false true] not [1b 0b 1b]
- Fix ray_read_csv_opts signature: (path, delim, header, col_types, n_types)
- Fix CSV date inference docs: only YYYY-MM-DD supported, not YYYY.MM.DD
- Replace non-existent splay-save/splay-load/part-load Rayfall examples with info boxes noting these are C API only
- Replace part-load Rayfall example in collections with C API equivalent
1 parent 71f1bd5 commit d6e4c53

3 files changed

Lines changed: 30 additions & 28 deletions

website/docs/data-types-collections.html

Lines changed: 5 additions & 6 deletions
@@ -121,7 +121,7 @@ <h2 id="vectors">Vectors</h2>
 
 <span class="hl-comment">; BOOL vector</span>
 <span class="hl-comment">ray&gt;</span> [<span class="hl-bool">true false true</span>]
-[1b 0b 1b]</code></pre>
+[true false true]</code></pre>
 
 <h3 id="morsel-iteration">Morsel Iteration</h3>
 <p>All vector processing in Rayforce happens in <strong>morsels</strong> &mdash; fixed-size chunks of 1024 elements. The executor never processes an entire column at once. Instead, it iterates morsel by morsel, which keeps data in L1/L2 cache and enables pipeline parallelism.</p>
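The morsel loop described in the context paragraph above can be sketched in plain C. This is a minimal illustration, not Rayforce's executor: `MORSEL_SIZE` mirrors the 1024-element figure from the text, while `sum_by_morsel` and its parameters are hypothetical names introduced only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Morsel size as stated in the docs paragraph (assumed to be a compile-time constant). */
#define MORSEL_SIZE 1024

/*
 * Hypothetical helper: sum an int64 column one morsel at a time.
 * If morsels_out is non-NULL it receives the number of morsels visited.
 */
int64_t sum_by_morsel(const int64_t *col, size_t len, size_t *morsels_out)
{
    assert(col != NULL || len == 0);

    int64_t total = 0;
    size_t morsels = 0;

    for (size_t start = 0; start < len; start += MORSEL_SIZE) {
        /* The last morsel may be shorter than MORSEL_SIZE. */
        size_t end = start + MORSEL_SIZE;
        if (end > len)
            end = len;

        /* Inner loop touches at most 1024 contiguous elements. */
        for (size_t i = start; i < end; i++)
            total += col[i];

        morsels++;
    }

    if (morsels_out != NULL)
        *morsels_out = morsels;
    return total;
}
```

A 2500-element column is visited as three morsels (1024 + 1024 + 452). A real executor would typically run several operators over each morsel before moving on, which is where the cache benefit comes from.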
@@ -349,12 +349,11 @@ <h3>Type Encoding</h3>
 <h3>MAPCOMMON</h3>
 <p>When loading a date-partitioned table, Rayforce creates a virtual <code>RAY_MAPCOMMON</code> column. This column does not store actual data &mdash; it derives values from the partition directory names (e.g., <code>2024.01.15/</code>). Each row in a partition shares the same date value, so the MAPCOMMON column can represent millions of rows with zero per-row storage.</p>
 
-<pre><code><span class="hl-comment">; Load a date-partitioned table</span>
-<span class="hl-comment">ray&gt;</span> (<span class="hl-kw">set</span> trades (<span class="hl-fn">part-load</span> <span class="hl-str">"db"</span> <span class="hl-str">"trades"</span>))
+<pre><code><span class="hl-comment">// C API: load a date-partitioned table</span>
+<span class="hl-type">ray_t</span>* trades = <span class="hl-fn">ray_part_load</span>(<span class="hl-str">"db"</span>, <span class="hl-str">"trades"</span>);
 
-<span class="hl-comment">; The 'date' column is MAPCOMMON — derived from directory names</span>
-<span class="hl-comment">; Queries that filter on date trigger partition pruning</span>
-<span class="hl-comment">ray&gt;</span> (<span class="hl-kw">select</span> {<span class="hl-sym">from:</span>trades <span class="hl-sym">where:</span> (<span class="hl-fn">=</span> date <span class="hl-num">2024.01.15</span>)})</code></pre>
+<span class="hl-comment">// The 'date' column is MAPCOMMON — derived from directory names</span>
+<span class="hl-comment">// Queries that filter on date trigger partition pruning</span></code></pre>
 
 <div class="info-box">
 <strong>Partition pruning:</strong> The optimizer recognizes filters on MAPCOMMON columns and eliminates entire partitions from the scan &mdash; skipping their memory-mapped segments entirely. A query filtering on a single date in a year of data reads only 1/365th of the files.

website/docs/data-types.html

Lines changed: 13 additions & 5 deletions
@@ -159,6 +159,14 @@ <h2 id="type-table">Type Reference</h2>
 <td><code>INT64_MIN</code></td>
 <td><code><span class="hl-num">42</span></code></td>
 </tr>
+<tr>
+<td><strong>Single float</strong></td>
+<td><code>RAY_F32</code></td>
+<td>6</td>
+<td>4 bytes</td>
+<td><code>NaN</code></td>
+<td>&mdash;</td>
+</tr>
 <tr>
 <td><strong>Double float</strong></td>
 <td><code>RAY_F64</code></td>
@@ -239,11 +247,11 @@ <h2 id="atoms-vectors">Atoms vs Vectors</h2>
 <span class="hl-comment">ray&gt;</span> [<span class="hl-num">1 2 3</span>]
 [1 2 3]
 
-<span class="hl-comment">; Check with type-of</span>
-<span class="hl-comment">ray&gt;</span> (<span class="hl-fn">type-of</span> <span class="hl-num">42</span>)
-'I64
-<span class="hl-comment">ray&gt;</span> (<span class="hl-fn">type-of</span> [<span class="hl-num">1 2 3</span>])
-'I64</code></pre>
+<span class="hl-comment">; Check with meta</span>
+<span class="hl-comment">ray&gt;</span> (<span class="hl-fn">meta</span> <span class="hl-num">42</span>)
+{type:i64}
+<span class="hl-comment">ray&gt;</span> (<span class="hl-fn">meta</span> [<span class="hl-num">1 2 3</span>])
+{type:I64 len:3}</code></pre>
 
 <h2 id="ray-t-header">The ray_t Header</h2>
 <p>Every Rayforce object begins with a 32-byte <code>ray_t</code> header. This is the fundamental building block &mdash; atoms, vectors, lists, tables, and functions all start with this structure.</p>

website/docs/storage.html

Lines changed: 12 additions & 17 deletions
@@ -193,9 +193,9 @@ <h3>C API</h3>
 <span class="hl-comment">// Load table (columns are mmap'd)</span>
 <span class="hl-type">ray_t</span>* trades = <span class="hl-fn">ray_splay_load</span>(<span class="hl-str">"db/trades"</span>, <span class="hl-str">"db/sym"</span>);</code></pre>
 
-<pre><code><span class="hl-comment">; Rayfall: save and load splayed tables</span>
-<span class="hl-comment">ray&gt;</span> (<span class="hl-fn">splay-save</span> t <span class="hl-str">"db/trades"</span>)
-<span class="hl-comment">ray&gt;</span> (<span class="hl-kw">set</span> trades (<span class="hl-fn">splay-load</span> <span class="hl-str">"db/trades"</span>))</code></pre>
+<div class="info-box">
+<strong>Note:</strong> Splayed table I/O is currently available only through the C API. There are no <code>splay-save</code> / <code>splay-load</code> Rayfall builtins yet.
+</div>
 
 <h2 id="partitioned-tables">Date-Partitioned Tables</h2>
 <p>For large time-series datasets, Rayforce supports date-partitioned storage. Data is split into directories named by date, each containing a splayed table for that day's data.</p>
@@ -227,14 +227,9 @@ <h3>Loading Partitioned Data</h3>
 <span class="hl-comment">// - Parted columns (RAY_PARTED_BASE + base_type) for each data column</span>
 <span class="hl-comment">// - All segments are memory-mapped — no data copy</span></code></pre>
 
-<pre><code><span class="hl-comment">; Rayfall: load partitioned table</span>
-<span class="hl-comment">ray&gt;</span> (<span class="hl-kw">set</span> trades (<span class="hl-fn">part-load</span> <span class="hl-str">"db"</span> <span class="hl-str">"trades"</span>))
-
-<span class="hl-comment">; Filter on date — optimizer prunes partitions</span>
-<span class="hl-comment">ray&gt;</span> (<span class="hl-kw">select</span> {<span class="hl-sym">from:</span>trades <span class="hl-sym">where:</span> (<span class="hl-fn">=</span> date <span class="hl-num">2024.01.15</span>)})
-
-<span class="hl-comment">; Range filter — only relevant partitions are scanned</span>
-<span class="hl-comment">ray&gt;</span> (<span class="hl-kw">select</span> {<span class="hl-sym">from:</span>trades <span class="hl-sym">where:</span> (<span class="hl-fn">and</span> (<span class="hl-fn">&gt;=</span> date <span class="hl-num">2024.01.15</span>) (<span class="hl-fn">&lt;=</span> date <span class="hl-num">2024.01.17</span>))})</code></pre>
+<div class="info-box">
+<strong>Note:</strong> Partitioned table loading is currently available only through the C API (<code>ray_part_load</code>). There is no <code>part-load</code> Rayfall builtin yet. Once loaded via the C API, the resulting table supports normal <code>select</code> queries with partition pruning.
+</div>
 
 <h3>Partition Pruning</h3>
 <p>The query optimizer recognizes predicates on the <code>MAPCOMMON</code> column and eliminates entire partitions from the scan plan. This means a query filtering on a single date in a year of data only touches 1/365th of the files on disk &mdash; with zero per-row cost for the pruned partitions.</p>
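The pruning idea in the paragraph above can be illustrated with a small standalone C sketch. It does not use the Rayforce API and is not the optimizer's actual logic: `partition_key` and `prune_partitions` are hypothetical helpers, and the only detail taken from the docs is that partition directories are named `YYYY.MM.DD`. Dates are encoded as `yyyymmdd` integers purely so they compare in calendar order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: turn a "YYYY.MM.DD" directory name (no trailing
 * slash) into a sortable yyyymmdd integer. Returns -1 for anything else.
 */
int partition_key(const char *dirname)
{
    int y, m, d;
    char extra;

    assert(dirname != NULL);

    /* Exactly three fields must match; a fourth (%c) means trailing junk. */
    if (sscanf(dirname, "%4d.%2d.%2d%c", &y, &m, &d, &extra) != 3)
        return -1;
    if (y < 1 || m < 1 || m > 12 || d < 1 || d > 31)
        return -1;
    return y * 10000 + m * 100 + d;
}

/*
 * Hypothetical helper: keep only partitions whose date lies in [lo, hi].
 * Writes the surviving indices into kept (capacity >= n) and returns
 * how many survived. No partition data is touched, only the names.
 */
size_t prune_partitions(const char *const *dirs, size_t n,
                        int lo, int hi, size_t *kept)
{
    assert(dirs != NULL && kept != NULL);

    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        int key = partition_key(dirs[i]);
        if (key < 0)
            continue;               /* not a date partition, e.g. "sym" */
        if (key >= lo && key <= hi)
            kept[count++] = i;
    }
    return count;
}
```

With one directory per day for a year, an equality filter (`lo == hi`) leaves a single partition, which is the 1/365th figure quoted above. A range filter keeps a contiguous run of days.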
@@ -313,11 +308,11 @@ <h3>C API</h3>
 <tbody>
 <tr>
 <td><code>ray_read_csv(path)</code></td>
-<td>Load a CSV file with default options: comma delimiter, first row as header, automatic type inference, <code>""</code> as null.</td>
+<td>Load a CSV file with default options: comma delimiter, first row as header, and automatic type inference. Empty fields are treated as null.</td>
 </tr>
 <tr>
-<td><code>ray_read_csv_opts(path, delim, header, null_str)</code></td>
-<td>Load with custom options: delimiter character, whether first row is a header, and null string representation.</td>
+<td><code>ray_read_csv_opts(path, delim, header, col_types, n_types)</code></td>
+<td>Load with custom options: delimiter character, whether first row is a header, explicit column type array (<code>int8_t*</code>), and number of type entries. Pass <code>NULL, 0</code> for automatic type inference.</td>
 </tr>
 <tr>
 <td><code>ray_write_csv(table, path)</code></td>
@@ -329,8 +324,8 @@ <h3>C API</h3>
 <pre><code><span class="hl-comment">// Default options</span>
 <span class="hl-type">ray_t</span>* data = <span class="hl-fn">ray_read_csv</span>(<span class="hl-str">"trades.csv"</span>);
 
-<span class="hl-comment">// Tab-delimited, no header, "NA" as null</span>
-<span class="hl-type">ray_t</span>* tsv = <span class="hl-fn">ray_read_csv_opts</span>(<span class="hl-str">"data.tsv"</span>, <span class="hl-str">'\t'</span>, <span class="hl-bool">false</span>, <span class="hl-str">"NA"</span>);
+<span class="hl-comment">// Tab-delimited, no header, auto type inference</span>
+<span class="hl-type">ray_t</span>* tsv = <span class="hl-fn">ray_read_csv_opts</span>(<span class="hl-str">"data.tsv"</span>, <span class="hl-str">'\t'</span>, <span class="hl-bool">false</span>, NULL, <span class="hl-num">0</span>);
 
 <span class="hl-comment">// Write results back</span>
 <span class="hl-fn">ray_write_csv</span>(result, <span class="hl-str">"output.csv"</span>);</code></pre>
@@ -341,7 +336,7 @@ <h3>Type Inference</h3>
 <li><strong>BOOL</strong> &mdash; <code>true</code>/<code>false</code>, <code>1</code>/<code>0</code></li>
 <li><strong>I64</strong> &mdash; integer values within 64-bit range</li>
 <li><strong>F64</strong> &mdash; floating-point values</li>
-<li><strong>DATE</strong> &mdash; <code>YYYY.MM.DD</code> or <code>YYYY-MM-DD</code> format</li>
+<li><strong>DATE</strong> &mdash; <code>YYYY-MM-DD</code> format (hyphen-separated)</li>
 <li><strong>TIMESTAMP</strong> &mdash; date + time with nanosecond precision</li>
 <li><strong>SYM</strong> &mdash; short repeated strings (auto-interned as symbols)</li>
 <li><strong>STR</strong> &mdash; fallback for everything else</li>
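The DATE rule changed in this hunk (hyphen-separated `YYYY-MM-DD` only) can be made concrete with a standalone check. This is a sketch of what a strict matcher for that format might look like, not the parser Rayforce ships: `looks_like_csv_date` is a hypothetical name, and the month and day range checks are an assumption that the real inference code may or may not apply.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: true only for a 10-character field shaped like
 * YYYY-MM-DD, with hyphens at positions 4 and 7 and digits elsewhere.
 * Dot-separated dates such as 2024.01.15 are rejected.
 */
bool looks_like_csv_date(const char *field)
{
    assert(field != NULL);

    if (strlen(field) != 10)
        return false;

    for (size_t i = 0; i < 10; i++) {
        if (i == 4 || i == 7) {
            if (field[i] != '-')
                return false;
        } else if (!isdigit((unsigned char)field[i])) {
            return false;
        }
    }

    /* Assumed sanity check: month 01-12, day 01-31 (no per-month lengths). */
    int month = (field[5] - '0') * 10 + (field[6] - '0');
    int day   = (field[8] - '0') * 10 + (field[9] - '0');
    return month >= 1 && month <= 12 && day >= 1 && day <= 31;
}
```

Under this rule a column of `2024.01.15` values would not qualify as DATE and would fall through to one of the later candidates in the list above, such as SYM or STR.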
