posit-dev
diff --git a/doc/syntax/index.qmd b/doc/syntax/index.qmd
Lines changed: 1 addition & 0 deletions
diff --git a/doc/syntax/layer/type/smooth.qmd b/doc/syntax/layer/type/smooth.qmd
Lines changed: 153 additions & 0 deletions
diff --git a/doc/syntax/layer/type/violin.qmd b/doc/syntax/layer/type/violin.qmd
Lines changed: 10 additions & 0 deletions
@@ -33,6 +33,7 @@ There are many different layers to choose from when visualising your data. Some
 - [`histogram`](layer/type/histogram.qmd) bins the data along the x axis and produces a bar for each bin showing the number of records in it.
 - [`boxplot`](layer/type/boxplot.qmd) displays continuous variables as 5-number summaries.
 - [`errorbar`](layer/type/errorbar.qmd) a line segment with hinges at the endpoints.
+- [`smooth`](layer/type/smooth.qmd) a trendline that follows the data shape.
 
 ### Position adjustments
 - [`stack`](layer/position/stack.qmd) places objects with a shared baseline on top of each other.
 
@@ -0,0 +1,153 @@
+---
+title: "Smooth"
+---
+
+> Layers are declared with the [`DRAW` clause](../clause/draw.qmd). Read the documentation for this clause for a thorough description of how to use it.
+
+Smooth layers are used to display a trendline among a series of observations.
+
+## Aesthetics
+
+### Required
+* Primary axis (e.g. `x`): Position along the primary axis.
+* Secondary axis (e.g. `y`): Position along the secondary axis.
+
+### Optional
+* `colour`/`stroke`: The colour of the line
+* `opacity`: The opacity of the line
+* `linewidth`: The width of the line
+* `linetype`: The type of line, i.e. the dashing pattern
+
+## Settings
+
+* `method`: Choice of the method for generating the trendline. One of the following:
+    * `'nw'` or `'nadaraya-watson'` estimates the trendline using the Nadaraya-Watson kernel regression method (default).
+    * `'ols'` estimates a straight trendline using the ordinary least squares method.
+    * `'tls'` estimates a straight trendline using the total least squares method.
+
+The settings below only apply when `method => 'nw'` and are ignored when using other methods.
+
+* `bandwidth`: A numerical value setting the smoothing bandwidth to use. If absent (default), the bandwidth will be computed using Silverman's rule of thumb. 
+* `adjust`: A numerical multiplier applied to the bandwidth. Defaults to 1.
+* `kernel`: Determines the smoothing kernel shape. Can be one of the following:
+    * `'gaussian'` (default)
+    * `'epanechnikov'`
+    * `'triangular'`
+    * `'rectangular'` or `'uniform'`
+    * `'biweight'` or `'quartic'`
+    * `'cosine'`
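+
+As a rough sketch of how `bandwidth` and `adjust` combine, the bandwidth $h$ used for smoothing is the product of the two settings. The expression for $h_0$ below is the commonly quoted form of Silverman's rule of thumb. That ggsql uses exactly this variant is an assumption, and $h_0$, $\hat{\sigma}$ and $\text{IQR}$ are names introduced here for illustration only.
+
+$$
+h = \text{adjust} \times \text{bandwidth}, \qquad \text{bandwidth} = h_0 = 0.9 \min\left(\hat{\sigma}, \frac{\text{IQR}}{1.34}\right) n^{-1/5} \text{ when absent}
+$$
+
+Here $\hat{\sigma}$ is the standard deviation and $\text{IQR}$ the interquartile range of the $n$ values along the primary axis. Setting `adjust => 0.5`, for instance, halves whichever bandwidth is in effect.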
+
+## Data transformation
+
+### Nadaraya-Watson kernel regression
+
+The default `method => 'nw'` computes a locally weighted average of $y$.
+
+$$
+y(x) = \frac{\sum_{i=1}^n W_i(x)\,y_i}{\sum_{i=1}^n W_i(x)}
+$$
+
+Where:
+
+* $W_i(x)$ is the kernel intensity $w_iK(\frac{x - x_i}{h})$ of observation $i$ at $x$, where
+   * $K$ is the kernel function
+   * $h$ is the bandwidth
+   * $w_i$ is the weight of observation $i$
+
+Note the similarity of $W_i(x)$ to the [kernel density estimation formula](density.qmd#data-transformation).
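+
+A small worked example may make the formula concrete. The three points, the bandwidth and the unnormalised rectangular kernel below are chosen here purely for illustration. Any constant factor in $K$ cancels between numerator and denominator.
+
+$$
+K(u) = \begin{cases} 1 & |u| \le 1 \\ 0 & \text{otherwise} \end{cases}, \qquad (x_i, y_i) \in \{(1, 2), (2, 4), (4, 6)\}, \qquad h = 1.5
+$$
+
+At $x = 2$ the scaled distances $|x - x_i| / h$ are $2/3$, $0$ and $4/3$, so only the first two points receive a non-zero kernel value. With all weights $w_i = 1$:
+
+$$
+y(2) = \frac{1 \cdot 2 + 1 \cdot 4 + 0 \cdot 6}{1 + 1 + 0} = 3
+$$
+
+With weights $w = (3, 1, 1)$ the estimate is pulled towards the first point:
+
+$$
+y(2) = \frac{3 \cdot 2 + 1 \cdot 4 + 0 \cdot 6}{3 + 1 + 0} = 2.5
+$$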
+
+### Ordinary least squares
+
+The `method => 'ols'` setting uses ordinary least squares to compute the intercept $a$ and slope $b$ of a straight line.
+The method minimises the sum of squared vertical distances between each point and the line.
+Considering only the vertical distances assumes measurement error in $y$, but not in $x$.
+
+$$
+y = a + bx
+$$
+
+Where:
+
+$$
+a = E[Y] - bE[X]
+$$
+
+and
+
+$$
+b = \frac{\text{cov}(X, Y)}{\text{var}(X)} = \frac{E[XY] - E[X]E[Y]}{E[X^2]-(E[X])^2}
+$$
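+
+As a worked illustration, take the four points $(0, 0), (1, 2), (2, 1), (3, 3)$. These values are made up here for the example. The expectations are plain averages over the points:
+
+$$
+E[X] = 1.5, \quad E[Y] = 1.5, \quad E[XY] = \frac{0 + 2 + 2 + 9}{4} = 3.25, \quad E[X^2] = \frac{0 + 1 + 4 + 9}{4} = 3.5
+$$
+
+Substituting into the formulas for the slope and intercept gives:
+
+$$
+b = \frac{3.25 - 1.5 \cdot 1.5}{3.5 - 1.5^2} = \frac{1}{1.25} = 0.8, \qquad a = 1.5 - 0.8 \cdot 1.5 = 0.3
+$$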
+
+### Total least squares
+
+The `method => 'tls'` setting uses total least squares to compute the intercept $a$ and slope $b$ of a straight line.
+The method minimises the sum of squared perpendicular distances between each point and the line.
+Minimising the perpendicular distances (rather than just the vertical distances) makes sense if there is uncertainty or measurement error in not just $y$, but in $x$ as well.
+In that case, it is a more accurate depiction of the relationship between $x$ and $y$, but it isn't the best predictor of $y$ given $x$.
+
+$$
+y = a + bx
+$$
+
+Where:
+
+$$
+a = E[Y] - bE[X]
+$$
+
+and
+
+$$
+b = \frac{\text{var}(Y) - \text{var}(X) + \sqrt{(\text{var}(Y) - \text{var}(X))^2 + 4\text{cov}(X, Y)^2}}{2\text{cov}(X, Y)}
+$$
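+
+As a worked illustration, take the four points $(0, 0), (1, 2), (2, 1), (3, 3)$. These values are made up here for the example. Both variables take the same set of values, so their variances are equal:
+
+$$
+E[X] = E[Y] = 1.5, \qquad \text{var}(X) = \text{var}(Y) = 3.5 - 1.5^2 = 1.25, \qquad \text{cov}(X, Y) = 3.25 - 1.5 \cdot 1.5 = 1
+$$
+
+Substituting into the formulas for the slope and intercept gives:
+
+$$
+b = \frac{0 + \sqrt{0^2 + 4 \cdot 1^2}}{2 \cdot 1} = 1, \qquad a = 1.5 - 1 \cdot 1.5 = 0
+$$
+
+For comparison, the ordinary least squares slope for the same points is $\text{cov}(X, Y) / \text{var}(X) = 1 / 1.25 = 0.8$, which is shallower because it attributes all the scatter to $y$.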
+
+### Properties
+
+* `weight` is available when using `method => 'nw'`. When mapped, it sets $w_i$, the relative contribution of each observation to the average.
+
+### Calculated statistics
+
+* `intensity` corresponds to $y$ in the formulas described in the [data transformation](#data-transformation) section.
+
+### Default remappings
+
+* `intensity AS y`: By default the smooth layer displays the computed $y$ from the formulas along the y-axis.
+
+## Examples
+
+The default `method => 'nw'` might be too coarse for time series.
+
+<!-- Ideally, we would just use the date here directly but we currently require numeric data -->
+
+```{ggsql}
+SELECT *, EPOCH(Date) AS numdate FROM ggsql:airquality
+VISUALISE numdate AS x, Temp AS y
+  DRAW point
+  DRAW smooth
+```
+
+You can make the fit more granular by reducing the bandwidth, for example using `adjust`.
+
+```{ggsql}
+SELECT *, EPOCH(Date) AS numdate FROM ggsql:airquality
+VISUALISE numdate AS x, Temp AS y
+  DRAW point
+  DRAW smooth SETTING adjust => 0.2
+```
+
+There is a subtle difference between the ordinary and total least squares methods.
+
+```{ggsql}
+VISUALISE bill_len AS x, bill_dep AS y FROM ggsql:penguins
+   DRAW point
+   DRAW smooth MAPPING 'Ordinary' AS colour SETTING method => 'ols'
+   DRAW smooth MAPPING 'Total' AS colour SETTING method => 'tls'
+```
+
+Simpson's Paradox is a case where the trend in the combined data is reversed when the groups are considered separately.
+
+```{ggsql}
+VISUALISE bill_len AS x, bill_dep AS y, species AS stroke FROM ggsql:penguins
+   DRAW point SETTING opacity => 0
+   DRAW smooth SETTING method => 'ols'
+   DRAW smooth MAPPING 'All' AS stroke SETTING method => 'ols'
+```
@@ -34,6 +34,9 @@ The following aesthetics are recognised by the violin layer.
     * `'biweight'` or `'quartic'`
     * `'cosine'`
 * `width`: Relative width of the violins. Defaults to `0.9`.
+* `tails`: Expansion rule for drawing the tails. One of the following:
+    * A number giving how many adjusted bandwidths to extend each group's range by. Defaults to 3.
+    * `null` to use the whole data range rather than group ranges.
 
 ## Data transformation
 A violin layer uses the same computation as a density layer. See the [density data transformation](density.qmd#data-transformation) section for details.
@@ -71,6 +74,13 @@ VISUALISE species AS x, bill_dep AS y FROM ggsql:penguins
   DRAW violin SETTING adjust => 0.1
 ```
 
+The `tails` setting controls how far the violins extend beyond the data range. You can set it to `0` to use each group's exact data range.
+
+```{ggsql}
+VISUALISE species AS x, bill_dep AS y FROM ggsql:penguins
+  DRAW violin SETTING tails => 0
+```
+
 To more clearly indicate differences in group sizes, you can use the `intensity` computed variable.
 Note that we have fewer (n=68) Chinstrap penguins than Adelie (n=152) or Gentoo (n=124) penguins.