Commit 7f23445

Add position to syntax (#173)
1 parent b602c39 commit 7f23445

63 files changed

Lines changed: 5011 additions & 233 deletions

Cargo.toml

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -64,6 +64,7 @@ palette = "0.7"
6464
# Utilities
6565
regex = "1.10"
6666
chrono = "0.4"
67+
rand = "0.8"
6768
const_format = "0.2"
6869
uuid = { version = "1.0", features = ["v4"] }
6970

doc/_quarto.yml

Lines changed: 6 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -79,7 +79,12 @@ website:
7979
href: syntax/clause/label.qmd
8080
- section: Layers
8181
contents:
82-
- auto: syntax/layer/*
82+
- section: Types
83+
contents:
84+
- auto: syntax/layer/type/*
85+
- section: Position adjustment
86+
contents:
87+
- auto: syntax/layer/position/*
8388
- section: Scales
8489
contents:
8590
- section: Types

doc/styles.scss

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -19,6 +19,11 @@ code {
1919
font-variant-ligatures: none
2020
}
2121

22+
// Add spacing below rendered plots so text doesn't crowd them
23+
.cell-output-display {
24+
margin-bottom: 1.5rem;
25+
}
26+
2227
.hero-banner {
2328
padding: 0;
2429
margin: 0;

doc/syntax/clause/draw.qmd

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -72,6 +72,9 @@ The `SETTING` clause can be used for to different things:
7272
* *Setting parameters*: Some layers take additional arguments that control how they behave. Often, but not always, these modify the statistical transformation in some way. An example would be the binwidth parameter in histogram which controls the width of each bin during histogram calculation. This is not a statistical property since it is not related to each record, but to the calculation as a whole.
7373
* *Setting aesthetics*: If you wish to set a specific aesthetic to a literal value, e.g. 'red' (as in the color red) then you can do so in the `SETTING` clause. Aesthetics that are set will not go through a scale but will use the provided value as-is. You cannot set an aesthetic to a column, only to a scalar literal value.
7474

75+
#### Position
76+
A special setting is `position`, which controls how overlapping objects are repositioned so that they no longer overlap. Position adjustments have specific mapping requirements, so not every position adjustment is relevant for every layer type. Different layers have different defaults, as detailed in their documentation. You can read about each position adjustment on [its own documentation page](../index.qmd#position-adjustments).
77+
7578
### `FILTER`
7679
```ggsql
7780
FILTER <condition>

doc/syntax/index.qmd

Lines changed: 23 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -15,27 +15,34 @@ ggsql augments the standard SQL syntax with a number of new clauses to describe
1515
## Layers
1616
There are many different layers to choose from when visualising your data. Some are straightforward translations of your data into visual marks such as a point layer, while others perform more or less complicated calculations like e.g. the histogram layer. A layer is selected by providing the layer name after the `DRAW` clause
1717

18-
- [`point`](layer/point.qmd) is used to create a scatterplot layer.
19-
- [`line`](layer/line.qmd) is used to produce lineplots with the data sorted along the x axis.
20-
- [`path`](layer/path.qmd) is like `line` above but does not sort the data but plot it according to its own order.
21-
- [`segment`](layer/segment.qmd) connects two points with a line segment.
22-
- [`linear`](layer/linear.qmd) draws a long line parameterised by a coefficient and intercept.
23-
- [`rule`](layer/rule.qmd) draws horizontal and vertical reference lines.
24-
- [`area`](layer/area.qmd) is used to display series as an area chart.
25-
- [`ribbon`](layer/ribbon.qmd) is used to display series extrema.
26-
- [`polygon`](layer/polygon.qmd) is used to display arbitrary shapes as polygons.
27-
- [`bar`](layer/bar.qmd) creates a bar chart, optionally calculating y from the number of records in each bar.
28-
- [`density`](layer/density.qmd) creates univariate kernel density estimates, showing the distribution of a variable.
29-
- [`violin`](layer/violin.qmd) displays a rotated kernel density estimate.
30-
- [`histogram`](layer/histogram.qmd) bins the data along the x axis and produces a bar for each bin showing the number of records in it.
31-
- [`boxplot`](layer/boxplot.qmd) displays continuous variables as 5-number summaries.
32-
- [`errorbar`](layer/errorbar.qmd) a line segment with hinges at the endpoints.
18+
### Layer types
19+
- [`point`](layer/type/point.qmd) is used to create a scatterplot layer.
20+
- [`line`](layer/type/line.qmd) is used to produce lineplots with the data sorted along the x axis.
21+
- [`path`](layer/type/path.qmd) is like `line` above but does not sort the data but plot it according to its own order.
22+
- [`segment`](layer/type/segment.qmd) connects two points with a line segment.
23+
- [`linear`](layer/type/linear.qmd) draws a long line parameterised by a coefficient and intercept.
24+
- [`rule`](layer/type/rule.qmd) draws horizontal and vertical reference lines.
25+
- [`area`](layer/type/area.qmd) is used to display series as an area chart.
26+
- [`ribbon`](layer/type/ribbon.qmd) is used to display series extrema.
27+
- [`polygon`](layer/type/polygon.qmd) is used to display arbitrary shapes as polygons.
28+
- [`bar`](layer/type/bar.qmd) creates a bar chart, optionally calculating y from the number of records in each bar.
29+
- [`density`](layer/type/density.qmd) creates univariate kernel density estimates, showing the distribution of a variable.
30+
- [`violin`](layer/type/violin.qmd) displays a rotated kernel density estimate.
31+
- [`histogram`](layer/type/histogram.qmd) bins the data along the x axis and produces a bar for each bin showing the number of records in it.
32+
- [`boxplot`](layer/type/boxplot.qmd) displays continuous variables as 5-number summaries.
33+
- [`errorbar`](layer/type/errorbar.qmd) a line segment with hinges at the endpoints.
34+
35+
### Position adjustments
36+
- [`stack`](layer/position/stack.qmd) places objects with a shared baseline on top of each other.
37+
- [`dodge`](layer/position/dodge.qmd) places objects that share the same discrete position side by side.
38+
- [`jitter`](layer/position/jitter.qmd) adds a small random offset to objects sharing the same discrete position.
39+
- [`identity`](layer/position/identity.qmd) does nothing, i.e. turns off position adjustment.
3340

3441
## Scales
3542
A scale is responsible for translating a data value to an aesthetic literal, e.g. a specific color for the fill aesthetic, or a radius in points for the size aesthetic. A scale is a combination of a specific aesthetic and a scale type
3643

3744
### Aesthetics
38-
- [Position](scale/aesthetic/0_position.qmd) aesthetics are those aesthetics realted to the spatial location of the data in the coordinate system.
45+
- [Position](scale/aesthetic/0_position.qmd) aesthetics are those aesthetics related to the spatial location of the data in the coordinate system.
3946
- [Color](scale/aesthetic/1_color.qmd) aesthetics are related to the color of fill and stroke
4047
- [`opacity`](scale/aesthetic/2_opacity.qmd) is the aesthetic that determines the opacity of the color
4148
- [`linetype`](scale/aesthetic/linetype.qmd) governs the stroke pattern of strokes
Lines changed: 46 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,46 @@
1+
---
2+
title: Dodge
3+
---
4+
5+
> Positions are set within the [`DRAW` clause](../../clause/draw.qmd), using the `SETTING` subclause. Read the documentation for this clause for a thorough description of how to use it.
6+
7+
The dodge adjustment is intended to move entities that share the same position on a discrete scale side by side so they don't overlap. It is most often used for boxplots and violin plots, but can also be used in e.g. bar plots as an alternative to [stacking](stack.qmd).
8+
9+
## Position scale requirements
10+
Dodge has no specific requirements for the scale types of the plot, but it only affects discrete scales (including binned and ordinal). If only one scale is discrete, the dodging happens in that scale's direction. If both scales are discrete, the dodging is laid out as a 2D grid.
11+
12+
## Settings
13+
Apart from the settings of the layer type, setting `position => 'dodge'` will allow these additional settings:
14+
15+
* `width`: The total width the dodging will occupy as a proportion of the space available on the scale. Defaults to 0.9 but any defaults from the layer will take precedence.
16+
17+
## Examples
18+
19+
Dodging is default in boxplots (and violin plots)
20+
21+
```{ggsql}
22+
VISUALISE species AS x, bill_dep AS y, sex AS fill FROM ggsql:penguins
23+
DRAW boxplot
24+
```
25+
26+
Turning it off allows you to see the effect of it
27+
28+
```{ggsql}
29+
VISUALISE species AS x, bill_dep AS y, sex AS fill FROM ggsql:penguins
30+
DRAW boxplot SETTING position => 'identity'
31+
```
32+
33+
Dodge can be used for bar plots as an alternative to the default stack
34+
35+
```{ggsql}
36+
VISUALISE species AS x, island AS fill FROM ggsql:penguins
37+
DRAW bar SETTING position => 'dodge'
38+
```
39+
40+
Often `width` is part of the layer settings and gets used directly by the dodge position, but for layers with no inherent width setting dodge provides that setting as well
41+
42+
```{ggsql}
43+
VISUALISE species AS x, bill_dep AS y, sex AS shape FROM ggsql:penguins
44+
DRAW point SETTING position => 'dodge', width => 0.5
45+
```
46+
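The `width` setting described above can be illustrated with a small Python sketch. It assumes, purely for illustration, that the band of total `width` is split into equal slots, one per group, centred on the discrete position; the function name `dodge_offsets` is hypothetical and the actual ggsql implementation may place groups differently.

```python
from __future__ import annotations


def dodge_offsets(n_groups: int, width: float = 0.9) -> list[float]:
    """Return the centre offset of each group's slot within one discrete band.

    Assumption (not taken from the ggsql source): the band of total `width`
    is split into `n_groups` equal slots, centred on the band's midpoint.
    """
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")
    slot = width / n_groups
    # Slot i spans [-width/2 + i*slot, -width/2 + (i+1)*slot]; return its centre.
    return [-width / 2 + slot * (i + 0.5) for i in range(n_groups)]


if __name__ == "__main__":
    print(dodge_offsets(2))             # two groups sharing the default width
    print(dodge_offsets(3, width=0.5))  # three groups in a narrower band
```

Under this assumption a single group is left at its original position, and the offsets of any number of groups are symmetric around zero.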
Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,7 @@
1+
---
2+
title: Identity
3+
---
4+
5+
> Positions are set within the [`DRAW` clause](../../clause/draw.qmd), using the `SETTING` subclause. Read the documentation for this clause for a thorough description of how to use it.
6+
7+
The identity position is a position adjustment that does nothing, i.e. it leaves the data where it is. It is used to turn off position adjustment for layers that default to a non-identity position adjustment. It takes no arguments and has no requirements.
Lines changed: 67 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,67 @@
1+
---
2+
title: Jitter
3+
---
4+
5+
> Positions are set within the [`DRAW` clause](../../clause/draw.qmd), using the `SETTING` subclause. Read the documentation for this clause for a thorough description of how to use it.
6+
7+
Jitter adjustment adds a random offset to the data point to avoid overplotting on discrete axes. It is mainly used in conjunction with point layers.
8+
9+
## Position scale requirements
10+
Jitter requires at least one axis to be discrete as it only jitters along discrete axes.
11+
12+
## Settings
13+
Apart from the settings of the layer type, setting `position => 'jitter'` will allow these additional settings:
14+
15+
* `width`: The total width the jittering will occupy as a proportion of the space available on the scale. Defaults to 0.9
16+
* `dodge`: Whether dodging should be applied before jittering. The dodging follows the behavior of the [dodge position](dodge.qmd). Defaults to `true`
17+
* `distribution`: Which kind of distribution should the jittering follow? One of:
18+
- `'uniform'` (default): Jittering is sampled from a uniform distribution between `-width/2` and `width/2`
19+
- `'normal'`: Jittering is sampled from a normal distribution with σ as `width/4` resulting in 95% of the points falling inside the given width
20+
- `'density'`: Jittering follows the density distribution within the group so that the jitter occupies the same area as an equivalent [violin plot](../type/violin.qmd) with density remapped to offset
21+
- `'intensity'`: Jittering follows the intensity distribution within the group so that the jitter occupies the same area as an equivalent [violin plot](../type/violin.qmd) with intensity remapped to offset
22+
23+
If `distribution` is either `'density'` or `'intensity'` then one of the axes must be continuous
24+
* `bandwidth`: A numerical value setting the smoothing bandwidth to use for the `'density'` and `'intensity'` distributions. If absent (default), the bandwidth will be computed using Silverman's rule of thumb.
25+
* `adjust`: A numerical value as multiplier for the `bandwidth` setting, with 1 as default.
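The `'uniform'` and `'normal'` distributions listed above can be sketched in Python as follows. This is only an illustration of the stated sampling rules (uniform on `[-width/2, width/2]`, normal with σ = `width/4`); the function name `jitter_offsets` and its `seed` parameter are hypothetical and not part of ggsql.

```python
import random


def jitter_offsets(n, width=0.9, distribution="uniform", seed=None):
    """Sample `n` jitter offsets for one discrete position.

    Illustrative only: mirrors the documented sampling rules, not the
    actual ggsql implementation.
    """
    rng = random.Random(seed)
    if distribution == "uniform":
        # Uniform between -width/2 and width/2.
        return [rng.uniform(-width / 2, width / 2) for _ in range(n)]
    if distribution == "normal":
        # sigma = width/4, so +/- 2 sigma spans the full width.
        sigma = width / 4
        return [rng.gauss(0.0, sigma) for _ in range(n)]
    raise ValueError(f"unsupported distribution: {distribution!r}")


if __name__ == "__main__":
    samples = jitter_offsets(10_000, width=0.9, distribution="normal", seed=1)
    inside = sum(abs(s) <= 0.45 for s in samples) / len(samples)
    print(f"fraction inside the width: {inside:.3f}")
```

With σ set to `width/4`, the band `[-width/2, width/2]` corresponds to ±2σ, which is where the roughly 95% figure comes from.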
26+
27+
## Examples
28+
When plotting points on a discrete axis they are all placed in the middle
29+
30+
```{ggsql}
31+
VISUALISE species AS x, bill_dep AS y, sex AS fill FROM ggsql:penguins
32+
DRAW point
33+
```
34+
35+
Use jittering to better see the individual points
36+
37+
```{ggsql}
38+
VISUALISE species AS x, bill_dep AS y, sex AS fill FROM ggsql:penguins
39+
DRAW point
40+
SETTING position => 'jitter'
41+
```
42+
43+
By default, dodging is applied to separate the groups. Turn this off if you want the jitter to occupy the same space regardless of grouping
44+
45+
```{ggsql}
46+
VISUALISE species AS x, bill_dep AS y, sex AS fill FROM ggsql:penguins
47+
DRAW point
48+
SETTING position => 'jitter', dodge => false
49+
```
50+
51+
Use a `'density'` distribution to also indicate the distribution shape with the jitter
52+
53+
```{ggsql}
54+
VISUALISE species AS x, bill_dep AS y FROM ggsql:penguins
55+
DRAW point
56+
SETTING position => 'jitter', distribution => 'density'
57+
```
58+
59+
When both axes are discrete the dodging follows a grid
60+
61+
```{ggsql}
62+
VISUALISE species AS x, sex AS y, body_mass AS fill FROM ggsql:penguins
63+
DRAW point
64+
SETTING position => 'jitter'
65+
SCALE BINNED fill
66+
SETTING breaks => 4, pretty => false
67+
```
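For the `bandwidth` and `adjust` settings, the sketch below shows one common form of Silverman's rule of thumb, `0.9 * min(sd, IQR / 1.34) * n^(-1/5)`, with `adjust` applied as a multiplier. Which exact variant ggsql uses is not stated in this documentation, so treat the formula choice, the quartile method and the function name `silverman_bandwidth` as assumptions.

```python
import statistics


def silverman_bandwidth(values, adjust=1.0):
    """Rule-of-thumb bandwidth: 0.9 * min(sd, IQR / 1.34) * n ** (-1/5).

    Assumption: this is the widely used form of Silverman's rule; quartiles
    are computed with Python's default method, which may differ slightly
    from other software.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    sd = statistics.stdev(values)
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    # Fall back to the standard deviation when the IQR collapses to zero.
    spread = min(sd, iqr / 1.34) if iqr > 0 else sd
    return adjust * 0.9 * spread * n ** (-1 / 5)


if __name__ == "__main__":
    depths = [13.2, 14.5, 15.0, 17.1, 17.4, 18.0, 18.7, 19.3, 20.6, 21.5]
    print(silverman_bandwidth(depths))
    print(silverman_bandwidth(depths, adjust=2.0))  # twice as smooth
```

A larger `adjust` widens the kernel and gives a smoother jitter outline; a smaller one follows the data more closely.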
Lines changed: 61 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,61 @@
1+
---
2+
title: Stack
3+
---
4+
5+
> Positions are set within the [`DRAW` clause](../../clause/draw.qmd), using the `SETTING` subclause. Read the documentation for this clause for a thorough description of how to use it.
6+
7+
The stack position adjustment works by stacking objects on top of each other. It makes the most sense for layer types where height is the primary encoding (i.e. they naturally extend from 0). Stack is the default position for bar and area plots.
8+
9+
## Position scale requirements
10+
Stack requires a continuous scale with a range mapping (e.g. either `y` + `yend` or `ymin` + `ymax`) and requires all ranges to be positive with a baseline of zero. The axis that satisfies this will be used as the stacking direction.
11+
12+
## Settings
13+
Apart from the settings of the layer type, setting `position => 'stack'` will allow these additional settings:
14+
15+
* `center`: Whether the full stack should be centered around 0. Can be used in conjunction with area layers to create streamgraphs. Defaults to `false`
16+
* `total`: Sets a total value to which each stack height is normalised. Setting this value leads to 'fill' behaviour. Defaults to `null` (no normalisation)
17+
18+
## Examples
19+
20+
Stack is the default for bar and area
21+
22+
```{ggsql}
23+
VISUALISE Day AS x, Wind AS y FROM ggsql:airquality
24+
DRAW area
25+
MAPPING Month AS fill
26+
FILTER Day <= 30
27+
SCALE ORDINAL fill
28+
```
29+
30+
Turn it off to see the effect (stacking is nonsensical for wind measurements)
31+
32+
```{ggsql}
33+
VISUALISE Day AS x, Wind AS y FROM ggsql:airquality
34+
DRAW area
35+
MAPPING Month AS fill
36+
SETTING position => 'identity'
37+
FILTER Day <= 30
38+
SCALE ORDINAL fill
39+
```
40+
41+
Set `center => true` to create a streamgraph
42+
43+
```{ggsql}
44+
VISUALISE Day AS x, Wind AS y FROM ggsql:airquality
45+
DRAW area
46+
MAPPING Month AS fill
47+
SETTING center => true
48+
FILTER Day <= 30
49+
SCALE ORDINAL fill
50+
```
51+
52+
Use `total` to see the percentage contribution from each group
53+
54+
```{ggsql}
55+
VISUALISE Day AS x, Wind AS y FROM ggsql:airquality
56+
DRAW area
57+
MAPPING Month AS fill
58+
SETTING total => 100
59+
FILTER Day <= 30
60+
SCALE ORDINAL fill
61+
```
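The effect of `center` and `total` on a single stack can be sketched in Python. The sketch assumes that stacking is a running sum of the group heights at one position, that `total` rescales the heights so they sum to the given value, and that `center` shifts the whole stack so it straddles zero; the function name `stack_ranges` is hypothetical and the real implementation may differ in detail.

```python
def stack_ranges(values, center=False, total=None):
    """Return (lower, upper) ranges for the groups stacked at one position.

    Illustrative assumptions: heights are non-negative, `total` normalises
    the stack height, and `center` shifts the stack by half its height.
    """
    heights = list(values)
    if any(h < 0 for h in heights):
        raise ValueError("stacking expects non-negative heights")
    full = sum(heights)
    if total is not None and full > 0:
        # Rescale so the heights sum to `total` ('fill' behaviour).
        scale = total / full
        heights = [h * scale for h in heights]
        full = total
    lower = -full / 2 if center else 0.0
    ranges = []
    for h in heights:
        upper = lower + h
        ranges.append((lower, upper))
        lower = upper
    return ranges


if __name__ == "__main__":
    print(stack_ranges([1, 2, 3]))
    print(stack_ranges([1, 1, 2], total=100))
    print(stack_ranges([1, 3], center=True))
```

Applying the same function at every x position gives the stacked, normalised or centered bands shown in the examples above.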
Lines changed: 14 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -2,7 +2,7 @@
22
title: "Area"
33
---
44

5-
> Layers are declared with the [`DRAW` clause](../clause/draw.qmd). Read the documentation for this clause for a thorough description of how to use it.
5+
> Layers are declared with the [`DRAW` clause](../../clause/draw.qmd). Read the documentation for this clause for a thorough description of how to use it.
66
77
The area layer is used to display absolute amounts over a sorted x-axis. It can be seen as a [ribbon layer](ribbon.qmd) where the `ymin` is anchored at zero.
88

@@ -21,10 +21,7 @@ The following aesthetics are recognised by the area layer.
2121
* `linewidth`: The width of the contour lines.
2222

2323
## Settings
24-
* `stacking`: Determines how multiple groups are displayed. One of the following:
25-
* `'off'`: The groups `y`-values are displayed as-is (default).
26-
* `'on'`: The `y`-values are stacked per `x` position, accumulating over groups.
27-
* `'fill'`: Like `'on'` but displayed as a fraction of the total per `x` position.
24+
* `position`: Determines the position adjustment to use for the layer (default is `'stack'`)
2825

2926
## Data transformation
3027
The area layer does not transform its data but passes it through unchanged.
@@ -56,17 +53,23 @@ VISUALISE Date AS x, Value AS y FROM long_airquality
5653
DRAW area MAPPING Series AS colour
5754
```
5855

59-
We can stack the series by using `stacking => 'on'`. The line serves as a reference for 'unstacked' data.
56+
By default the areas are stacked on top of each other. If you'd rather see every series drawn from a baseline of 0, set the position to `'identity'`
6057

6158
```{ggsql}
6259
VISUALISE Date AS x, Value AS y, Series AS colour FROM long_airquality
63-
DRAW area SETTING stacking => 'on', opacity => 0.5
64-
DRAW line
60+
DRAW area SETTING position => 'identity', opacity => 0.5
6561
```
6662

67-
When `stacking => 'fill'` we're plotting stacked proportions. These only make sense if every series is measured in the same absolute unit. (Wind and temperature have different units and the temperature is not absolute.)
63+
When `position => 'fill'` we're plotting stacked proportions. These only make sense if every series is measured in the same absolute unit. (Wind and temperature have different units and the temperature is not absolute.)
6864

6965
```{ggsql}
7066
VISUALISE Date AS x, Value AS y, Series AS colour FROM long_airquality
71-
DRAW area SETTING stacking => 'fill'
72-
```
67+
DRAW area SETTING position => 'fill'
68+
```
69+
70+
An alternative is to center the stacks to create a streamgraph
71+
72+
```{ggsql}
73+
VISUALISE Date AS x, Value AS y, Series AS colour FROM long_airquality
74+
DRAW area SETTING position => 'stack', center => true
75+
```
