Finish Get started section #274
Merged
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,101 @@ | ||
| --- | ||
| title: The anatomy of ggsql | ||
| --- | ||
|
|
||
| With a little knowledge of the grammar of graphics in hand, let's dive into how its concepts appear in ggsql, starting with some key concepts and moving on to how they are reflected in the syntax. | ||
|
|
||
| ## Layers | ||
| ggsql is composable, allowing you to create arbitrarily complex visualizations. Central to this is the concept of layers. A layer is a single visual encoding of some underlying data, e.g. [points](../syntax/layer/type/point.qmd) for a scatterplot, or [bars](../syntax/layer/type/bar.qmd) for a barplot. You can have multiple layers, in which case they are stacked on top of each other in the order they are declared (i.e. a layer declared last will be on top and overlap any layer declared before it). A scatterplot with a regression line consists of two layers: a [point](../syntax/layer/type/point.qmd) layer and a [smooth](../syntax/layer/type/smooth.qmd) layer. | ||
|
|
||
| A layer may show data directly, e.g. a [point](../syntax/layer/type/point.qmd) layer will show each observation as a point, or it may apply a statistical transformation and show the result, e.g. a [histogram](../syntax/layer/type/histogram.qmd) layer will bin and count your data before showing the result as bars. | ||
|
|
||
| ## Aesthetics | ||
| You will encounter aesthetics throughout the documentation, and they are arguably one of the most important concepts to get right. Aesthetics are the properties that describe the visual entities that make up a layer, e.g. the [color](../syntax/scale/aesthetic/1_color.qmd) of a point, the [linewidth](../syntax/scale/aesthetic/linewidth.qmd) of a line, and the [opacity](../syntax/scale/aesthetic/2_opacity.qmd) of a polygon. | ||
|
|
||
| There are two types of aesthetics: position aesthetics and material aesthetics. The former are related to *where* an entity is *placed* and are deeply connected to the coordinate system of the plot. The latter are related to *how* the entity *looks*. | ||
|
|
||
| Aesthetics can either be *mapped* or *set*. You use mapping if you want the aesthetic to be related to values in your data, e.g. have fill color be controlled by a category column from your dataset. You use setting when you wish to fix an aesthetic to a specific value, not related to your data, e.g. you want to set linewidth to 2pt. | ||
|
|
||
| ## Scales | ||
| When you map data to an aesthetic, the data will seldom have values that are meaningful for the aesthetic. Consider mapping `region` to `fill` because you want the fill color to show the geographical region the data pertains to. `region` might contain values such as `Asia`, `Europe`, and `South America`, which are not meaningful color values. How do you translate these values into something the aesthetic understands? | ||
|
|
||
| The answer is to use a scale. When you map an aesthetic, it is automatically scaled by a default scale to ensure that the aesthetic receives values it understands, but you can take control of the scaling and e.g. use a different color palette. | ||
|
|
||
| ## The syntax | ||
| Before we move on, let's examine how the concepts we have just described are reflected in the ggsql syntax. Often these will be enough for your basic visualization needs. | ||
|
|
||
| ### `VISUALISE` | ||
| Every ggsql query starts with a [`VISUALISE`](../syntax/clause/visualise.qmd) (or `VISUALIZE`) clause. It denotes that we are exiting regular SQL syntax and entering ggsql. | ||
|
|
||
| While `VISUALISE` can stand on its own as a demarcation line between the regular and the visual query, you can also pass it a list of aesthetic mappings which will define the default mapping for the layers so that you don't have to repeat it for every layer. Lastly, if you do not have an initial SQL query you can name a data source for your plot. | ||
|
|
||
| Bringing all of these things together, a `VISUALISE` clause could look like this: | ||
|
|
||
| ```ggsql | ||
| -- |---------- mapping ----------|--- data source ---| | ||
| VISUALISE body_mass AS x, bill_len AS y FROM ggsql:penguins | ||
| ``` | ||
|
|
||
| ### `DRAW` | ||
| Following `VISUALISE` you'd usually provide one or more [`DRAW`](../syntax/clause/draw.qmd) clauses, each of which defines a layer. The `DRAW` clause is arguably the most complex clause, but the basic usage is straightforward: you provide the type of the layer, any additional mapping if needed, and perhaps modify the settings of the layer. To achieve this we employ the `MAPPING` and `SETTING` clauses. | ||
|
|
||
| The input to the `MAPPING` clause looks exactly like what we saw above for the `VISUALISE` clause. You can provide mappings and optionally a data source if you want the layer to use a data source different from the global data. The `SETTING` clause allows you both to *set* aesthetics and to set parameters specific to the layer (e.g. number of bins in a histogram). | ||
|
|
||
| Bringing all of this together, a `DRAW` clause could look like this: | ||
|
|
||
| ```ggsql | ||
| -- |- type --| | ||
| DRAW histogram | ||
| -- |-- mapping --| | ||
| MAPPING bill_len AS x | ||
| -- |-- setting ---|- parameter -| | ||
| SETTING stroke => null, bins => 20 | ||
| ``` | ||
|
|
||
| But if mappings and data source have already been taken care of, it can be as simple as: | ||
|
|
||
|
|
||
| ```ggsql | ||
| DRAW point | ||
| ``` | ||
|
|
||
| ### `SCALE` | ||
| As [described above](#scales), ggsql automatically creates a default scale for each mapped aesthetic, and if those suit your needs there is no reason to modify them. However, if a change is needed you make it with the [`SCALE`](../syntax/clause/scale.qmd) clause. | ||
|
|
||
|
|
||
| The clause allows you to set the type of scale, the input range, the output range, and the transform, and lets you control breaks and label formatting. The clause can therefore end up carrying a lot of information, but the syntax has been designed to read naturally. Further, every part is optional and can be left out if the default fits. An example of a rather complex `SCALE` clause could be: | ||
|
|
||
|
|
||
| ```ggsql | ||
| SCALE ORDINAL fill FROM ['Low', 'Mid', 'High'] TO viridis | ||
| SETTING breaks => 6 | ||
| ``` | ||
|
|
||
| But, if you are only interested in changing e.g. the palette it can be as simple as: | ||
|
|
||
| ```ggsql | ||
| SCALE fill TO viridis | ||
| ``` | ||
|
|
||
| ## Example | ||
| Using what we have just learned, we can combine it all into a complete query consisting of multiple layers and custom scales: | ||
|
|
||
| ```{ggsql} | ||
| VISUALISE bill_len AS x, bill_dep AS y, species AS stroke FROM ggsql:penguins | ||
| DRAW point | ||
| MAPPING body_mass AS size | ||
| SETTING fill => null | ||
| DRAW smooth | ||
| SETTING method => 'ols' | ||
| SCALE stroke TO dark2 | ||
| SCALE BINNED size TO [4, 15] | ||
| SETTING breaks => 4 | ||
| ``` | ||
|
|
||
| In the above we create a global mapping of bill_len to the `x` aesthetic and bill_dep to the `y` aesthetic using the built-in penguins dataset. We use `DRAW` to create two layers: A point layer for a scatter plot and a smooth layer for regression lines. For the point layer we _map_ the body_mass to size to create a bubble chart and _set_ the fill aesthetic to be empty (`null`) so only the outline is shown. For the smooth layer we set the layer parameter `method` to `'ols'` to estimate a straight regression line. Lastly, we modify the stroke scale to use the dark2 palette from the ColorBrewer project and apply a binned scale to size that goes from 4pt to 15pt with 4 breaks (resulting in 3 bins). | ||
|
|
||
|
|
||
| While the query above may feel like a mouthful, remember that most visualizations are much simpler: | ||
|
|
||
| ```{ggsql} | ||
| VISUALISE body_mass AS x FROM ggsql:penguins | ||
| DRAW histogram | ||
| ``` | ||
|
|
||
| In the next section we will introduce the remaining parts of the grammar and the related syntax, but the parts covered here will already take you a very long way. | ||
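
The scale concept from the Scales section above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not how ggsql implements scales: the function names `discrete_scale` and `binned_scale`, the hex colors, and the break positions are all assumptions made for the example.

```python
from bisect import bisect_right


def discrete_scale(domain, palette):
    """Map each distinct input value to one output value (e.g. a color)."""
    lookup = {value: palette[i % len(palette)] for i, value in enumerate(domain)}
    return lambda value: lookup[value]


def binned_scale(breaks, outputs):
    """Map a number to the output of the bin it falls in.

    n breaks (counting both ends) delimit n - 1 bins, so 4 breaks give 3 bins.
    Values outside the outer breaks land in the first or last bin.
    """
    if len(outputs) != len(breaks) - 1:
        raise ValueError("need exactly one output per bin")
    inner = breaks[1:-1]  # only the inner breaks separate bins
    return lambda value: outputs[bisect_right(inner, value)]


# Hypothetical data values and outputs, chosen only for illustration.
fill = discrete_scale(
    ["Asia", "Europe", "South America"],
    ["#1b9e77", "#d95f02", "#7570b3"],
)
size = binned_scale([3000, 4000, 5000, 6000], [4.0, 9.5, 15.0])

print(fill("Europe"))
print(size(4500))
```

The discrete scale turns values such as `Europe` into something a `fill` aesthetic can understand, while the binned scale shows why 4 breaks result in 3 bins, as in the complete query above.
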
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,123 @@ | ||
| --- | ||
| title: The rest of the owl | ||
| --- | ||
|
|
||
| We have covered the three most important concepts of the ggsql syntax: `VISUALISE`, `DRAW`, and `SCALE`. Now it's time to learn how to draw the rest of the owl. | ||
|
|
||
| {style="max-width:500px; display:block; margin:auto;"} | ||
|
|
||
| Thankfully, we will give you a bit more help than the illustration above in understanding the last bits of ggsql. | ||
|
|
||
| ## Coordinate systems | ||
| In the earlier section we talked about position aesthetics being special because they are being orchestrated by the coordinate system. The coordinate system is the entity that takes care of the spatial arrangement of graphic objects based on their position aesthetic mapping. When thinking about a coordinate system we tend to think about a Cartesian coordinate system which has a horizontal x-axis and a vertical y-axis. There are others though, like polar systems, cartographic maps, and ternary systems. | ||
|
|
||
| At its most basic, a coordinate system is a projection function that takes the position aesthetics and projects them onto a 2-dimensional plane on the screen or paper. While we commonly have 2 position aesthetics that get projected onto a 2-dimensional plane, this is not a necessity. 3 position aesthetics could be projected to 2 dimensions using a perspective transform or by using a special coordinate system such as a ternary layout. | ||
|
|
||
| ## Faceting | ||
| Faceting is the process of dividing your data by one or more variables and visualizing each group as a small plot next to the others. This technique is also known as creating small multiples. Often, the plots share the same position scales so that it is very easy to compare them against each other. | ||
|
|
||
| Using faceting is a very powerful way of comparing groups against each other as the sense of distribution within the group is not impaired by the presence of other data in the view. | ||
|
|
||
| ## Labelling and annotation | ||
| While we all want our data to speak for itself, it is impossible to understand a visualization without context. If the visualization is embedded in some text then the context is often given there, but you are never in control of how your visualization is being shared. Because of this you should strive for your plots to be self-explanatory, both in what they represent and in the main points they make. For the former, you will often use a title, a subtitle, and proper naming of the axes and legends. For the latter, you may want to add elements to the plot area that highlight certain aspects of what is shown. | ||
|
|
||
| ## Syntax | ||
| With the remaining part of the grammar under our belt, let's examine how it is reflected in the syntax. | ||
|
|
||
| ### `PROJECT` | ||
| We use the [`PROJECT`](../syntax/clause/project.qmd) clause to control the coordinate system of the plot. It allows you to control the naming of the position aesthetics in the coordinate system, as well as to set various parameters that control the behavior of the coordinate system. | ||
|
|
||
| The above alludes to the fact that coordinate systems have different position aesthetics. Often you expect `x` and `y` as position aesthetics, and while these are indeed the default names for the [`cartesian`](../syntax/coord/cartesian.qmd) coordinate system, they would be nonsensical for a [`polar`](../syntax/coord/polar.qmd) system, which uses `radius` and `angle` as defaults. You can, however, freely define your own names, e.g. `r` and `a` for a polar system if you value brevity over comprehension. | ||
|
|
||
| `PROJECT` also takes a `SETTING` clause which works much like the `SETTING` clause in `DRAW` and `SCALE`, allowing you to modify the behavior of the coordinate system. An example of a full `PROJECT` clause could be: | ||
|
|
||
|
|
||
| ```ggsql | ||
| PROJECT r, a TO polar | ||
| SETTING start => -90, end => 90 | ||
| ``` | ||
|
|
||
| However, you may not need to specify anything at all. ggsql will automatically detect whether to use a Cartesian or a polar coordinate system from your mapping. If you map to the `x` or `y` aesthetics you implicitly use a Cartesian coordinate system, and if you map to `radius` or `angle` you implicitly use a polar coordinate system. | ||
|
|
||
| ### `FACET` | ||
| Faceting is applied with the [`FACET`](../syntax/clause/facet.qmd) clause. It allows you to facet either by a single variable (`FACET var`) or by a combination of two variables (`FACET var1 BY var2`). In the former case the small multiples are laid out in a row-wise manner, wrapping to the next row if there are more multiples than the number of columns. In the latter case the first variable is related to the rows and the second is related to the columns. | ||
|
|
||
| There is an alternative to using the `FACET` clause, which is to map the variables directly to the facet aesthetics. There are three of these: `panel` is used when faceting by a single variable, and `row` and `column` are used when faceting by two variables. `FACET var` is thus equivalent to `VISUALISE var AS panel`. Which one you choose is a matter of personal preference as well as whether you also need to modify faceting behavior (in which case ) | ||
|
|
||
|
|
||
| ### `LABEL` | ||
| ggsql automatically labels the axes and legends in your plot by the column name of the data mapped to them. However, you often want to provide more descriptive names as well as a title to give context to the plot. All of this is accomplished with the [`LABEL`](../syntax/clause/label.qmd) clause by setting the label text for titles, subtitles, etc., as well as for any | ||
| aesthetic you have mapped. A `LABEL` clause may end up looking like this: | ||
|
|
||
| ```ggsql | ||
| LABEL | ||
| title => "Average wingspan of a cartoon owl", | ||
| x => "Radius of first circle (cm)", | ||
| y => "Wingspan (cm)" | ||
| ``` | ||
|
|
||
| ### `PLACE` | ||
| When we want to add graphical objects to the plot that do not directly relate to data in your dataset we can use [`PLACE`](../syntax/clause/place.qmd). The clause works much like the `DRAW` clause except it doesn't take mappings or a data source. Instead you provide the data to place as literal values in the `SETTING` part of the clause. While you can place any type of layer, some are more useful than others and you will probably find yourself placing more text, segments, and rectangles than boxplots and histograms. | ||
|
|
||
| A standard `PLACE` query could look like this: | ||
|
|
||
| ```ggsql | ||
| PLACE text | ||
| SETTING x => 30, y => 45, label => "Very long wings, right!" | ||
| ``` | ||
|
|
||
| You may wonder why you wouldn't just do this using `DRAW`, since that would also be a legal query. The reason is that `DRAW` clauses expand their literals to be the same length as their data source. So if the plot is visualizing a table of 100 rows you will end up with 100 labels stacked on top of each other. | ||
|
|
||
| ## Examples | ||
| Let's apply what we have learned to a couple of plots. First, we will create a pie chart by projecting a stacked bar chart to a polar coordinate system: | ||
|
|
||
| ```{ggsql} | ||
| VISUALISE species AS fill FROM ggsql:penguins | ||
| DRAW bar | ||
| PROJECT TO polar | ||
| ``` | ||
|
|
||
| It may be easier to see how the bar chart turns into a pie by looking at it unstacked: | ||
|
|
||
| ```{ggsql} | ||
| VISUALISE species AS radius, species AS fill FROM ggsql:penguins | ||
| DRAW bar | ||
| ``` | ||
|
|
||
| See how we didn't have to specify the polar coordinate system in the last example because we have a mapping to radius, allowing ggsql to deduce the coordinate system automatically. | ||
|
|
||
| If we instead map the species to angle we end up with a rose plot: | ||
|
|
||
| ```{ggsql} | ||
| VISUALISE species AS angle, species AS fill FROM ggsql:penguins | ||
| DRAW bar | ||
| ``` | ||
|
|
||
| Moving back to the regular pie chart, we might be interested in comparing how the species distribution varies by island. We can do this with faceting: | ||
|
|
||
| ```{ggsql} | ||
| VISUALISE species AS fill FROM ggsql:penguins | ||
| DRAW bar | ||
| PROJECT TO polar | ||
| FACET island | ||
| SETTING free => 'angle' | ||
| SCALE panel FROM ['Biscoe', 'Dream'] | ||
| ``` | ||
|
|
||
| Above, we use the `free` parameter of `FACET` to allow each facet to have its own angle scale. Further, we use `SCALE` on the panel aesthetic to only show panels for the Biscoe and Dream islands. | ||
|
|
||
| We can use `LABEL` to add a bit more context to our final plot: | ||
|
|
||
| ```{ggsql} | ||
| VISUALISE species AS fill FROM ggsql:penguins | ||
| DRAW bar | ||
| PROJECT TO polar | ||
| FACET island | ||
| SETTING free => 'angle' | ||
| SCALE panel FROM ['Biscoe', 'Dream'] | ||
| LABEL | ||
| title => 'Distribution of penguin species between islands', | ||
| subtitle => 'Compared across 344 penguins', | ||
| fill => 'Species' | ||
| ``` | ||
|
|
||
| ## The rest of the rest of the owl | ||
| While we have now taken a quick tour through the main features of ggsql along with the theoretical backbone that underpins it, there is still a lot to learn. The next step is to browse the [syntax documentation](../syntax/index.qmd), begin to build some visualizations on your own, and get some experience with ggsql. | ||
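
The description of a coordinate system as a projection function (see the Coordinate systems section above) can be sketched in plain Python. This is a hedged illustration only, not ggsql's implementation. The function names are hypothetical. The `start` and `end` parameters borrow their names from the `PROJECT` example above, but their meaning here is an assumption (degrees, measured clockwise from 12 o'clock), as is the assumption that `angle` has already been scaled to a fraction between 0 and 1.

```python
import math


def project_cartesian(x, y):
    """Cartesian projection: position aesthetics map straight onto the plane."""
    return (x, y)


def project_polar(radius, angle, start=0.0, end=360.0):
    """Project (radius, angle) onto the 2D plane.

    `angle` is assumed to be a fraction in [0, 1] that spans the arc from
    `start` to `end` degrees, measured clockwise from 12 o'clock.
    """
    degrees = start + angle * (end - start)
    theta = math.radians(degrees)
    # Clockwise from the top: sin gives the horizontal part, cos the vertical.
    return (radius * math.sin(theta), radius * math.cos(theta))


# A half circle, using the same start/end values as the PROJECT example above.
for fraction in (0.0, 0.5, 1.0):
    px, py = project_polar(1.0, fraction, start=-90, end=90)
    print(f"angle={fraction:.1f} -> ({px:+.2f}, {py:+.2f})")
```

Swapping `project_cartesian` for `project_polar` changes where every graphic object ends up without touching the data or the mappings, which is the sense in which a bar chart can become a pie chart.
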
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,50 @@ | ||
| --- | ||
| title: Tooling | ||
| --- | ||
|
|
||
| Now that we understand some of the most important parts of the syntax, let's spend a bit of time on where and how to apply it. All the examples on this page are interactive and run directly in the browser, which is obviously useful for teaching, but it will not suffice for your day-to-day work where you need to interact with your own data. ggsql is a general tool you can use in a multitude of ways, and we'll go over the most important ones below. | ||
|
|
||
| ## VS Code extension | ||
| We provide an extension for VS Code/Positron that brings language support to the IDE. Positron is generally superior for data analysis and the ggsql integration is deeper there, which we will showcase below. Still, using the extension with VS Code should provide you with a good developer experience. You can grab the [ggsql extension](https://open-vsx.org/extension/ggsql/ggsql) directly from the marketplace. | ||
|
|
||
| Once installed you will get access to ggsql as a language at the same level as R and Python. You can open and edit `.gsql` files with syntax highlighting and autocomplete, open a REPL in the console pane and execute queries, and see the resulting visualization appear in the plot pane. If you have any database connections in the connection pane you can directly attach these to your ggsql runtime and begin to visualize the tables in there. | ||
|
|
||
| ## Jupyter kernel | ||
| Once the Jupyter kernel is installed you can use ggsql as an engine in your Jupyter notebooks and Quarto documents. For a Jupyter notebook you can select the kernel when you start a new notebook. For a Quarto document you use the ggsql language name to tell the renderer to use the ggsql kernel e.g. | ||
|
|
||
| ```{{ggsql}} | ||
| VISUALISE ... | ||
| ``` | ||
|
|
||
| All blocks in the document use the same session, so tables created in one block will be available in subsequent blocks. | ||
|
|
||
|
|
||
| ## Python package | ||
| We have a [Python package](https://pypi.org/project/ggsql/) which you can install through pip (`pip install ggsql`). The package provides bindings to ggsql and allows you to plot with ggsql directly from within Python and register alternative data backends. | ||
|
|
||
| A simple example could be: | ||
|
|
||
| ```python | ||
| import ggsql | ||
| import polars as pl | ||
|
|
||
| # Create a DataFrame | ||
| df = pl.DataFrame({ | ||
| "x": [1, 2, 3, 4, 5], | ||
| "y": [10, 20, 15, 30, 25], | ||
| "category": ["A", "B", "A", "B", "A"] | ||
| }) | ||
|
|
||
| # Render to Altair chart | ||
| chart = ggsql.render_altair(df, "VISUALISE x, y DRAW point") | ||
|
|
||
| # Display or save | ||
| chart.display() # In Jupyter | ||
| chart.save("chart.html") # Save to file | ||
| ``` | ||
|
|
||
| ## Command line interface | ||
| While maybe not the most ergonomic way to interact directly with ggsql, there is a CLI if you need to build tools around ggsql. The CLI tool allows you to execute a file or string and validate a query without executing it. A simple example of executing a query looks like this: | ||
|
|
||
| ```bash | ||
| ggsql --exec "VISUALISE species AS fill FROM ggsql:penguins DRAW bar" | ||
| ``` | ||
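
If you are building tooling around the CLI from Python, a thin wrapper along these lines is one possible starting point. It is a minimal sketch: the helper names `build_command` and `run_query` are hypothetical, only the `--exec` flag is taken from the example above, and nothing is assumed here about what the CLI writes to standard output.

```python
import shutil
import subprocess


def build_command(query: str) -> list[str]:
    """Build the argument list for executing a query string with the ggsql CLI."""
    # Passing a list (rather than a shell string) avoids quoting problems.
    return ["ggsql", "--exec", query]


def run_query(query: str) -> subprocess.CompletedProcess:
    """Run the query; raise if the CLI is missing or exits with an error."""
    if shutil.which("ggsql") is None:
        raise FileNotFoundError("the ggsql executable was not found on PATH")
    return subprocess.run(
        build_command(query),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )


command = build_command("VISUALISE species AS fill FROM ggsql:penguins DRAW bar")
print(command)
```

Keeping the command construction separate from the execution makes the wrapper easy to test on machines where the CLI is not installed.
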
Not strictly part of this PR, but in the grammar.qmd page can we add something like a diagram or example, just to break up the text.
I found these types of diagram for ggplot2 helpful, if we can easily make something similar for ggsql:
Yeah, I certainly want to spruce up that section with a diagram or two