-
Notifications
You must be signed in to change notification settings - Fork 1
Expand file tree
/
Copy pathweek1-exercises.qmd
More file actions
145 lines (100 loc) · 2.73 KB
/
week1-exercises.qmd
File metadata and controls
145 lines (100 loc) · 2.73 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
---
title: "week1-exercises"
format: html
---
In the exercises, you'll work with a different database. It is a synthetic dataset of breast cancer patients, but it has the same table structure (or schema) as the in-class example. It is much larger than the in class example.
We'll first connect to the database:
```{r}
#| context: setup
library(duckdb)
library(DBI)
con <- DBI::dbConnect(duckdb::duckdb(),
"data/synthea-smaller_breast_cancer.db")
```
## `person`
Let's get some details about the tables.
`DESCRIBE` the person table.
```{sql}
#| connection: "con"
DESCRIBE -------;
```
Show the first 10 rows of `person`.
```{sql}
#| connection: "con"
SELECT ------
FROM ---------
LIMIT ----;
```
How many people (or rows) are in the person table?
```{sql}
#| connection: "con"
SELECT COUNT(----) FROM person;
```
How many people are born after 1980?
```{sql}
#| connection: "con"
#| eval: false
SELECT COUNT(person_id)
FROM person
WHERE year_of_birth -------;
```
How about how many people who have `gender_source_value` of 'M'? (Hint: remember to use *single* quotes for values, not *double* quotes):
```{sql}
#| connection: "con"
#| eval: false
SELECT COUNT(person_id)
FROM person
WHERE ---- = ----
```
Ok, we now have a better idea of what is in the `person` table. Let's take a deeper dive into the `concept` table.
## Look at the `concept` table
`DESCRIBE` the concept table.
```{sql}
#| connection: "con"
DESCRIBE -----;
```
Select the distinct `domain_id`s from the `concept` table:
```{sql}
#| connection: "con"
SELECT DISTINCT --------
FROM -------;
```
Return the number of distinct `concept_name`s with `domain_id` equal to `'Procedure'`:
```{sql}
#| connection: "con"
SELECT COUNT(concept_name)
FROM concept
WHERE -----------;
```
## Look at `procedure_occurrence`
Let's take a look at `procedure_occurrence` and see whether it uses `concept_id`s.
DESCRIBE `procedure_occurrence`:
```{sql}
#| connection: "con"
DESCRIBE procedure_occurrence;
```
How many columns are `concept_id`s?
How many distinct `procedure_concept_id`s are there in this `procedure_occurrence` table?
```{sql}
#| connection: "con"
SELECT COUNT(DISTINCT ----)
FROM ----;
```
## Explore another table
Describe another table in the Database. (You can use `SHOW TABLES` to show a list of tables).
```{sql}
#| connection: "con"
```
Return the first 10 rows of this table.
```{sql}
#| connection: "con"
```
Choose a column and try to write a `WHERE` statement to filter it. `COUNT` the number of rows returned.
```{sql}
#| connection: "con"
```
## Disconnect from the database
When you're done with your assignment, run the below code chunk to disconnect from the database.
```{r}
dbDisconnect(con)
```