-This vignette recreates the style of a figure in [Garrison and Rodgers (2016)](https://www.sciencedirect.com/science/article/pii/S0160289616300162) using `ggplot2` and example data from NLSY79 on SES and flu vaccinations.
+This vignette recreates the style of a figure in [Garrison and Rodgers (2016)](https://www.sciencedirect.com/science/article/pii/S0160289616300162) using `ggplot2` and synthetic data structured with `discord_data()`. The figure illustrates the patterns of relationships between sibling differences in socioeconomic status (SES) and sibling differences in flu vaccinations.
 
 
-# Data Preparation
+# Data Generation and Preparation
 
 ## Data Cleaning
 
-This section reuses the data preparation pipeline developed in the regression vignette.
 
-That vignette demonstrated how to set up data for discordant regression analysis by using discord data processing tools. Those tools facilitate the construction of kinship links, including identifying sibling pairs, merging sibling characteristics, and calculating pair-level variables.
-
-Here, we reuse that same pipeline to prepare the data for plotting.
-Specifically, we apply the same kinship pairing, data merging, and cleaning procedures, but our focus is now on visualizing patterns rather than fitting regression models.
 
 The underlying dataset is the NLSY79, which includes measures of flu vaccination and socioeconomic status (SES) for kinship pairs.
 As in the regression vignette, we restrict the sample to individuals who are housemates and have a relatedness of 0.5.
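The restriction to housemates with a relatedness of 0.5 can be sketched on a toy pair table. This is a minimal illustration, not the vignette's own code: the data frame `links` is made up, and the column names `RelationshipPath` and `RFull` and the value `"Gen1Housemates"` are assumed to follow the NLSY79 kinship-link conventions.

```r
# Hypothetical pair-level table; values are invented for illustration.
# Column names are an assumption modeled on NLSY79 kinship-link data.
links <- data.frame(
  SubjectTag_S1 = c(101, 201, 301, 401),
  SubjectTag_S2 = c(102, 202, 302, 402),
  RelationshipPath = c(
    "Gen1Housemates", "Gen1Housemates", "Gen2Siblings", "Gen1Housemates"
  ),
  RFull = c(0.5, 0.25, 0.5, 0.5)
)

# Keep only pairs that are housemates AND have relatedness 0.5.
full_sib_housemates <- links[
  links$RelationshipPath == "Gen1Housemates" & links$RFull == 0.5,
]

print(full_sib_housemates)
```

Both conditions must hold, so a housemate pair with relatedness 0.25 and a non-housemate pair with relatedness 0.5 are dropped.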
@@ -57,69 +52,61 @@ library(gridExtra)
 library(ggExtra)
 library(janitor)
 
-# Load the data
-data(data_flu_ses)
-
-# Get kinship links for individuals with the following variables:
Because we are interested in differences between kin, we create a new variable, `ses_diff_group`, that classifies SES differences into three categories: "More Advantaged", "Equally Advantaged", and "Less Advantaged". This variable is later used to group observations in the marginal density plots. They serve to help visualize how the distributions of mean SES and mean flu vaccinations differ across these SES difference categories.
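As a rough illustration of the three-way classification described above, the sketch below derives `ses_diff_group` from the difference `SES_1 - SES_2`. The toy data and the tolerance `tol` are assumptions made for this example only; the cutoffs the vignette actually uses may differ.

```r
# Hypothetical toy data; SES_1 and SES_2 mirror the column names used
# in the plotting code, but the values are invented.
toy <- data.frame(
  SES_1 = c(1.2, 0.0, -0.5, 0.3),
  SES_2 = c(0.1, 0.0, 0.9, 0.25)
)

# Assumed tolerance: differences within +/- tol count as "Equally Advantaged".
tol <- 0.1
ses_diff <- toy$SES_1 - toy$SES_2

# cut() bins the difference into three ordered categories.
toy$ses_diff_group <- cut(
  ses_diff,
  breaks = c(-Inf, -tol, tol, Inf),
  labels = c("Less Advantaged", "Equally Advantaged", "More Advantaged")
)

print(toy)
```

The resulting factor can then be mapped to `fill` or `color` in a density plot so that each SES difference category gets its own marginal distribution.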
@@ -190,11 +177,11 @@ The first step is to create the base plot with sibling 1 data. In the next code
 ```{r individual, echo=TRUE, message=FALSE}
 # Individual level plot
 plot_indiv <- plot_indiv_sib1 <- ggplot(
-  df_flu_modeling,
+  df_mz_signif,
   aes(
-    x = s00_h40_s1,
-    y = flu_total_s1,
-    color = s00_h40_s1 - s00_h40_s2
+    x = SES_1,
+    y = flu_1,
+    color = SES_1 - SES_2
   )
 ) +
   geom_point(
@@ -232,9 +219,9 @@ plot_indiv <- plot_indiv +
     size = 0.8, alpha = 0.8, na.rm = TRUE,
     position = position_jitter(width = 0.2, height = 0.2),
     aes(
-      x = s00_h40_s2,
-      y = flu_total_s2,
-      color = s00_h40_s2 - s00_h40_s1 # this reverses the color difference so sibling 2 points use the opposite color gradient compared to sibling 1, making it visually clear which sibling is being represented and how their SES difference is encoded
+      x = SES_2,
+      y = flu_2,
+      color = SES_2 - SES_1 # this reverses the color difference so sibling 2 points use the opposite color gradient compared to sibling 1, making it visually clear which sibling is being represented and how their SES difference is encoded
     )
   ) +
   scale_colour_gradientn(
@@ -260,11 +247,11 @@ The individual-level plot shows a positive association between SES and flu vacci
 
 ```{r}
 plot_indiv_s00 <- ggplot(
-  df_flu_modeling,
+  df_mz_signif,
   aes(
-    x = s00_h40_s1,
-    y = s00_h40_s2,
-    color = s00_h40_s1 - s00_h40_s2
+    x = SES_1,
+    y = SES_2,
+    color = SES_1 - SES_2
   )
 ) +
   geom_point(
@@ -293,11 +280,11 @@ plot_indiv_s00 +
 
 ```{r}
 plot_indiv_flu <- ggplot(
-  df_flu_modeling,
+  df_mz_signif,
   aes(
-    x = flu_total_s1,
-    y = flu_total_s2,
-    color = s00_h40_s1 - s00_h40_s2
+    x = flu_1,
+    y = flu_2,
+    color = SES_1 - SES_2
   )
 ) +
   geom_point(
@@ -334,8 +321,8 @@ This section creates a between-family plot that visualizes mean SES at age 40 ag