---
title: "51-simple-ts-monthly"
output: html_notebook
---

```{r simple ts monthly imports}
source(knitr::purl("40-modeling.Rmd", quiet=TRUE))
fs::file_delete("40-modeling.R")
source(knitr::purl("45-prepare-for-shiny.Rmd", quiet=TRUE))
fs::file_delete("45-prepare-for-shiny.R")
```

# Experiment of monthly time-series

EDIT: This notebook was originally meant to be the one we run to create the data used in the Shiny app. That is still mostly true, with a few important caveats. Originally, tonnage predictions came from models trained on permit-level data from San Francisco. As discussed in `44-2`, however, these permit-level models do not predict any long-term increase in the total debris generated over time, in contrast to the obvious trend in the historical data. The tonnage prediction is therefore now made by fitting an exponential function to historical tonnage data from Nashville; this fit is done in `44-2`. We still combine the permit-level data with these overall predictions to estimate the debris generated by different permit subtypes (e.g., commercial construction; see `app.R`), so the data generated in this file are still used.
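
To make the idea of an exponential fit concrete, here is a minimal sketch. It is not the code from `44-2`: the data are synthetic, the names `hist_df`, `tons`, and `year` are hypothetical, and fitting a straight line to the logged tonnage is only one of several ways such a fit could be done.

```r
# Synthetic annual tonnage (NOT the Nashville data): 3% growth with mild noise.
set.seed(1)
years <- 2010:2020
tons  <- 1e5 * exp(0.03 * (years - 2010)) * exp(rnorm(length(years), sd = 0.01))
hist_df <- data.frame(year = years, tons = tons)

# Fit tons = a * exp(b * (year - 2010)) by linear regression on the log scale.
fit <- lm(log(tons) ~ I(year - 2010), data = hist_df)
a <- exp(coef(fit)[[1]])   # estimated baseline tonnage in 2010
b <- coef(fit)[[2]]        # estimated annual growth rate

# Extrapolate six years ahead and back-transform to tons.
future <- data.frame(year = 2021:2026)
future$tons_pred <- exp(predict(fit, newdata = future))
```

Because the fitted growth rate `b` is positive here, the extrapolated totals keep rising, which is the long-term behavior the permit-level models failed to reproduce.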

## Notes

This notebook gives an example of how to run the pipeline. This is currently what is used in `nash-zero-shiny/app.R`.

The first step is to specify, via filenames, the forecast and model we wish to use. Here I actually create the forecast. (EDIT: I'm commenting out the forecast creation so we don't overwrite the forecast file.)

```{r write forecast}
forecast_fname <- expand_boxpath("forecasts/best_forecast.feather")
forecast_cols <- c("comm_v_res", "project_type")
# forecast_npermits(forecast_cols, nyr = 6, city = "nashville") %>%
#   write_feather(forecast_fname)
```

For the model, I'm going to use a particular neural network (created in `44-1`) and re-save it as the best model. (EDIT: Again, I'm commenting this out.)

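```{r model path}
# Source model: the neural network created in `44-1`.
# orig_fname <- expand_boxpath("models/nn_model.rds")
# Note: this path ends in .feather, but the file is written with saveRDS() below.
model_fname <- expand_boxpath("models/best_model.feather")

# Re-save the chosen model as the "best" model (commented out to avoid overwriting).
# load_tonnage_model(orig_fname) %>%
#   saveRDS(model_fname)
```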

Here we forecast tonnages for 100 sets of permits over the next six years. A few things to note:

* This is quite slow. The main bottleneck is the `expandRows` call in `forecast_to_permits`. I recently changed it so that the expansion is done separately for each month, year, sample, and subtype, which turns out to slow things down quite a bit. Still, it's not much of a problem if we just save the output.

* This change also produces the warning below about rows being dropped from the input. The result of `expandRows` is the correct size, and nothing actually seems to be missing. I believe what is happening is that the number of permits forecasted for a particular subtype in a particular month is sometimes zero. `expandRows` replicates each row once per permit, so a row with zero permits contributes no rows to the output, and we just get a warning.
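
The zero-count behavior described above can be illustrated with a small base-R sketch. It uses `rep()` on row indices as a stand-in for `expandRows`, and the data frame and its column names (`project_type`, `n_permits`) are hypothetical rather than taken from the real forecast file.

```r
# Hypothetical forecast rows: one row per subtype with a permit count.
forecast <- data.frame(
  project_type = c("new", "demo", "reno"),
  n_permits    = c(2, 0, 3)
)

# Base-R stand-in for row expansion: repeat each row index n_permits times.
idx <- rep(seq_len(nrow(forecast)), times = forecast$n_permits)
permits <- forecast[idx, ]

nrow(permits) == sum(forecast$n_permits)  # TRUE: 5 rows, one per permit
"demo" %in% permits$project_type          # FALSE: the zero-count row adds nothing
```

The expanded table has exactly one row per forecasted permit, so a zero-count row disappearing from the output is consistent with the result being the correct size.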

```{r get results of forecast}
results <- forecast_tonnage(
  nsets = 100,
  forecast_fname = forecast_fname,
  model_fname = model_fname,
  forecast_cols = forecast_cols
)
```


```{r plot results with errorbars}
results %>%
  filter(fy > 2021 & fy < 2027) %>%
  plot_predictions()
```
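
As a rough illustration of how many simulated permit sets can be turned into error bars, the sketch below summarizes synthetic data by fiscal year. It does not reproduce `plot_predictions()`, which is defined in another notebook; the column names `set` and `tons` and the one-standard-deviation band are assumptions made for this example.

```r
# Synthetic stand-in for forecast output: 100 sets x 5 fiscal years.
# Column names `set`, `fy`, and `tons` are assumptions.
set.seed(42)
sim <- expand.grid(set = 1:100, fy = 2022:2026)
sim$tons <- 1e5 * 1.03^(sim$fy - 2022) + rnorm(nrow(sim), sd = 2e3)

# Mean and spread across sets for each fiscal year.
summ <- aggregate(tons ~ fy, data = sim,
                  FUN = function(x) c(mean = mean(x), sd = sd(x)))
summ <- do.call(data.frame, summ)   # flatten the matrix column into two columns
names(summ) <- c("fy", "mean", "sd")

# One-standard-deviation band around the mean, usable as error-bar limits.
summ$lower <- summ$mean - summ$sd
summ$upper <- summ$mean + summ$sd
```

Each row of `summ` then holds one fiscal year's central estimate plus the limits a plotting function could draw as error bars.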

(EDIT: Commenting out, so we don't overwrite results.)

```{r write results}
# results %>%
#   write_feather(expand_boxpath("forecasts/best_syn_permits.feather"))
```


# Prepare for Shiny

Here, we reduce the data set to something small enough to be uploaded with the Shiny app. The code below should really only be run once (or again if we change anything about the procedure). Otherwise, we would constantly be overwriting files that we commit to the repo, so I've commented out the code. The functions run here are defined in `45`.

```{r prepare files for shiny}
# prepare_for_shiny(expand_boxpath("forecasts/best_syn_permits.feather"), forecast_fname)
```