Commit 83d0233

author
thomas mcandrew
committed
Merge branch 'main' of github.com:computationalUncertaintyLab/TSID
2 parents 4098740 + 00d0f3a commit 83d0233

9 files changed

Lines changed: 642 additions & 10 deletions

File tree

.DS_Store

0 Bytes
Binary file not shown.

.gitattributes

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
1+
2+
*.ipynb diff=jupyternotebook
3+
4+
*.ipynb merge=jupyternotebook
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+
# TSID
Lines changed: 268 additions & 0 deletions
@@ -0,0 +1,268 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"id": "21a6615c-b9eb-4081-a362-e0bbdf82590a",
6+
"metadata": {},
7+
"source": [
8+
"# Chapter 1"
9+
]
10+
},
11+
{
12+
"cell_type": "markdown",
13+
"id": "4812cc2a-428d-4eee-aa2a-406a4b4d871e",
14+
"metadata": {},
15+
"source": [
16+
"## Time series data versus IID data \n",
17+
"\n",
18+
"A typical setup for statistical analysis assumes that a series of experiments generates observations that are independent and identically distributed (often abbreviated i.i.d.). \n",
19+
"For example, \n",
20+
"\n",
21+
"\\begin{align}\n",
22+
" \\mathcal{D} &= ( y_{1}, y_{2}, \\cdots, y_{n} ) \\\\ \n",
23+
" y_{i} &\\sim \\text{Poisson}(\\lambda)\n",
24+
"\\end{align}\n",
25+
"\n",
26+
"where we use $\\mathcal{D}$ to represent a dataset, lower case letters to represent collected observations, capital letters to represent random variables, and Greek letters to represent parameters. \n",
27+
"Because we assume that the above observations were generated from a sequence of i.i.d. Poisson random variables, we can simplify expressions that include the joint probability of $Y_{1}, Y_{2}, \\cdots$. \n",
28+
"\n",
29+
"\\begin{align}\n",
30+
" P(Y_{1}, Y_{2}) &= P(Y_{1}) \\cdot P(Y_{2}) \\\\ \n",
31+
" P(Y_{1}, Y_{2}, \\cdots, Y_{n}) &= P(Y_{1}) \\cdot P(Y_{2}) \\cdots P(Y_{n}) = \\prod_{i=1}^{n} P(Y_{i}) \\\\ \n",
32+
" & = \\lambda^{\\sum_{i=1}^{n} y_{i} } \\frac{e^{ -n\\lambda }}{ \\prod_{i=1}^{n} y_{i}! } \\propto e^{ -n\\lambda }\\lambda^{\\sum_{i=1}^{n} y_{i} }\n",
33+
"\\end{align}\n",
34+
"\n",
35+
"The expression above is an (often good) approximation of the joint probability of observing all $n$ data points at once. \n",
36+
"Unlike data from more traditional collection mechanisms, time series observations cannot be assumed to be i.i.d.\n",
37+
"Instead, we assume that the observation at time $t$ depends on all random variables before time $t$. \n",
38+
"As a result, we cannot simplify the joint probability of the first $t$ random variables to the product of their individual probabilities. \n",
39+
"\n",
40+
"Recall the multiplication rule \n",
41+
"\n",
42+
"\\begin{align}\n",
43+
" P(A,B,C) &= P( B,C | A ) P(A) \\\\ \n",
44+
" &= P( C | B, A ) P(B|A) P(A)\n",
45+
"\\end{align}\n",
46+
"\n",
47+
"We can still use the multiplication rule to assess the joint probability of a sequence of random variables.\n",
48+
"Let's assume that we wish to model some time series process from time unit one up until time unit $T$. \n",
49+
"Then we need to estimate probabilities like \n",
50+
"\n",
51+
"\\begin{align}\n",
52+
"   P( Y_{1}, Y_{2}, \\cdots, Y_{T} ) = P(Y_{1})\\cdot P(Y_{2} | Y_{1}) \\cdot P(Y_{3} | Y_{2},Y_{1}) \\cdots P(Y_{T} | Y_{T-1}, \\cdots, Y_{1})\n",
53+
"\\end{align}\n",
54+
"\n",
55+
"The i.i.d. assumption simplifies the above by assuming that each random variable is independent of all others. \n",
56+
"For time series, we want to simplify the above but still keep the most important characteristics of the process---that observations in the future depend on the past. \n",
57+
"\n",
58+
"### Markov Assumption \n",
59+
"\n",
60+
"Given a series of random variables, the Markov assumption states that, conditional on the past, the probability of $Y_{t}$ depends only on the random variable at time $t-1$, or \n",
61+
"\n",
62+
"\\begin{align}\n",
63+
" P(Y_{t} | Y_{t-1}, Y_{t-2}, \\cdots Y_{1}) \\approx P(Y_{t} | Y_{t-1})\n",
64+
"\\end{align}\n",
65+
"\n",
66+
"The Markov assumption aims to capture the most basic attribute of a time series, that future values depend on the recent past, without the added complexity of letting future values depend on **all** of the past. \n",
67+
"\n",
68+
"This considerably simplifies the joint probability above \n",
69+
"\n",
70+
"\\begin{align}\n",
71+
"   P( Y_{1}, Y_{2}, \\cdots, Y_{T} ) &= P(Y_{1})\\cdot P(Y_{2} | Y_{1}) \\cdot P(Y_{3} | Y_{2},Y_{1}) \\cdots P(Y_{T} | Y_{T-1}, \\cdots, Y_{1}) \\\\ \n",
72+
"   & \\approx P(Y_{1}) \\cdot P(Y_{2} | Y_{1}) \\cdot P(Y_{3} | Y_{2}) \\cdots P(Y_{T} | Y_{T-1})\n",
73+
"\\end{align}\n",
74+
"\n"
75+
]
76+
},
77+
{
78+
"cell_type": "markdown",
79+
"id": "f2aca56d-50c3-4bd0-a087-81cca8ad4d6b",
80+
"metadata": {},
81+
"source": [
82+
"## Influenza-like illness\n",
83+
"\n",
84+
"The Centers for Disease Control and Prevention collects data on influenza-like illness, or ILI.\n",
85+
"ILI is a non-specific syndrome defined as fever and cough and/or sore throat. It is used for flu surveillance worldwide. ILI can be caused by influenza virus infection and infections with other respiratory viruses.\n",
86+
"\n"
87+
]
88+
},
89+
{
90+
"cell_type": "code",
91+
"execution_count": null,
92+
"id": "3f585c25-8d3e-43ee-a3b7-05f642670fcb",
93+
"metadata": {},
94+
"outputs": [],
95+
"source": [
96+
"#--d \n",
97+
"import pandas as pd \n",
98+
"\n",
99+
"d = pd.read_csv(\"./data/XXXXXXXX\") #<--using pandas to import a dataset\n",
100+
"\n",
101+
"# plot time series for two states\n",
102+
"#x is weeks\n",
103+
"#y is percent ili (column_name = wILI)\n"
104+
]
105+
},
106+
{
107+
"cell_type": "code",
108+
"execution_count": null,
109+
"id": "0e4ed68f-6af4-4068-9612-52427e027ecd",
110+
"metadata": {},
111+
"outputs": [],
112+
"source": [
113+
"#--d \n",
114+
"import pandas as pd \n"
115+
]
116+
},
117+
{
118+
"cell_type": "markdown",
119+
"id": "7614c9a3-efc5-4037-b68d-9a9d97bef067",
120+
"metadata": {},
121+
"source": [
122+
"## COVID Community mobility\n",
123+
"\n",
124+
"COVID Community Mobility Reports aim to provide insights into what changed in response to policies aimed at combating COVID-19. The reports charted movement trends over time by geography, across different categories of places such as retail and recreation, groceries and pharmacies, parks, transit stations, workplaces, and residential.\n",
125+
"\n"
126+
]
127+
},
128+
{
129+
"cell_type": "code",
130+
"execution_count": null,
131+
"id": "3959d067-a105-4f97-b917-faa514116f36",
132+
"metadata": {},
133+
"outputs": [],
134+
"source": [
135+
"#--d \n",
136+
"import pandas as pd \n",
137+
"\n",
138+
"d = pd.read_csv(\"./data/XXXXXXXX\")\n",
139+
"\n",
140+
"# a plot of one county time series for two activities\n",
141+
"\n",
142+
"# x is the day \n",
143+
"# y - parks_percent_change_from_baseline (<-for example)\n"
144+
]
145+
},
146+
{
147+
"cell_type": "markdown",
148+
"id": "0ff18230-d314-4bd8-8033-99c15cd2636d",
149+
"metadata": {},
150+
"source": [
151+
"## Mpox incidence"
152+
]
153+
},
154+
{
155+
"cell_type": "markdown",
156+
"id": "57dfd0e6-9805-4754-b774-738058c1fd2f",
157+
"metadata": {},
158+
"source": [
159+
"## Correlation, Covariance, and the Correlogram"
160+
]
161+
},
162+
{
163+
"cell_type": "code",
164+
"execution_count": null,
165+
"id": "29b752f3-aacc-4f33-9e65-5434927cbfaf",
166+
"metadata": {},
167+
"outputs": [],
168+
"source": [
169+
"# For ILI we will want to plot the percent ILI at week t versus the percent ILI at week t+1"
170+
]
171+
},
172+
{
173+
"cell_type": "code",
174+
"execution_count": null,
175+
"id": "40c49a8b-547e-4d27-a64e-b92d6d187980",
176+
"metadata": {},
177+
"outputs": [],
178+
"source": [
179+
"# For COVID we will want to plot the behavior at week t versus the behavior at week t+1"
180+
]
181+
},
182+
{
183+
"cell_type": "code",
184+
"execution_count": null,
185+
"id": "9dde1a78-0206-4272-a102-0c2295b6efd3",
186+
"metadata": {},
187+
"outputs": [],
188+
"source": []
189+
},
190+
{
191+
"cell_type": "markdown",
192+
"id": "aa674e67-12ab-497d-bf4f-aa0c811b8e64",
193+
"metadata": {},
194+
"source": [
195+
"## Smoothing methods"
196+
]
197+
},
198+
{
199+
"cell_type": "code",
200+
"execution_count": null,
201+
"id": "50c0eb5c-acb1-4099-8856-aafeadb90719",
202+
"metadata": {},
203+
"outputs": [],
204+
"source": []
205+
},
206+
{
207+
"cell_type": "code",
208+
"execution_count": null,
209+
"id": "e8d7b8b3-1680-4d53-ae30-4804f79d2868",
210+
"metadata": {},
211+
"outputs": [],
212+
"source": []
213+
},
214+
{
215+
"cell_type": "code",
216+
"execution_count": null,
217+
"id": "915fd2f8-fb4a-4937-aac0-09a36bef5785",
218+
"metadata": {},
219+
"outputs": [],
220+
"source": []
221+
},
222+
{
223+
"cell_type": "code",
224+
"execution_count": null,
225+
"id": "363ad686-aaf7-427d-9362-87cd366317c6",
226+
"metadata": {},
227+
"outputs": [],
228+
"source": []
229+
},
230+
{
231+
"cell_type": "code",
232+
"execution_count": null,
233+
"id": "8a31ece2-cbf6-499a-9cff-ae9061a08b56",
234+
"metadata": {},
235+
"outputs": [],
236+
"source": []
237+
},
238+
{
239+
"cell_type": "code",
240+
"execution_count": null,
241+
"id": "899076c1-dd9b-43f9-80a8-9111d06f3626",
242+
"metadata": {},
243+
"outputs": [],
244+
"source": []
245+
}
246+
],
247+
"metadata": {
248+
"kernelspec": {
249+
"display_name": "Python 3 (ipykernel)",
250+
"language": "python",
251+
"name": "python3"
252+
},
253+
"language_info": {
254+
"codemirror_mode": {
255+
"name": "ipython",
256+
"version": 3
257+
},
258+
"file_extension": ".py",
259+
"mimetype": "text/x-python",
260+
"name": "python",
261+
"nbconvert_exporter": "python",
262+
"pygments_lexer": "ipython3",
263+
"version": "3.14.2"
264+
}
265+
},
266+
"nbformat": 4,
267+
"nbformat_minor": 5
268+
}
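
The notebook's first section contrasts the i.i.d. product of Poisson probabilities with the Markov factorization of a time series. The sketch below is only an illustration of those two formulas, not code from the repository: the counts, the rate `lam`, the two states `"low"` and `"high"`, and the transition probabilities are all made-up assumptions. It checks that the sum of Poisson log probabilities matches the closed form, and evaluates a first-order Markov joint probability as $P(Y_{1}) \prod_{t} P(Y_{t} | Y_{t-1})$.

```python
import math


def poisson_log_pmf(y, lam):
    # log P(Y = y) for a Poisson(lam) random variable
    return y * math.log(lam) - lam - math.lgamma(y + 1)


def iid_log_joint(ys, lam):
    # Under the i.i.d. assumption the joint probability is a product,
    # so its logarithm is a sum of the individual log probabilities.
    return sum(poisson_log_pmf(y, lam) for y in ys)


def closed_form_log_joint(ys, lam):
    # log of lambda^(sum y_i) * exp(-n * lambda) / prod(y_i!)
    n = len(ys)
    log_factorials = sum(math.lgamma(y + 1) for y in ys)
    return sum(ys) * math.log(lam) - n * lam - log_factorials


def markov_log_joint(states, initial, transition):
    # log P(Y_1) + sum over t of log P(Y_t | Y_{t-1})
    logp = math.log(initial[states[0]])
    for prev, curr in zip(states, states[1:]):
        logp += math.log(transition[prev][curr])
    return logp


# Hypothetical counts and rate, chosen only for illustration.
ys = [2, 0, 3, 1, 4]
lam = 2.0
print(iid_log_joint(ys, lam), closed_form_log_joint(ys, lam))

# Hypothetical two-state chain, chosen only for illustration.
initial = {"low": 0.5, "high": 0.5}
transition = {
    "low": {"low": 0.9, "high": 0.1},
    "high": {"low": 0.3, "high": 0.7},
}
path = ["low", "low", "high", "high"]
print(math.exp(markov_log_joint(path, initial, transition)))
```

Working on the log scale avoids numerical underflow when many probabilities are multiplied, which is why both functions return log probabilities.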

ch1.ipynb

Lines changed: 33 additions & 9 deletions
@@ -83,21 +83,33 @@
8383
"## Influenza-like illness\n",
8484
"\n",
8585
"The Centers for Disease Control and Prevention collects data on influenza-like illness, or ILI.\n",
86-
"ILI is a CCXZXXZXZXZ. \n",
86+
"ILI is a non-specific syndrome defined as fever and cough and/or sore throat. It is used for flu surveillance worldwide. ILI can be caused by influenza virus infection and infections with other respiratory viruses.\n",
8787
"\n"
8888
]
8989
},
9090
{
9191
"cell_type": "code",
92-
"execution_count": null,
92+
"execution_count": 9,
9393
"id": "3f585c25-8d3e-43ee-a3b7-05f642670fcb",
9494
"metadata": {},
95-
"outputs": [],
95+
"outputs": [
96+
{
97+
"ename": "ModuleNotFoundError",
98+
"evalue": "No module named 'pandas'",
99+
"output_type": "error",
100+
"traceback": [
101+
"\u001b[31m---------------------------------------------------------------------------\u001b[39m",
102+
"\u001b[31mModuleNotFoundError\u001b[39m Traceback (most recent call last)",
103+
"\u001b[36mCell\u001b[39m\u001b[36m \u001b[39m\u001b[32mIn[9]\u001b[39m\u001b[32m, line 2\u001b[39m\n\u001b[32m 1\u001b[39m \u001b[38;5;66;03m#--d \u001b[39;00m\n\u001b[32m----> \u001b[39m\u001b[32m2\u001b[39m \u001b[38;5;28;01mimport\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[34;01mpandas\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;28;01mas\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[34;01mpd\u001b[39;00m \n\u001b[32m 4\u001b[39m d = pd.read_csv(\u001b[33m\"\u001b[39m\u001b[33m./data/ili_data.csv\u001b[39m\u001b[33m\"\u001b[39m) \u001b[38;5;66;03m#<--using pandas to import a datset\u001b[39;00m\n\u001b[32m 6\u001b[39m \u001b[38;5;66;03m# plot time series for two state\u001b[39;00m\n\u001b[32m 7\u001b[39m \u001b[38;5;66;03m#x is weeks\u001b[39;00m\n\u001b[32m 8\u001b[39m \u001b[38;5;66;03m#y is percent ili (column_name = wILI)\u001b[39;00m\n",
104+
"\u001b[31mModuleNotFoundError\u001b[39m: No module named 'pandas'"
105+
]
106+
}
107+
],
96108
"source": [
97109
"#--d \n",
98110
"import pandas as pd \n",
99111
"\n",
100-
"d = pd.read_csv(\"./data/XXXXXXXX\") #<--using pandas to import a datset\n",
112+
"d = pd.read_csv(\"./data/ili_data.csv\") #<--using pandas to import a dataset\n",
101113
"\n",
102114
"# plot time series for two states\n",
103115
"#x is weeks\n",
@@ -122,21 +134,33 @@
122134
"source": [
123135
"## COVID Community mobility\n",
124136
"\n",
125-
"Describe describe describe\n",
137+
"COVID Community Mobility Reports aim to provide insights into what changed in response to policies aimed at combating COVID-19. The reports charted movement trends over time by geography, across different categories of places such as retail and recreation, groceries and pharmacies, parks, transit stations, workplaces, and residential.\n",
126138
"\n"
127139
]
128140
},
129141
{
130142
"cell_type": "code",
131-
"execution_count": null,
143+
"execution_count": 1,
132144
"id": "3959d067-a105-4f97-b917-faa514116f36",
133145
"metadata": {},
134-
"outputs": [],
146+
"outputs": [
147+
{
148+
"ename": "ModuleNotFoundError",
149+
"evalue": "No module named 'pandas'",
150+
"output_type": "error",
151+
"traceback": [
152+
"\u001b[31m---------------------------------------------------------------------------\u001b[39m",
153+
"\u001b[31mModuleNotFoundError\u001b[39m Traceback (most recent call last)",
154+
"\u001b[36mCell\u001b[39m\u001b[36m \u001b[39m\u001b[32mIn[1]\u001b[39m\u001b[32m, line 2\u001b[39m\n\u001b[32m 1\u001b[39m \u001b[38;5;66;03m#--d \u001b[39;00m\n\u001b[32m----> \u001b[39m\u001b[32m2\u001b[39m \u001b[38;5;28;01mimport\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[34;01mpandas\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;28;01mas\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[34;01mpd\u001b[39;00m \n\u001b[32m 4\u001b[39m d = pd.read_csv(\u001b[33m\"\u001b[39m\u001b[33m./data/pa_covid.csv\u001b[39m\u001b[33m\"\u001b[39m)\n\u001b[32m 6\u001b[39m \u001b[38;5;66;03m# a plot of one county time seires for two activities\u001b[39;00m\n\u001b[32m 7\u001b[39m \n\u001b[32m 8\u001b[39m \u001b[38;5;66;03m# x is the day \u001b[39;00m\n\u001b[32m 9\u001b[39m \u001b[38;5;66;03m# y - parks_percent_change_from_baseline (<-for example)\u001b[39;00m\n",
155+
"\u001b[31mModuleNotFoundError\u001b[39m: No module named 'pandas'"
156+
]
157+
}
158+
],
135159
"source": [
136160
"#--d \n",
137161
"import pandas as pd \n",
138162
"\n",
139-
"d = pd.read_csv(\"./data/XXXXXXXX\")\n",
163+
"d = pd.read_csv(\"./data/pa_covid.csv\")\n",
140164
"\n",
141165
"# a plot of one county time series for two activities\n",
142166
"\n",
@@ -261,7 +285,7 @@
261285
"name": "python",
262286
"nbconvert_exporter": "python",
263287
"pygments_lexer": "ipython3",
264-
"version": "3.13.3"
288+
"version": "3.14.2"
265289
}
266290
},
267291
"nbformat": 4,
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
import pandas as pd

if __name__ == "__main__":
    d1 = pd.read_csv("2020_US_Region_Mobility_Report.csv")
    d2 = pd.read_csv("2021_US_Region_Mobility_Report.csv")
    d3 = pd.read_csv("2022_US_Region_Mobility_Report.csv")

    def combine_covid(inputs, output):
        d = pd.concat(inputs)  # stack the yearly reports into one frame
        pa_covid = d.loc[d.sub_region_1 == "Pennsylvania"]  # keep Pennsylvania rows only
        pa_covid.to_csv(output, index=False)

    output = "pa_covid.csv"
    combine_covid([d1, d2, d3], output)
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
import pandas as pd
from delphi_epidata import Epidata

res = Epidata.fluview(["nat"], [201501, Epidata.range(201502, 202552)])  # national ILI by epiweek

d = pd.DataFrame(res["epidata"])
d.to_csv("ili_data.csv", index=False)
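
The notebook's correlation section plans to plot percent ILI at week t against percent ILI at week t+1. A minimal sketch of that idea, using only the standard library, is given below. The `wili` values are made-up numbers, not rows from `ili_data.csv`, and the helper names `lag_pairs` and `lag_correlation` are hypothetical. Note that this computes the Pearson correlation of the lagged pairs, which is close to, but not identical to, the usual sample autocorrelation that uses the overall mean and variance of the series.

```python
import math


def lag_pairs(series, lag=1):
    # Pair each value y_t with y_{t+lag}; lag must be at least 1.
    return list(zip(series[:-lag], series[lag:]))


def lag_correlation(series, lag=1):
    # Pearson correlation between (y_t) and (y_{t+lag}).
    pairs = lag_pairs(series, lag)
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up weekly percentages, for illustration only.
wili = [1.2, 1.5, 2.1, 3.0, 3.8, 3.1, 2.2, 1.6]

# These pairs are the points of a "week t versus week t+1" scatter plot.
print(lag_pairs(wili))
print(lag_correlation(wili, lag=1))
```

Repeating `lag_correlation` for lag = 1, 2, 3, and so on gives the values one would plot in a correlogram.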
