Commit 5518f08

Lela Boermeester
authored and committed
decade of ili
1 parent c915e32 commit 5518f08

5 files changed

Lines changed: 587 additions & 1 deletion

.DS_Store

0 Bytes
Binary file not shown.

.gitattributes

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
1+
2+
*.ipynb diff=jupyternotebook
3+
4+
*.ipynb merge=jupyternotebook
Lines changed: 268 additions & 0 deletions
@@ -0,0 +1,268 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"id": "21a6615c-b9eb-4081-a362-e0bbdf82590a",
6+
"metadata": {},
7+
"source": [
8+
"# Chapter 1"
9+
]
10+
},
11+
{
12+
"cell_type": "markdown",
13+
"id": "4812cc2a-428d-4eee-aa2a-406a4b4d871e",
14+
"metadata": {},
15+
"source": [
16+
"## Time series data versus IID data \n",
17+
"\n",
18+
"A typical setup for statistical analysis assumes that a series of experiments generates observations that are independent and identically distributed (often abbreviated i.i.d.). \n",
19+
"For example, \n",
20+
"\n",
21+
"\\begin{align}\n",
22+
" \\mathcal{D} &= ( y_{1}, y_{2}, \\cdots, y_{n} ) \\\\ \n",
23+
" y_{i} &\\sim \\text{Poisson}(\\lambda)\n",
24+
"\\end{align}\n",
25+
"\n",
26+
"where we use $\\mathcal{D}$ to represent a dataset, lower case letters to represent collected observations, capital letters to represent random variables, and Greek letters to represent parameters. \n",
27+
"Because we assume that the above observations were generated from a sequence of i.i.d. Poisson random variables, we can simplify expressions that include the probability of $Y_{1}, Y_{2}, \\cdots$. \n",
28+
"\n",
29+
"\\begin{align}\n",
30+
" P(Y_{1}, Y_{2}) &= P(Y_{1}) \\cdot P(Y_{2}) \\\\ \n",
31+
" P(Y_{1}, Y_{2}, \\cdots, Y_{n}) &= P(Y_{1}) \\cdot P(Y_{2}) \\cdots P(Y_{n}) = \\prod_{i=1}^{n} P(Y_{i}) \\\\ \n",
32+
" & = \\lambda^{\\sum_{i=1}^{n} y_{i} } \\frac{e^{ -n\\lambda }}{ \\prod_{i=1}^{n} y_{i}! } \\propto e^{ -n\\lambda }\\lambda^{\\sum_{i=1}^{n} y_{i} }\n",
33+
"\\end{align}\n",
34+
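"\n",
"As a worked case with made-up numbers (not real data), suppose $n=3$ and $\\mathcal{D} = (2, 0, 1)$. Then $\\sum_{i=1}^{3} y_{i} = 3$ and $\\prod_{i=1}^{3} y_{i}! = 2! \\cdot 0! \\cdot 1! = 2$, so \n",
"\n",
"\\begin{align}\n",
"   P(Y_{1}=2, Y_{2}=0, Y_{3}=1) &= \\lambda^{3} \\frac{e^{-3\\lambda}}{2} \\propto e^{-3\\lambda} \\lambda^{3}\n",
"\\end{align}\n",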
"\n",
35+
"Under the i.i.d. assumption, the product $\\prod_{i=1}^{n} P(Y_{i})$ is the joint probability of observing all $n$ data points at once; for real data it is often a good approximation. \n",
36+
"Unlike more traditional data collection mechanisms, for time series data we cannot assume that the observations are i.i.d.\n",
37+
"Instead, we assume that the observation at time $t$ depends on the random variables before time $t$. \n",
38+
"Then, we cannot simplify the joint probability of the first $t$ random variables to the product of their individual probabilities. \n",
39+
"\n",
40+
"Recall the multiplication rule \n",
41+
"\n",
42+
"\\begin{align}\n",
43+
" P(A,B,C) &= P( B,C | A ) P(A) \\\\ \n",
44+
" &= P( C | B, A ) P(B|A) P(A)\n",
45+
"\\end{align}\n",
46+
"\n",
47+
"We can still use the multiplication rule to assess the joint probability of a sequence of random variables.\n",
48+
"Let's assume that we wish to model some time series process from time unit one up until time unit $T$. \n",
49+
"Then we need to estimate probabilities like \n",
50+
"\n",
51+
"\\begin{align}\n",
52+
"   P( Y_{1}, Y_{2}, \\cdots, Y_{T} ) = P(Y_{1})\\cdot P(Y_{2} | Y_{1}) \\cdot P(Y_{3} | Y_{2},Y_{1}) \\cdots P(Y_{T} | Y_{T-1}, \\cdots, Y_{1})\n",
53+
"\\end{align}\n",
54+
"\n",
55+
"The i.i.d. assumption simplifies the above by assuming that each random variable is independent of all others. \n",
56+
"For time series, we want to simplify the above but still keep the most important characteristic of the process: that observations in the future depend on the past. \n",
57+
"\n",
58+
"### Markov Assumption \n",
59+
"\n",
60+
"Given a series of random variables, the Markov assumption states that, given the past, the distribution of $Y_{t}$ depends only on the random variable at time $t-1$, or \n",
61+
"\n",
62+
"\\begin{align}\n",
63+
"   P(Y_{t} | Y_{t-1}, Y_{t-2}, \\cdots, Y_{1}) \\approx P(Y_{t} | Y_{t-1})\n",
64+
"\\end{align}\n",
65+
"\n",
66+
"The Markov assumption aims to capture the most basic attribute of a time series, that future values depend on the recent past, without the added complexity of modeling how future values depend on **all** of the past. \n",
67+
"\n",
68+
"This considerably simplifies the joint probability above \n",
69+
"\n",
70+
"\\begin{align}\n",
71+
"   P( Y_{1}, Y_{2}, \\cdots, Y_{T} ) &= P(Y_{1})\\cdot P(Y_{2} | Y_{1}) \\cdot P(Y_{3} | Y_{2},Y_{1}) \\cdots P(Y_{T} | Y_{T-1}, \\cdots, Y_{1}) \\\\ \n",
72+
"                                       & \\approx P(Y_{1}) \\cdot P(Y_{2} | Y_{1}) \\cdot P(Y_{3} | Y_{2}) \\cdots P(Y_{T} | Y_{T-1})\n",
73+
"\\end{align}\n",
74+
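"\n",
"As a small worked case (a sketch, assuming the series starts at time one and $T=4$), the multiplication rule needs conditional distributions that grow with $t$, while the Markov assumption needs only one-step conditionals: \n",
"\n",
"\\begin{align}\n",
"   P(Y_{1}, Y_{2}, Y_{3}, Y_{4}) &= P(Y_{1}) \\cdot P(Y_{2} | Y_{1}) \\cdot P(Y_{3} | Y_{2}, Y_{1}) \\cdot P(Y_{4} | Y_{3}, Y_{2}, Y_{1}) \\\\ \n",
"                                 &\\approx P(Y_{1}) \\cdot P(Y_{2} | Y_{1}) \\cdot P(Y_{3} | Y_{2}) \\cdot P(Y_{4} | Y_{3})\n",
"\\end{align}\n",
"\n",
"In general, the same pattern can be written compactly as $P(Y_{1}, \\cdots, Y_{T}) \\approx P(Y_{1}) \\prod_{t=2}^{T} P(Y_{t} | Y_{t-1})$. \n",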
"\n"
75+
]
76+
},
77+
{
78+
"cell_type": "markdown",
79+
"id": "f2aca56d-50c3-4bd0-a087-81cca8ad4d6b",
80+
"metadata": {},
81+
"source": [
82+
"## Influenza-like illness\n",
83+
"\n",
84+
"The Centers for Disease Control and Prevention collects a dataset about influenza-like illness, or ILI.\n",
85+
"ILI is a CCXZXXZXZXZ. \n",
86+
"\n"
87+
]
88+
},
89+
{
90+
"cell_type": "code",
91+
"execution_count": null,
92+
"id": "3f585c25-8d3e-43ee-a3b7-05f642670fcb",
93+
"metadata": {},
94+
"outputs": [],
95+
"source": [
96+
"#--d \n",
97+
"import pandas as pd \n",
98+
"\n",
99+
"d = pd.read_csv(\"./data/XXXXXXXX\") #<--using pandas to import a dataset\n",
100+
"\n",
101+
"# plot time series for two states\n",
102+
"#x is weeks\n",
103+
"#y is percent ili (column_name = wILI)\n"
104+
]
105+
},
106+
{
107+
"cell_type": "code",
108+
"execution_count": null,
109+
"id": "0e4ed68f-6af4-4068-9612-52427e027ecd",
110+
"metadata": {},
111+
"outputs": [],
112+
"source": [
113+
"#--d \n",
114+
"import pandas as pd \n"
115+
]
116+
},
117+
{
118+
"cell_type": "markdown",
119+
"id": "7614c9a3-efc5-4037-b68d-9a9d97bef067",
120+
"metadata": {},
121+
"source": [
122+
"## COVID Community mobility\n",
123+
"\n",
124+
"Describe describe describe\n",
125+
"\n"
126+
]
127+
},
128+
{
129+
"cell_type": "code",
130+
"execution_count": null,
131+
"id": "3959d067-a105-4f97-b917-faa514116f36",
132+
"metadata": {},
133+
"outputs": [],
134+
"source": [
135+
"#--d \n",
136+
"import pandas as pd \n",
137+
"\n",
138+
"d = pd.read_csv(\"./data/XXXXXXXX\")\n",
139+
"\n",
140+
"# a plot of one county's time series for two activities\n",
141+
"\n",
142+
"# x is the day \n",
143+
"# y - parks_percent_change_from_baseline (<-for example)\n"
144+
]
145+
},
146+
{
147+
"cell_type": "markdown",
148+
"id": "0ff18230-d314-4bd8-8033-99c15cd2636d",
149+
"metadata": {},
150+
"source": [
151+
"## Mpox incidence"
152+
]
153+
},
154+
{
155+
"cell_type": "markdown",
156+
"id": "57dfd0e6-9805-4754-b774-738058c1fd2f",
157+
"metadata": {},
158+
"source": [
159+
"## Correlation, Covariance, and the Correlogram"
160+
]
161+
},
162+
{
163+
"cell_type": "code",
164+
"execution_count": null,
165+
"id": "29b752f3-aacc-4f33-9e65-5434927cbfaf",
166+
"metadata": {},
167+
"outputs": [],
168+
"source": [
169+
"# For ILI we will want to plot the percent ILI at week t versus the percent ILI at week t+1"
170+
]
171+
},
172+
{
173+
"cell_type": "code",
174+
"execution_count": null,
175+
"id": "40c49a8b-547e-4d27-a64e-b92d6d187980",
176+
"metadata": {},
177+
"outputs": [],
178+
"source": [
179+
"# For COVID we will want to plot the behavior at week t versus the behavior at week t+1"
180+
]
181+
},
182+
{
183+
"cell_type": "code",
184+
"execution_count": null,
185+
"id": "9dde1a78-0206-4272-a102-0c2295b6efd3",
186+
"metadata": {},
187+
"outputs": [],
188+
"source": []
189+
},
190+
{
191+
"cell_type": "markdown",
192+
"id": "aa674e67-12ab-497d-bf4f-aa0c811b8e64",
193+
"metadata": {},
194+
"source": [
195+
"## Smoothing methods"
196+
]
197+
},
198+
{
199+
"cell_type": "code",
200+
"execution_count": null,
201+
"id": "50c0eb5c-acb1-4099-8856-aafeadb90719",
202+
"metadata": {},
203+
"outputs": [],
204+
"source": []
205+
},
206+
{
207+
"cell_type": "code",
208+
"execution_count": null,
209+
"id": "e8d7b8b3-1680-4d53-ae30-4804f79d2868",
210+
"metadata": {},
211+
"outputs": [],
212+
"source": []
213+
},
214+
{
215+
"cell_type": "code",
216+
"execution_count": null,
217+
"id": "915fd2f8-fb4a-4937-aac0-09a36bef5785",
218+
"metadata": {},
219+
"outputs": [],
220+
"source": []
221+
},
222+
{
223+
"cell_type": "code",
224+
"execution_count": null,
225+
"id": "363ad686-aaf7-427d-9362-87cd366317c6",
226+
"metadata": {},
227+
"outputs": [],
228+
"source": []
229+
},
230+
{
231+
"cell_type": "code",
232+
"execution_count": null,
233+
"id": "8a31ece2-cbf6-499a-9cff-ae9061a08b56",
234+
"metadata": {},
235+
"outputs": [],
236+
"source": []
237+
},
238+
{
239+
"cell_type": "code",
240+
"execution_count": null,
241+
"id": "899076c1-dd9b-43f9-80a8-9111d06f3626",
242+
"metadata": {},
243+
"outputs": [],
244+
"source": []
245+
}
246+
],
247+
"metadata": {
248+
"kernelspec": {
249+
"display_name": "Python 3 (ipykernel)",
250+
"language": "python",
251+
"name": "python3"
252+
},
253+
"language_info": {
254+
"codemirror_mode": {
255+
"name": "ipython",
256+
"version": 3
257+
},
258+
"file_extension": ".py",
259+
"mimetype": "text/x-python",
260+
"name": "python",
261+
"nbconvert_exporter": "python",
262+
"pygments_lexer": "ipython3",
263+
"version": "3.13.3"
264+
}
265+
},
266+
"nbformat": 4,
267+
"nbformat_minor": 5
268+
}
