You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: content/tutorials/empirical_mode_decomposition/empirical_mode_decomposition.qmd
+72-36Lines changed: 72 additions & 36 deletions
Original file line number
Diff line number
Diff line change
@@ -22,13 +22,25 @@ Irregular changes, varying amplitudes, and changing frequencies can result in de
22
22
23
23
A less widespread method is **Empirical Mode Decomposition**.
24
24
It is, as the name suggests, an empirical method which can help to separate meaningful components of a signal.
25
+
26
+
::: {.callout-tip}
27
+
EMD can in principle be used to separate the following components of an oscillating signal:
28
+
29
+
- envelope and momentary amplitude
30
+
- noise
31
+
- different oscillation modes
32
+
33
+
:::
34
+
35
+
25
36
The steps involved are trivial, as I will demonstrate in this tutorial.
37
+
For most part, code is available in Python and R, though the Python implementation is somewhat more comprehensive.
38
+
26
39
27
40
::: {.panel-tabset group="language"}
28
41
### R
29
42
30
43
```{r r-setup}
31
-
.libPaths("/data/R/library")
32
44
suppressMessages(library("dplyr"))
33
45
suppressMessages(library("ggplot2"))
34
46
suppressMessages(library("interpolators"))
@@ -66,7 +78,7 @@ import matplotlib.pyplot as PLT
66
78
67
79
## gape angle data
68
80
69
-
For exploration, take this brief episode of beak gape angle of a canary cracking hemp seeds, courtesy of [Maja Mielke](https://orcid.org/0000-0001-6328-0589) ([*website*](http://mielke-bio.info/maja/blog))
81
+
For exploration, take this brief episode of beak gape angle of a canary cracking hemp seeds, courtesy of [Maja Mielke](https://orcid.org/0000-0001-6328-0589) ([*website*](http://mielke-bio.info/maja/blog)).
70
82
71
83
72
84
::: {.panel-tabset group="language"}
@@ -132,8 +144,8 @@ ShowPlot()
132
144
133
145
:::
134
146
135
-
This bird was in ["positioning" and "biting" phase](http://mielke-bio.info/maja/blog/01_eating_or_being_eaten), placing a seed in the right position for cracking with the help of their upper beak, lower beak and tongue.
136
-
On the x axis of [@fig-data-py]/[@fig-data-r], you see the time, measured in video frames and normalized.
147
+
This bird was in ["positioning" and "biting" phase](http://mielke-bio.info/maja/blog/01_eating_or_being_eaten), placing a hemp seed in the right position for cracking with the help of their upper beak, lower beak and tongue.
148
+
On the x axis of [@fig-data-py]/[@fig-data-r], you see the time (available in video frames, or normalized).
137
149
On the y axis, gape angle indicates (approximately) the angle between the beak tips and corner[^1].
138
150
139
151
[^1]: "Beak corner" is not the real reference, I just used it for illustration. In fact, an anatomical coordinate reference system was used.
@@ -145,19 +157,20 @@ You can clearly see an initial episode with large gape angle oscillations, and a
145
157
146
158
## Basic Algorithm
147
159
148
-
Engineers often talk about "peak amplitude" and quantify a **momentary amplitude** of a noticably oscillating signal ([see here](https://www.keysight.com/used/be/en/knowledge/guides/how-to-measure-amplitude-engineers-guide)).
160
+
Engineers often talk about "peak amplitude" (as in: amplitude along the peaks) and quantify a **momentary amplitude** of a noticably oscillating signal ([see here](https://www.keysight.com/used/be/en/knowledge/guides/how-to-measure-amplitude-engineers-guide)).
149
161
150
162
This goes by the framework of ["Hilbert Transform"](https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.hilbert.html#scipy.signal.hilbert) and ["Analytical Signal"](https://en.wikipedia.org/wiki/Analytic_signal).
163
+
All fascinating, yet we will here stick with the practical implementation.
151
164
152
165
153
-
And to get those, one has to first detect the **peaks** (i.e. local maxima) of the signal.
154
-
[The`emd` toolbox in Python](https://emd.readthedocs.io/en/stable/stubs/emd.sift.interp_envelope.html) can achieve this.
155
-
(I did not find the equivalent function in the `R::emd` library.)
166
+
To find the peak amplitude of the signal, one obviously has to detect the **peaks** (i.e. local maxima) of the signal.
167
+
Just to get a reference, [the`emd` toolbox in Python](https://emd.readthedocs.io/en/stable/stubs/emd.sift.interp_envelope.html) can achieve this.
168
+
(I did not find the equivalent function in the `R::emd` library, but no worries, this just serves as reference.)
156
169
157
170
158
171
```{python py-emd-extrema}
159
172
#| label: fig-emd-py
160
-
#| fig-cap: "Empirical mode decomposition involves finding and averaging the envelope. Here, this is done with the `emd` library and default settings."
173
+
#| fig-cap: "Empirical mode decomposition involves finding and averaging the envelope. Here, this is done in Python with the `emd` library and default settings."
161
174
162
175
163
176
envelope = NP.stack(
@@ -198,11 +211,11 @@ Note that these wiggles do not matter much for the actual EMD procedure.
198
211
199
212
## Peak Detection: Prominence!
200
213
201
-
While the `EMD` toolbox is quite comprehensive, it might be useful to get our hands on the simple steps outlined above.
214
+
While the `EMD` toolbox is quite comprehensive, we will have much more control by getting our hands on the individual steps outlined above.
202
215
203
216
The first one is **peak detection**.
204
217
Scipy holds a default function for it: [`scipy.signal.find_peaks()`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.find_peaks.html).
205
-
Non-comprehensive equivalents in R are [`signal::findpeaks()` and `pracma::findpeaks()`](https://search.r-project.org/CRAN/refmans/pracma/html/findpeaks.html), and there is [`gsignal`](https://cran.r-project.org/web/packages/gsignal/vignettes/gsignal.html).
218
+
Non-comprehensive equivalents in R are `signal::findpeaks()` and [`pracma::findpeaks()`](https://search.r-project.org/CRAN/refmans/pracma/html/findpeaks.html), and there is [`gsignal`](https://cran.r-project.org/web/packages/gsignal/vignettes/gsignal.html).
206
219
207
220
208
221
The functions only find the local *maxima*, but that is no challenge (just flip the signal to get minima).
@@ -259,14 +272,14 @@ The `scipy` function is quite elaborate and feature-rich.
259
272
260
273
::: { .callout-note }
261
274
262
-
-You could also use the `threshold`, `distance`/`minpeakdistance `, and `width` arguments instead of `prominence`, if you have meaningful priors for either of those.
263
-
-You should double-check that every peak follows a trough (though not strictly necessary, you may skip one or two).
264
-
- It remains to be demonstrated how this performs on less oscillatory trials.
275
+
-Depending on the toolbox of choice, parameters such as `threshold`, `distance`/`minpeakdistance `, `width`, and `prominence` can be used to refine peak detection in case there are meaningful priors for either of those.
276
+
-It is recommended to double-check that every peak follows a trough (though not strictly necessary, brave and daring users may skip one or two). Generally, best plot and inspect all signals with putative peaks to detect errors.
277
+
- It remains to be demonstrated how this step performs on less oscillatory trials.
265
278
266
279
:::
267
280
268
281
269
-
Make sure to do something about your `NaN`'s: `find_peaks` does not like them.
282
+
Make sure to do something about your `NA` and `NaN` values: `find_peaks` does not like them.
270
283
And keep an eye on those edges (i.e. start and end).
271
284
272
285
@@ -345,12 +358,20 @@ Keep in mind that the first- and last sample were appended to the lists of peaks
345
358
346
359
Interpolation in R is quite rudimentary: I miss the option to extrapolate, and one to fix the actual values.
347
360
I did not find a package for [RBF interpolation](https://en.wikipedia.org/wiki/Radial_basis_function_interpolation).
348
-
Python is quite accurate and versatile in terms of interpolation: [`scipy.interpolate`](https://docs.scipy.org/doc/scipy/tutorial/interpolate.html) is feature rich and can do [RBF](https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.RBFInterpolator.html#scipy.interpolate.RBFInterpolator).
361
+
Python is quite accurate and versatile in terms of interpolation: [`scipy.interpolate`](https://docs.scipy.org/doc/scipy/tutorial/interpolate.html) is feature rich and can do [RBF](https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.RBFInterpolator.html#scipy.interpolate.RBFInterpolator) (not shown).
349
362
350
363
351
364
352
365
## Mode Extraction and Residual
353
366
367
+
One last step: take the upper and lower peak interpolation.
368
+
This is the **envelope** of the signal.
369
+
Its difference is the **momentary amplitude**.
370
+
The mean of upper and lower envelope line is an **empirical mode**.
371
+
372
+
This is less a coding exercise, and more of vocabulary practice.
373
+
374
+
354
375
355
376
::: {.panel-tabset group="language"}
356
377
### R
@@ -440,7 +461,8 @@ The *first mode*:
440
461
441
462
- captures a rather long-term change of beak opening, by extracting something close to the mean of the angular range in which the canary beak move
442
463
- is smooth, compared to the raw signal
443
-
- shows jump-like and wiggly behavior.
464
+
- yet still shows jump-like and wiggly behavior.
465
+
444
466
445
467
The *residual*:
446
468
@@ -457,7 +479,7 @@ You might find this too canary-centric.
457
479
I will now attempt to generailze the procedure by applying it to (i) random walks and (ii) real groundwater level data.
458
480
459
481
460
-
But, first things first: for the purpose of conceptual generalization, a more general function might be useful.
482
+
But, first things first: for the purpose of conceptual generalization, a more general function will be useful.
461
483
462
484
463
485
::: {.panel-tabset group="language"}
@@ -579,7 +601,7 @@ My goal is to slightly improve water level analysis, for the following reason.
579
601
580
602
581
603
Conventional water level analysis is subject to a few anachronisms.
582
-
In *the good old days*$^{TM}, long before the time of GPS, automatic data loggers, or computers, people went to the field in spring to gather bi-weekly measures of water level from observation wells.
604
+
In *the good old days*™, long before the time of GPS, automatic data loggers, or computers, people went to the field in spring to gather bi-weekly measures of water level from observation wells.
583
605
584
606
What they were really interested in were approximate measures of highest and lowest ground water, so-called `xG3` values.
585
607
"Approximate", because measurement frequency was limited: a bi-weekly rhythm was about as good as we could get with actual humans putting yardsticks into holes in the ground.
@@ -593,15 +615,18 @@ They are defined as follows:
593
615
[^ritzema2012]: Ritzema *et al.* (2012): "Meten en interpreteren van grondwaterstanden: analyse van methodieken en nauwkeurigheid". Alterra-rapport, Wageningen University & Research. <https://edepot.wur.nl/215081>
594
616
595
617
596
-
This is literally how they are calculated.
597
-
The mean of the three lowest/highest ground water levels measured in bi-weekly measurement interval, over a period from April to next March.
618
+
This is literally how they are calculated:
619
+
the mean of the three lowest/highest ground water levels measured in bi-weekly measurement interval, over a period from April to next March.
598
620
599
621
600
622
The anachronism is that we still calculate `xG3` like this, despite the availability of high frequency sampled water levels.
601
623
We artificially pick bi-weekly values.
602
624
We disregard the continuous, temporally periodic nature of the phenomenon.
625
+
We choose an arbitrary sampling cadence.
603
626
We ignore measurement uncertainty.
604
627
628
+
Knowing what was demonstrated above with the momentary amplitude, we could try to do better!
629
+
605
630
606
631
Let us first get an intuition of what `xG3` values look like, by looking at random walk data.
607
632
@@ -877,10 +902,12 @@ These would quickly lead to over-smoothing of the traces.
877
902
878
903
::: {.callout-note}
879
904
On random walk data, EMD seems to achieve little more than smoothing.
880
-
881
905
The reason is that there is no regular oscillation in the data.
882
906
883
-
Lesson learned: EMD is especially useful if the data contains more-or-less regular oscillations.
907
+
The *kind* of smoothing is interesting: EMD naturally captures small oscillations, which in many cases are considered "white noise".
908
+
909
+
910
+
Note taken: EMD is especially useful if the data contains more-or-less regular oscillations.
884
911
:::
885
912
886
913
@@ -895,7 +922,7 @@ Next example: real data.
895
922
896
923
Our institute assembles data from various observation wells, storing them in [a database](https://watina.inbo.be).
897
924
898
-
Two example water level traces shall use as a test case for EMD.
925
+
Two example water level traces shall serve as a test case for EMD.
R does not find the peaks reliably (e.g. the first minimum), and there are consecutive peaks/troughs, which is a pity.
943
-
Otherwise, the residual of this first EMD iteration would be "noise", and one coul proceed to EMD on the first mode.
969
+
R does not find all peaks reliably (e.g. see the first minimum), and there are consecutive peaks/troughs, which is a pity.
970
+
This means that relevant parts of the signal are captured on the residual.
971
+
Normally, the residual of this first EMD iteration would be "noise", and one could proceed to EMD on the first mode.
944
972
945
973
::: {.callout-note}
946
974
The available `Python` tools are more versatile, and I recommend switching to "Python" at this point.
@@ -1028,7 +1056,7 @@ PLT.show();
1028
1056
1029
1057
:::
1030
1058
1031
-
Unguided peak detection *should* find each tiny local peak, therefore initially extracting white noise where it is present.
1059
+
As with the random walks above, unguided peak detection *should* find each tiny local peak, therefore initially extracting white noise where it is present.
1032
1060
1033
1061
1034
1062
## Guided Mode Search
@@ -1039,7 +1067,8 @@ Experimenting with the peak detection parameters is worth a try.
1039
1067
1040
1068
First, search the long-term oscillations, by setting `prominence`, `width`, or `distance`.
1041
1069
In this special case, `width` will exclude the local minima: the lower peaks seem to be rather narrow on this water hole.
1042
-
Instead, a combination of distance (more than half a year; remember that peaks and troughs are found separately) and prominence (*meaningful* peak) will find and smoothen the yearly baseline.
1070
+
If this is an issue, remember that you can control peak finding for maxima and minima separately.
1071
+
However, a combination of distance (more than half a year; remember that peaks and troughs are found separately) and prominence (*meaningful* peak) did the job to find the yearly baseline and straighten out the data (which might not be what we want).
1043
1072
1044
1073
1045
1074
```{python py-emd-wata-topdown1}
@@ -1067,7 +1096,7 @@ PLT.show();
1067
1096
1068
1097
```
1069
1098
1070
-
The remainder is a straightened, de-noised signal which could be used for more standardized analysis.
1099
+
The remainder after these two hand-selected EMD iterations is a straightened, de-noised signal which could be used for more standardized analysis.
1071
1100
1072
1101
1073
1102
```{python py-emd-wata-topdown3}
@@ -1076,9 +1105,12 @@ The remainder is a straightened, de-noised signal which could be used for more s
1076
1105
#| fig-cap: "The residual, after two different modes were extracted."
1077
1106
1078
1107
residual2 = residual - mode2
1079
-
PLT.plot(t, w - NP.mean(w), lw = 0.5, color = 'k', zorder = 0, alpha = 0.4);
PLT.axhline(0, color = "k", lw = 0.5, zorder = -1);
1113
+
PLT.gca().legend(loc = "best");
1082
1114
PLT.show();
1083
1115
1084
1116
```
@@ -1111,7 +1143,7 @@ PLT.show();
1111
1143
1112
1144
1113
1145
1114
-
Note that there is no guarantee that the same parameters will work on every observation in your data set.
1146
+
Note that there is no guarantee that the same guiding parameters will work on every observation in your data set.
1115
1147
In my experience, a "guided" (top-down or bottom-up) EMD approach requires a lot of fiddling with the parameters, possibly even case distinction.
1116
1148
1117
1149
@@ -1130,16 +1162,20 @@ PLT.show();
1130
1162
1131
1163
```
1132
1164
1165
+
However, these peak detection controls are unavailable in R, so we might better stick with the defaults anyways (which usually work).
1133
1166
1134
1167
1135
1168
## Summary: Water Level EMD
1136
1169
1137
1170
::: {.callout-note}
1138
-
Some observations:
1171
+
To "wrap up" (envelope-pun), some observations:
1139
1172
1140
1173
- "guided EMD": yearly oscillations are found first by selecting for peak characteristics
1141
-
- noise can be extracted either before or after; EMD is one way of smoothing a signal
1142
-
- EMD has some caveats on water level measurements, where oscillation are not necessarily regular
1174
+
- noise can be extracted either before or after; EMD is one way of naturally smoothing a signal
1175
+
- EMD has some caveats on water level measurements, where oscillation are not necessarily regular or symmetric
1176
+
1177
+
And the most important take-home message:
1178
+
1143
1179
- EMD extracts the **envelope**, **first mode**, and **residual**, all of which can occasionally be useful for further analysis
1144
1180
1145
1181
:::
@@ -1151,7 +1187,7 @@ With plain application of EMD (i.e. no pre-processing of the data), the envelope
1151
1187
The reason that water levels turned out to be non-ideal for EMD application is that they lack regular oscillations.
1152
1188
They are not symmetric (winter wet plateau; summer dry dip), and not necessarily regular (e.g. "wet summer").
1153
1189
Generally, water level measurements such as the first example above might be more usefully approached with [wavelets](https://docs.scipy.org/doc/scipy-1.12.0/reference/signal.html#wavelets) to find the summer minima.
1154
-
However, this is just my [almost-uneducated guess](http://mielke-bio.info/falk/posts/27.cycle_extraction/#orgac5a9c6), based on the visual form of the curves.
1190
+
This is just my [almost-uneducated guess](http://mielke-bio.info/falk/posts/27.cycle_extraction/#orgac5a9c6), based on the visual form of the curves.
1155
1191
CWT (Continuous Wavelet Transform, [e.g. in R](https://www.rdocumentation.org/packages/Rwave/versions/2.6-5/topics/cwt)) is a great subject for another tutorial.
0 commit comments