
Commit ab1b664

emd: re-run and proofread*
- python venv guide added - text improvements - from the need to apply this to actual data
1 parent 778c931 commit ab1b664

3 files changed

Lines changed: 76 additions & 38 deletions

File tree

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -66,3 +66,5 @@ data_gisclub/
6666
# png images for the brms tutorial
6767
/content/tutorials/r_brms/brms_eng/*.png
6868
/content/tutorials/r_brms/brms_nl/*.png
69+
70+
/.quarto/

content/tutorials/empirical_mode_decomposition/empirical_mode_decomposition.qmd

Lines changed: 72 additions & 36 deletions
@@ -22,13 +22,25 @@ Irregular changes, varying amplitudes, and changing frequencies can result in de
2222

2323
A less widespread method is **Empirical Mode Decomposition**.
2424
It is, as the name suggests, an empirical method which can help to separate meaningful components of a signal.
25+
26+
::: {.callout-tip}
27+
EMD can in principle be used to separate the following components of an oscillating signal:
28+
29+
- envelope and momentary amplitude
30+
- noise
31+
- different oscillation modes
32+
33+
:::
34+
35+
2536
The steps involved are trivial, as I will demonstrate in this tutorial.
37+
For the most part, code is available in Python and R, though the Python implementation is somewhat more comprehensive.
38+
2639

2740
::: {.panel-tabset group="language"}
2841
### R
2942

3043
```{r r-setup}
31-
.libPaths("/data/R/library")
3244
suppressMessages(library("dplyr"))
3345
suppressMessages(library("ggplot2"))
3446
suppressMessages(library("interpolators"))
@@ -66,7 +78,7 @@ import matplotlib.pyplot as PLT
6678

6779
## gape angle data
6880

69-
For exploration, take this brief episode of beak gape angle of a canary cracking hemp seeds, courtesy of [Maja Mielke](https://orcid.org/0000-0001-6328-0589) ([*website*](http://mielke-bio.info/maja/blog))
81+
For exploration, take this brief episode of beak gape angle of a canary cracking hemp seeds, courtesy of [Maja Mielke](https://orcid.org/0000-0001-6328-0589) ([*website*](http://mielke-bio.info/maja/blog)).
7082

7183

7284
::: {.panel-tabset group="language"}
@@ -132,8 +144,8 @@ ShowPlot()
132144

133145
:::
134146

135-
This bird was in ["positioning" and "biting" phase](http://mielke-bio.info/maja/blog/01_eating_or_being_eaten), placing a seed in the right position for cracking with the help of their upper beak, lower beak and tongue.
136-
On the x axis of [@fig-data-py]/[@fig-data-r], you see the time, measured in video frames and normalized.
147+
This bird was in ["positioning" and "biting" phase](http://mielke-bio.info/maja/blog/01_eating_or_being_eaten), placing a hemp seed in the right position for cracking with the help of their upper beak, lower beak and tongue.
148+
On the x axis of [@fig-data-py]/[@fig-data-r], you see the time (available in video frames, or normalized).
137149
On the y axis, gape angle indicates (approximately) the angle between the beak tips and corner[^1].
138150

139151
[^1]: "Beak corner" is not the real reference, I just used it for illustration. In fact, an anatomical coordinate reference system was used.
@@ -145,19 +157,20 @@ You can clearly see an initial episode with large gape angle oscillations, and a
145157

146158
## Basic Algorithm
147159

148-
Engineers often talk about "peak amplitude" and quantify a **momentary amplitude** of a noticably oscillating signal ([see here](https://www.keysight.com/used/be/en/knowledge/guides/how-to-measure-amplitude-engineers-guide)).
160+
Engineers often talk about "peak amplitude" (as in: amplitude along the peaks) and quantify a **momentary amplitude** of a noticeably oscillating signal ([see here](https://www.keysight.com/used/be/en/knowledge/guides/how-to-measure-amplitude-engineers-guide)).
149161

150162
This goes by the framework of ["Hilbert Transform"](https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.hilbert.html#scipy.signal.hilbert) and ["Analytical Signal"](https://en.wikipedia.org/wiki/Analytic_signal).
163+
All fascinating, yet we will here stick with the practical implementation.
151164

152165

153-
And to get those, one has to first detect the **peaks** (i.e. local maxima) of the signal.
154-
[The `emd` toolbox in Python](https://emd.readthedocs.io/en/stable/stubs/emd.sift.interp_envelope.html) can achieve this.
155-
(I did not find the equivalent function in the `R::emd` library.)
166+
To find the peak amplitude of the signal, one obviously has to detect the **peaks** (i.e. local maxima) of the signal.
167+
Just to get a reference, [the `emd` toolbox in Python](https://emd.readthedocs.io/en/stable/stubs/emd.sift.interp_envelope.html) can achieve this.
168+
(I did not find the equivalent function in the `R::emd` library, but no worries, this just serves as reference.)
156169

157170

158171
```{python py-emd-extrema}
159172
#| label: fig-emd-py
160-
#| fig-cap: "Empirical mode decomposition involves finding and averaging the envelope. Here, this is done with the `emd` library and default settings."
173+
#| fig-cap: "Empirical mode decomposition involves finding and averaging the envelope. Here, this is done in Python with the `emd` library and default settings."
161174
162175
163176
envelope = NP.stack(
@@ -198,11 +211,11 @@ Note that these wiggles do not matter much for the actual EMD procedure.
198211

199212
## Peak Detection: Prominence!
200213

201-
While the `EMD` toolbox is quite comprehensive, it might be useful to get our hands on the simple steps outlined above.
214+
While the `EMD` toolbox is quite comprehensive, we will have much more control by getting our hands on the individual steps outlined above.
202215

203216
The first one is **peak detection**.
204217
Scipy holds a default function for it: [`scipy.signal.find_peaks()`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.find_peaks.html).
205-
Non-comprehensive equivalents in R are [`signal::findpeaks()` and `pracma::findpeaks()`](https://search.r-project.org/CRAN/refmans/pracma/html/findpeaks.html), and there is [`gsignal`](https://cran.r-project.org/web/packages/gsignal/vignettes/gsignal.html).
218+
Non-comprehensive equivalents in R are `signal::findpeaks()` and [`pracma::findpeaks()`](https://search.r-project.org/CRAN/refmans/pracma/html/findpeaks.html), and there is [`gsignal`](https://cran.r-project.org/web/packages/gsignal/vignettes/gsignal.html).
206219

207220

208221
The functions only find the local *maxima*, but that is no challenge (just flip the signal to get minima).
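As a minimal sketch of this step (on a synthetic stand-in signal, not the canary data; the `prominence` value of 0.5 is an arbitrary assumption), peaks and troughs can be found with the same function by flipping the sign:

```python
import numpy as NP
from scipy.signal import find_peaks

# synthetic stand-in: a slow oscillation with a small, fast ripple on top
t = NP.linspace(0.0, 1.0, 501)
signal = NP.sin(2 * NP.pi * 5 * t) + 0.05 * NP.sin(2 * NP.pi * 60 * t)

# upper peaks: local maxima which stand out by at least `prominence`
peaks, _ = find_peaks(signal, prominence=0.5)

# lower peaks (troughs): the same search on the flipped signal
troughs, _ = find_peaks(-signal, prominence=0.5)

print("peaks at t =", t[peaks])
print("troughs at t =", t[troughs])
```

Without the `prominence` argument, every tiny ripple maximum would be reported as a peak.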
@@ -259,14 +272,14 @@ The `scipy` function is quite elaborate and feature-rich.
259272

260273
::: { .callout-note }
261274

262-
- You could also use the `threshold`, `distance`/`minpeakdistance `, and `width` arguments instead of `prominence`, if you have meaningful priors for either of those.
263-
- You should double-check that every peak follows a trough (though not strictly necessary, you may skip one or two).
264-
- It remains to be demonstrated how this performs on less oscillatory trials.
275+
- Depending on the toolbox of choice, parameters such as `threshold`, `distance`/`minpeakdistance`, `width`, and `prominence` can be used to refine peak detection in case there are meaningful priors for any of those.
276+
- It is recommended to double-check that every peak follows a trough (though not strictly necessary, brave and daring users may skip one or two). Generally, it is best to plot and inspect all signals with putative peaks to detect errors.
277+
- It remains to be demonstrated how this step performs on less oscillatory trials.
265278

266279
:::
267280

268281

269-
Make sure to do something about your `NaN`'s: `find_peaks` does not like them.
282+
Make sure to do something about your `NA` and `NaN` values: `find_peaks` does not like them.
270283
And keep an eye on those edges (i.e. start and end).
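One possible way to handle both issues is sketched below, assuming that short gaps may be filled by linear interpolation (an assumption which should be checked for real data) and that the first and last sample are simply appended to the peak list. The signal is synthetic.

```python
import numpy as NP
from scipy.signal import find_peaks

# synthetic stand-in signal with a short recording gap
t = NP.linspace(0.0, 1.0, 201)
signal = NP.sin(2 * NP.pi * 3 * t)
signal[40:45] = NP.nan

# fill the gap by linear interpolation over the valid samples
valid = NP.isfinite(signal)
filled = NP.interp(t, t[valid], signal[valid])

# peak detection on the gap-free signal
peaks, _ = find_peaks(filled, prominence=0.5)

# edges: append first and last sample, so that a later interpolation
# of the peaks covers the whole time range
peaks_with_edges = NP.unique(NP.concatenate([[0], peaks, [len(filled) - 1]]))
```

Whether the edge samples are good stand-ins for peaks depends on the data; plotting the result remains the safest check.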
271284

272285

@@ -345,12 +358,20 @@ Keep in mind that the first- and last sample were appended to the lists of peaks
345358

346359
Interpolation in R is quite rudimentary: I miss the option to extrapolate, and one to fix the actual values.
347360
I did not find a package for [RBF interpolation](https://en.wikipedia.org/wiki/Radial_basis_function_interpolation).
348-
Python is quite accurate and versatile in terms of interpolation: [`scipy.interpolate`](https://docs.scipy.org/doc/scipy/tutorial/interpolate.html) is feature rich and can do [RBF](https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.RBFInterpolator.html#scipy.interpolate.RBFInterpolator).
361+
Python is quite accurate and versatile in terms of interpolation: [`scipy.interpolate`](https://docs.scipy.org/doc/scipy/tutorial/interpolate.html) is feature rich and can do [RBF](https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.RBFInterpolator.html#scipy.interpolate.RBFInterpolator) (not shown).
349362

350363

351364

352365
## Mode Extraction and Residual
353366

367+
One last step: take the upper and lower peak interpolation.
368+
This is the **envelope** of the signal.
369+
The difference between the upper and lower line is the **momentary amplitude**.
370+
The mean of upper and lower envelope line is an **empirical mode**.
371+
372+
This is less a coding exercise, and more of vocabulary practice.
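To connect the vocabulary to something tangible, here is a minimal sketch on a synthetic signal. It assumes plain linear interpolation (`NP.interp`) between the peaks as a simple stand-in for whichever interpolation is preferred; `peak_line` is a hypothetical helper name, not part of any library.

```python
import numpy as NP
from scipy.signal import find_peaks

# synthetic stand-in: oscillation around 2.0 with slowly growing amplitude
t = NP.linspace(0.0, 1.0, 401)
signal = 2.0 + (1.0 + 0.5 * t) * NP.sin(2 * NP.pi * 8 * t)

def peak_line(values):
    """Linearly interpolate through the local maxima (plus both edge samples)."""
    idx, _ = find_peaks(values)
    idx = NP.unique(NP.concatenate([[0], idx, [len(values) - 1]]))
    return NP.interp(t, t[idx], values[idx])

upper = peak_line(signal)        # upper envelope line
lower = -peak_line(-signal)      # lower envelope line, via the flipped signal

amplitude = upper - lower        # momentary amplitude
mode = (upper + lower) / 2       # empirical mode: mean of the envelope lines
residual = signal - mode         # what remains after removing the mode
```

With linear interpolation the envelope lines have kinks at the peaks; a smoother interpolator would change the look, but not the vocabulary.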
373+
374+
354375

355376
::: {.panel-tabset group="language"}
356377
### R
@@ -440,7 +461,8 @@ The *first mode*:
440461

441462
- captures a rather long-term change of beak opening, by extracting something close to the mean of the angular range in which the canary beak moves
442463
- is smooth, compared to the raw signal
443-
- shows jump-like and wiggly behavior.
464+
- yet still shows jump-like and wiggly behavior.
465+
444466

445467
The *residual*:
446468

@@ -457,7 +479,7 @@ You might find this too canary-centric.
457479
I will now attempt to generalize the procedure by applying it to (i) random walks and (ii) real groundwater level data.
458480

459481

460-
But, first things first: for the purpose of conceptual generalization, a more general function might be useful.
482+
But, first things first: for the purpose of conceptual generalization, a more general function will be useful.
461483

462484

463485
::: {.panel-tabset group="language"}
@@ -579,7 +601,7 @@ My goal is to slightly improve water level analysis, for the following reason.
579601

580602

581603
Conventional water level analysis is subject to a few anachronisms.
582-
In *the good old days*$^{TM}, long before the time of GPS, automatic data loggers, or computers, people went to the field in spring to gather bi-weekly measures of water level from observation wells.
604+
In *the good old days*, long before the time of GPS, automatic data loggers, or computers, people went to the field in spring to gather bi-weekly measures of water level from observation wells.
583605

584606
What they were really interested in were approximate measures of highest and lowest ground water, so-called `xG3` values.
585607
"Approximate", because measurement frequency was limited: a bi-weekly rhythm was about as good as we could get with actual humans putting yardsticks into holes in the ground.
@@ -593,15 +615,18 @@ They are defined as follows:
593615
[^ritzema2012]: Ritzema *et al.* (2012): "Meten en interpreteren van grondwaterstanden: analyse van methodieken en nauwkeurigheid". Alterra-rapport, Wageningen University & Research. <https://edepot.wur.nl/215081>
594616

595617

596-
This is literally how they are calculated.
597-
The mean of the three lowest/highest ground water levels measured in bi-weekly measurement interval, over a period from April to next March.
618+
This is literally how they are calculated:
619+
the mean of the three lowest/highest ground water levels measured at bi-weekly intervals, over a period from April to next March.
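Taken literally, this calculation could be sketched as follows. The water levels are synthetic, "bi-weekly" is simplified to "every 14th daily sample", and the names `xg3_low` and `xg3_high` are hypothetical, chosen for illustration only.

```python
import numpy as NP

# hypothetical daily water levels for one period (1 April to 31 March)
days = NP.arange(365)
level = -0.5 + 0.4 * NP.cos(2 * NP.pi * (days - 290) / 365)

# artificially pick bi-weekly values (simplification: every 14th day)
biweekly = level[::14]

# mean of the three lowest and the three highest bi-weekly values
ranked = NP.sort(biweekly)
xg3_low = ranked[:3].mean()
xg3_high = ranked[-3:].mean()

print(f"low: {xg3_low:.3f}, high: {xg3_high:.3f}")
```

Note how the result depends on which days happen to be picked: shifting the start of the subsampling by a few days will change both values.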
598620

599621

600622
The anachronism is that we still calculate `xG3` like this, despite the availability of high frequency sampled water levels.
601623
We artificially pick bi-weekly values.
602624
We disregard the continuous, temporally periodic nature of the phenomenon.
625+
We choose an arbitrary sampling cadence.
603626
We ignore measurement uncertainty.
604627

628+
Knowing what was demonstrated above with the momentary amplitude, we could try to do better!
629+
605630

606631
Let us first get an intuition of what `xG3` values look like, by looking at random walk data.
607632

@@ -877,10 +902,12 @@ These would quickly lead to over-smoothing of the traces.
877902

878903
::: {.callout-note}
879904
On random walk data, EMD seems to achieve little more than smoothing.
880-
881905
The reason is that there is no regular oscillation in the data.
882906

883-
Lesson learned: EMD is especially useful if the data contains more-or-less regular oscillations.
907+
The *kind* of smoothing is interesting: EMD naturally captures small oscillations, which in many cases are considered "white noise".
908+
909+
910+
Note taken: EMD is especially useful if the data contains more-or-less regular oscillations.
884911
:::
885912

886913

@@ -895,7 +922,7 @@ Next example: real data.
895922

896923
Our institute assembles data from various observation wells, storing them in [a database](https://watina.inbo.be).
897924

898-
Two example water level traces shall use as a test case for EMD.
925+
Two example water level traces shall serve as a test case for EMD.
899926

900927

901928
::: {.panel-tabset group="language"}
@@ -939,8 +966,9 @@ ggplot(NULL, aes(x = t, y = w)) +
939966
940967
```
941968

942-
R does not find the peaks reliably (e.g. the first minimum), and there are consecutive peaks/troughs, which is a pity.
943-
Otherwise, the residual of this first EMD iteration would be "noise", and one coul proceed to EMD on the first mode.
969+
R does not find all peaks reliably (e.g. see the first minimum), and there are consecutive peaks/troughs, which is a pity.
970+
This means that relevant parts of the signal are captured on the residual.
971+
Normally, the residual of this first EMD iteration would be "noise", and one could proceed to EMD on the first mode.
944972

945973
::: {.callout-note}
946974
The available `Python` tools are more versatile, and I recommend switching to "Python" at this point.
@@ -1028,7 +1056,7 @@ PLT.show();
10281056

10291057
:::
10301058

1031-
Unguided peak detection *should* find each tiny local peak, therefore initially extracting white noise where it is present.
1059+
As with the random walks above, unguided peak detection *should* find each tiny local peak, therefore initially extracting white noise where it is present.
10321060

10331061

10341062
## Guided Mode Search
@@ -1039,7 +1067,8 @@ Experimenting with the peak detection parameters is worth a try.
10391067

10401068
First, search the long-term oscillations, by setting `prominence`, `width`, or `distance`.
10411069
In this special case, `width` will exclude the local minima: the lower peaks seem to be rather narrow on this water hole.
1042-
Instead, a combination of distance (more than half a year; remember that peaks and troughs are found separately) and prominence (*meaningful* peak) will find and smoothen the yearly baseline.
1070+
If this is an issue, remember that you can control peak finding for maxima and minima separately.
1071+
However, a combination of distance (more than half a year; remember that peaks and troughs are found separately) and prominence (*meaningful* peak) did the job to find the yearly baseline and straighten out the data (which might not be what we want).
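The combination of `distance` and `prominence` can be illustrated on synthetic data. The sketch below assumes daily sampling, a yearly cycle with a faster 30-day ripple on top, and an arbitrary prominence of 1.0; none of these values are taken from the actual water level traces.

```python
import numpy as NP
from scipy.signal import find_peaks

# synthetic stand-in: four years of daily values, yearly cycle plus ripple
days = NP.arange(4 * 365)
level = NP.cos(2 * NP.pi * days / 365) + 0.2 * NP.sin(2 * NP.pi * days / 30)

# `distance` is given in samples: more than half a year between peaks
half_year = 365 // 2

# peaks and troughs are found separately, each with its own settings
peaks, _ = find_peaks(level, distance=half_year + 1, prominence=1.0)
troughs, _ = find_peaks(-level, distance=half_year + 1, prominence=1.0)

print("yearly peaks (day):", peaks)
print("yearly troughs (day):", troughs)
```

The local maxima near the start and end of the series are rejected by `prominence`, because the signal does not drop far enough between them and the edge.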
10431072

10441073

10451074
```{python py-emd-wata-topdown1}
@@ -1067,7 +1096,7 @@ PLT.show();
10671096
10681097
```
10691098

1070-
The remainder is a straightened, de-noised signal which could be used for more standardized analysis.
1099+
The remainder after these two hand-selected EMD iterations is a straightened, de-noised signal which could be used for more standardized analysis.
10711100

10721101

10731102
```{python py-emd-wata-topdown3}
@@ -1076,9 +1105,12 @@ The remainder is a straightened, de-noised signal which could be used for more s
10761105
#| fig-cap: "The residual, after two different modes were extracted."
10771106
10781107
residual2 = residual - mode2
1079-
PLT.plot(t, w - NP.mean(w), lw = 0.5, color = 'k', zorder = 0, alpha = 0.4);
1080-
PLT.plot(t, mode2 - NP.mean(mode2), lw = 1.0, color = 'darkgreen', zorder = 20);
1108+
PLT.plot(t, w - NP.mean(w), lw = 0.5, color = 'k', zorder = 0, alpha = 0.4,
1109+
label = "raw signal");
1110+
PLT.plot(t, mode2 - NP.mean(mode2), lw = 1.0, color = 'darkgreen', zorder = 20,
1111+
label = "residual after 2 EMD steps");
10811112
PLT.axhline(0, color = "k", lw = 0.5, zorder = -1);
1113+
PLT.gca().legend(loc = "best");
10821114
PLT.show();
10831115
10841116
```
@@ -1111,7 +1143,7 @@ PLT.show();
11111143

11121144

11131145

1114-
Note that there is no guarantee that the same parameters will work on every observation in your data set.
1146+
Note that there is no guarantee that the same guiding parameters will work on every observation in your data set.
11151147
In my experience, a "guided" (top-down or bottom-up) EMD approach requires a lot of fiddling with the parameters, possibly even case distinction.
11161148

11171149

@@ -1130,16 +1162,20 @@ PLT.show();
11301162
11311163
```
11321164

1165+
However, these peak detection controls are unavailable in R, so we might be better off sticking with the defaults anyway (which usually work).
11331166

11341167

11351168
## Summary: Water Level EMD
11361169

11371170
::: {.callout-note}
1138-
Some observations:
1171+
To "wrap up" (envelope-pun), some observations:
11391172

11401173
- "guided EMD": yearly oscillations are found first by selecting for peak characteristics
1141-
- noise can be extracted either before or after; EMD is one way of smoothing a signal
1142-
- EMD has some caveats on water level measurements, where oscillation are not necessarily regular
1174+
- noise can be extracted either before or after; EMD is one way of naturally smoothing a signal
1175+
- EMD has some caveats on water level measurements, where oscillations are not necessarily regular or symmetric
1176+
1177+
And the most important take-home message:
1178+
11431179
- EMD extracts the **envelope**, **first mode**, and **residual**, all of which can occasionally be useful for further analysis
11441180

11451181
:::
@@ -1151,7 +1187,7 @@ With plain application of EMD (i.e. no pre-processing of the data), the envelope
11511187
The reason that water levels turned out to be non-ideal for EMD application is that they lack regular oscillations.
11521188
They are not symmetric (winter wet plateau; summer dry dip), and not necessarily regular (e.g. "wet summer").
11531189
Generally, water level measurements such as the first example above might be more usefully approached with [wavelets](https://docs.scipy.org/doc/scipy-1.12.0/reference/signal.html#wavelets) to find the summer minima.
1154-
However, this is just my [almost-uneducated guess](http://mielke-bio.info/falk/posts/27.cycle_extraction/#orgac5a9c6), based on the visual form of the curves.
1190+
This is just my [almost-uneducated guess](http://mielke-bio.info/falk/posts/27.cycle_extraction/#orgac5a9c6), based on the visual form of the curves.
11551191
CWT (Continuous Wavelet Transform, [e.g. in R](https://www.rdocumentation.org/packages/Rwave/versions/2.6-5/topics/cwt)) is a great subject for another tutorial.
11561192

11571193

content/tutorials/empirical_mode_decomposition/requirements.txt

Lines changed: 2 additions & 2 deletions
@@ -55,7 +55,7 @@ jupyterlab_widgets==3.0.13
5555
kiwisolver==1.4.7
5656
lazy_loader==0.4
5757
librosa==0.10.2.post1
58-
llvmlite==0.43.0
58+
llvmlite==0.44.0rc2
5959
MarkupSafe==3.0.2
6060
matplotlib==3.10.0
6161
matplotlib-inline==0.1.7
@@ -67,7 +67,7 @@ nbformat==5.10.4
6767
nest-asyncio==1.6.0
6868
notebook==7.3.1
6969
notebook_shim==0.2.4
70-
numba==0.60.0
70+
numba==0.61.0rc2
7171
numpy==2.0.2
7272
overrides==7.7.0
7373
packaging==24.2
