Skip to content

Commit 778c931

Browse files
author
Falk Mielke
committed
emd: water level examples
1 parent 9bb1f5a commit 778c931

3 files changed

Lines changed: 5374 additions & 19 deletions

File tree

content/tutorials/empirical_mode_decomposition/empirical_mode_decomposition.qmd

Lines changed: 269 additions & 19 deletions
Original file line numberDiff line numberDiff line change
@@ -1,15 +1,13 @@
11
---
22
title: "Empirical Mode Decomposition to Analyze Water Levels"
33
author: "Falk Mielke"
4-
date: "2024-12-31"
4+
date: "2024-01-12"
55
format:
66
html:
77
toc: true
88
html-math-method: katex
99
---
1010

11-
TODO: application to real water levels (`watina`)
12-
1311

1412
# Introduction
1513

@@ -238,7 +236,7 @@ In consequence, it is less feature-rich than `scipy`, yet still the best peak fi
238236

239237
```{python py-scipy-find-peaks}
240238
#| label: fig-peaks-py
241-
#| fig-cap: "Peak detection using `scipy.signal.find_peaks()` wiht a prominence of `0.5`. Note that the first and last sample were manually appended to avoid end effects."
239+
#| fig-cap: "Peak detection using `scipy.signal.find_peaks()` with a prominence of `0.5`. Note that the first and last sample were manually appended to avoid end effects."
242240
243241
peaks = NP.append(SIG.find_peaks(y, prominence = 0.5)[0], len(x)-1)
244242
troughs = NP.append(0,SIG.find_peaks(-y, prominence = 0.5)[0])
@@ -487,8 +485,8 @@ extract_firstmode <- function (signal, t = NULL, ...) {
487485
}
488486
489487
## 1. find peaks
490-
peaks <- sort(pracma::findpeaks(y, ...)[, 2])
491-
troughs <- sort(pracma::findpeaks(-y, ...)[, 2])
488+
peaks <- sort(pracma::findpeaks(signal, ...)[, 2])
489+
troughs <- sort(pracma::findpeaks(-signal, ...)[, 2])
492490
extrema <- sort(c(peaks, troughs))
493491
494492
if (peaks[1] < troughs[1]) {
@@ -500,15 +498,15 @@ extract_firstmode <- function (signal, t = NULL, ...) {
500498
}
501499
502500
## 2. interpolate -> envelope
503-
select <- x >= x[max(c(peaks[1], troughs[1]))] & x <= x[min(c(peaks[length(peaks)], troughs[length(troughs)]))]
504-
xi <- x[select]
505-
yi <- y[select]
506-
py <- interpolators::evalInterpolator(interpolators::iprPCHIP(x[peaks], y[peaks]), xi)
507-
ty <- interpolators::evalInterpolator(interpolators::iprPCHIP(x[troughs], y[troughs]), xi)
501+
select <- t >= t[max(c(peaks[1], troughs[1]))] & t <= t[min(c(peaks[length(peaks)], troughs[length(troughs)]))]
502+
ti <- as.numeric(t[select])
503+
yi <- as.numeric(signal[select])
504+
py <- interpolators::evalInterpolator(interpolators::iprPCHIP(t[peaks], as.numeric(signal[peaks])), ti)
505+
ty <- interpolators::evalInterpolator(interpolators::iprPCHIP(t[troughs], as.numeric(signal[troughs])), ti)
508506
509507
## 3. empirical mode
510508
envelope <- cbind("peaks" = py, "troughs" = ty)
511-
rownames(envelope) <- xi
509+
rownames(envelope) <- ti
512510
firstmode <- rowMeans(envelope)
513511
# residual <- yi - firstmode
514512
# amplitude <- abs(py-ty)/2
@@ -592,7 +590,7 @@ They are defined as follows:
592590

593591
> Gemiddelde van de drie laagste|hoogste grondwaterstanden in een hydrologisch jaar (1 april t/m 31 maart) bij een meetfrequentie van tweemaal per maand (rond de 14e en 28e).
594592
595-
[^ritzema2012]: Ritzema *et al.* (2012): "Meten en interpreteren van grondwaterstanden: analyse van methodieken en nauwkeurigheid". Alterra-rapport, Wageningen University & Research. https://edepot.wur.nl/215081
593+
[^ritzema2012]: Ritzema *et al.* (2012): "Meten en interpreteren van grondwaterstanden: analyse van methodieken en nauwkeurigheid". Alterra-rapport, Wageningen University & Research. <https://edepot.wur.nl/215081>
596594

597595

598596
This is literally how they are calculated.
@@ -672,7 +670,7 @@ walk_randomly <- function(start = 0,
672670
This is what the walks look like:
673671

674672
```{r r-do-the-walk}
675-
#| label: fig-randomwalks-r
673+
#| label: fig-randomwalkemds-r
676674
#| fig-cap: "Random walks as simulated water level measurements. Colored lines are the walks, circles indicate bi-weekly samples, horizontal lines mark LG3 and HG3."
677675
678676
sampling_interval <- 14
@@ -817,6 +815,8 @@ Let us see what the empirical mode of a random walk looks like, and how it might
817815
### R
818816

819817
```{r r-random-emd}
818+
#| label: fig-randomwalks-r
819+
#| fig-cap: "EMD of random walks: effectively smoothing the data."
820820
821821
par(mfrow = c(1, 1))
822822
skip <- 32
@@ -843,6 +843,8 @@ for(i in 1:n){
843843
### Python
844844

845845
```{python py-random-emd}
846+
#| label: fig-randomwalkemds-py
847+
#| fig-cap: "EMD of random walks: effectively smoothing the data."
846848
847849
fig, ax = PLT.subplots(1, 1)
848850
skip = 32
@@ -887,28 +889,276 @@ Next example: real data.
887889

888890

889891
# Example III: Water Levels
892+
<!-- Note: the water level examples are `ZUVP031X` and `NEIP001X`. -->
890893

894+
## Water Levels
891895

892-
(TODO)
896+
Our institute assembles data from various observation wells, storing them in [a database](https://watina.inbo.be).
893897

894-
895-
# Archive
898+
Two example water level traces shall use as a test case for EMD.
896899

897900

898901
::: {.panel-tabset group="language"}
899902
### R
900903

901-
```{r }
904+
In R, repeated application of the EMD function above does not work.
905+
906+
However, the code here provides a pointer at how to use it.
907+
908+
909+
```{r r-load-water}
910+
#| label: fig-waterlevel-r
911+
#| fig-cap: "A water level measurement."
912+
913+
wata <- read.csv2("water_level_example_1.csv", sep = ",", dec = ".")
914+
t <- as.Date(wata$'t')
915+
w <- wata$'w'
916+
917+
plot(t, w, type = 'o')
918+
```
919+
920+
921+
```{r r-emd-wata}
922+
#| eval: true
923+
#| label: fig-waterlevel-emd-r
924+
#| fig-cap: "EMD of a water level measurement."
925+
926+
wemd <- extract_firstmode(as.numeric(w))
927+
928+
fm_t <- t[wemd$select]
929+
fm_w <- as.numeric(wemd$firstmode)
930+
residual <- w[wemd$select] - fm_w
931+
fm_e <- wemd$extrema
932+
933+
ggplot(NULL, aes(x = t, y = w)) +
934+
geom_line(color = "darkgray", alpha = 0.6, lwd = 0.5) +
935+
geom_line(aes(x = fm_t, y = fm_w), color = "black") +
936+
geom_point(aes(x = t[fm_e], y = w[fm_e]), color = "orange", size = 3, alpha = 0.4) +
937+
theme_bw()
938+
902939
903940
```
904941

942+
R does not find the peaks reliably (e.g. the first minimum), and there are consecutive peaks/troughs, which is a pity.
943+
Otherwise, the residual of this first EMD iteration would be "noise", and one coul proceed to EMD on the first mode.
944+
945+
::: {.callout-note}
946+
The available `Python` tools are more versatile, and I recommend switching to "Python" at this point.
947+
Even if you are not experienced in that language, the remainder of this tutorial are more conceptual considerations, and will be understandable.
948+
:::
949+
950+
905951
### Python
906952

907-
```{python }
953+
```{python py-load-water}
954+
#| label: fig-waterlevel-py
955+
#| fig-cap: "A water level measurement. The mean water level is indicated by the dashed horizontal line; note the asymmetry of the signal."
956+
957+
data = PD.read_csv("water_level_example_1.csv")
958+
data["t"] = PD.to_datetime(data['t'])
959+
t = data["t"].values.ravel()
960+
w = data["w"].values
961+
962+
PLT.plot(t, w, lw = 0.5, color = "k");
963+
PLT.axhline(NP.mean(w), zorder = 0, color = "grey", linewidth = 0.5, linestyle = "--")
964+
```
965+
966+
DRY = don't repeat yourself...
967+
Make another function for the EMD step!
968+
969+
```{python py-emd-wata}
970+
#| label: fig-waterlevel-emd-py
971+
#| fig-cap: "EMD of a water level measurement."
972+
973+
def EMDStep(t, w, **peak_kwargs):
974+
mode, env, peaks = ExtractFirstmode(w, **peak_kwargs)
975+
res = w - mode
976+
amp = NP.abs(NP.diff(env, axis = 1)/2)
977+
978+
fig, axes = PLT.subplots(2, 1)
979+
ax = axes[0]
980+
ax.plot(t, w, lw = 0.5, color = 'k', zorder = 0, alpha = 0.4)
981+
ax.plot(t, mode, lw = 1.0, color = 'darkgreen', zorder = 20)
982+
ax.scatter(t[peaks], w[peaks], s = 8, facecolor = "none", edgecolor = "k", alpha = 0.3, zorder = 10)
983+
ax.axhline(NP.mean(w), zorder = 0, color = "grey", linewidth = 0.5, linestyle = "--")
984+
ax.spines[["left", "top", "right"]].set_visible(False)
985+
ax.set_ylabel("water level");
986+
ax.get_xaxis().set_visible(False)
987+
988+
ax = axes[1]
989+
ax.plot(t, res, lw = 0.5, color = 'k', zorder = 0, alpha = 0.4)
990+
ax.plot(t, amp,
991+
lw = 0.5, color = "darkred", label = "peak amplitude", alpha = 0.3)
992+
ax.plot(t, -amp,
993+
lw = 0.5, color = "darkred", label = None, alpha = 0.3)
994+
ax.axhline(0, zorder = 0, color = "grey", linewidth = 0.5)
995+
ax.spines[["left", "top", "right"]].set_visible(False)
996+
ax.set_xlabel("date");
997+
ax.set_ylabel("residual");
998+
999+
return(fig, mode)
1000+
1001+
fig, mode = EMDStep(t, w);
1002+
PLT.show();
1003+
1004+
```
1005+
1006+
1007+
This is repeatable, both on the mode itself, as well as on the residual:
1008+
1009+
```{python py-emd-wata-mode}
1010+
#| eval: true
1011+
#| label: fig-waterlevel-emd-mode-py
1012+
#| fig-cap: "Water level measurement, EMD of the first mode."
1013+
1014+
fig, _ = EMDStep(t, mode);
1015+
PLT.show();
1016+
1017+
```
1018+
1019+
```{python py-emd-wata-residual}
1020+
#| eval: true
1021+
#| label: fig-waterlevel-emd2-py
1022+
#| fig-cap: "Water level measurement, the second mode is the EMD of the residual."
1023+
1024+
EMDStep(t, w - mode);
1025+
PLT.show();
1026+
1027+
```
1028+
1029+
:::
1030+
1031+
Unguided peak detection *should* find each tiny local peak, therefore initially extracting white noise where it is present.
1032+
1033+
1034+
## Guided Mode Search
1035+
1036+
Experimenting with the peak detection parameters is worth a try.
1037+
(Because this works much better in Python, I will omit the R code here.)
1038+
1039+
1040+
First, search the long-term oscillations, by setting `prominence`, `width`, or `distance`.
1041+
In this special case, `width` will exclude the local minima: the lower peaks seem to be rather narrow on this water hole.
1042+
Instead, a combination of distance (more than half a year; remember that peaks and troughs are found separately) and prominence (*meaningful* peak) will find and smoothen the yearly baseline.
1043+
1044+
1045+
```{python py-emd-wata-topdown1}
1046+
#| eval: true
1047+
#| label: fig-waterlevel-emd-topdown1
1048+
#| fig-cap: "Exemplifying 'bottom-up' emd, first smoothing the obvious yearly oscillations"
1049+
1050+
# try: prominence, distance, width
1051+
fig, mode = EMDStep(t, w, distance = 200, prominence = 0.20);
1052+
residual = w - mode
1053+
PLT.show();
1054+
1055+
```
1056+
1057+
1058+
Another obvious component is the fine-grained noise; again, we get it by setting no restrictions on the peak detection.
1059+
1060+
```{python py-emd-wata-topdown2}
1061+
#| eval: true
1062+
#| label: fig-waterlevel-emd-topdown2
1063+
#| fig-cap: "Another possible step: narrowest peaks."
1064+
1065+
fig, mode2 = EMDStep(t, residual);
1066+
PLT.show();
9081067
9091068
```
9101069

1070+
The remainder is a straightened, de-noised signal which could be used for more standardized analysis.
1071+
1072+
1073+
```{python py-emd-wata-topdown3}
1074+
#| eval: true
1075+
#| label: fig-waterlevel-emd-topdown3
1076+
#| fig-cap: "The residual, after two different modes were extracted."
1077+
1078+
residual2 = residual - mode2
1079+
PLT.plot(t, w - NP.mean(w), lw = 0.5, color = 'k', zorder = 0, alpha = 0.4);
1080+
PLT.plot(t, mode2 - NP.mean(mode2), lw = 1.0, color = 'darkgreen', zorder = 20);
1081+
PLT.axhline(0, color = "k", lw = 0.5, zorder = -1);
1082+
PLT.show();
1083+
1084+
```
1085+
1086+
There are some obvious problems in this:
1087+
1088+
- edge effects
1089+
- wet summer ~2015: pronounced "dry" minimum was lacking; water levels around that year are dragged towards the zero
1090+
- peak chopping: some peaks, e.g. mid-2010 and 2016, are lost by smoothing
1091+
1092+
1093+
Nevertheless, there is another valuable outcome of EMD: we get the lower and upper **envelope**!
1094+
1095+
```{python py-wateremd-envelope}
1096+
#| label: fig-water-envelope-py
1097+
#| fig-cap: "The envelope of a water level measurement can be used as a continuous measure of minimum and maximum water levels; envelope average marks a continuous middle of water level range."
1098+
mode, env, peaks = ExtractFirstmode(w, distance = 200, prominence = 0.2)
1099+
PLT.plot(t, w, lw = 0.5, color = 'k', zorder = 0, alpha = 1.0);
1100+
PLT.plot(t, env[:,0],
1101+
lw = 0.5, color = "darkred", label = "envelope", alpha = 0.6);
1102+
PLT.plot(t, env[:,1],
1103+
lw = 0.5, color = "darkred", label = None, alpha = 0.6);
1104+
PLT.plot(t, NP.mean(env, axis = 1),
1105+
lw = 1.0, color = "darkgreen", label = "first mode", alpha = 0.6);
1106+
PLT.axhline(NP.mean(w), color = "k", lw = 0.5, zorder = -1);
1107+
PLT.show();
1108+
1109+
```
1110+
1111+
1112+
1113+
1114+
Note that there is no guarantee that the same parameters will work on every observation in your data set.
1115+
In my experience, a "guided" (top-down or bottom-up) EMD approach requires a lot of fiddling with the parameters, possibly even case distinction.
1116+
1117+
1118+
```{python py-load-water2}
1119+
#| label: fig-waterlevel2-py
1120+
#| fig-cap: "Another water level measurement, lacking the obvious yearly oscillations, will not work with the previous, year-focused peak detection parameters. You could smooth it, though, with default EMD settings (not shown)."
1121+
1122+
data2 = PD.read_csv("water_level_example_2.csv")
1123+
data2["t"] = PD.to_datetime(data2['t'])
1124+
t2 = data2["t"].values.ravel()
1125+
w2 = data2["w"].values
1126+
1127+
1128+
_, _ = EMDStep(t2, w2, distance = 200, prominence = 0.20);
1129+
PLT.show();
1130+
1131+
```
1132+
1133+
1134+
1135+
## Summary: Water Level EMD
1136+
1137+
::: {.callout-note}
1138+
Some observations:
1139+
1140+
- "guided EMD": yearly oscillations are found first by selecting for peak characteristics
1141+
- noise can be extracted either before or after; EMD is one way of smoothing a signal
1142+
- EMD has some caveats on water level measurements, where oscillation are not necessarily regular
1143+
- EMD extracts the **envelope**, **first mode**, and **residual**, all of which can occasionally be useful for further analysis
1144+
9111145
:::
9121146

9131147

1148+
One goal of my application of EMD was to find a better way to extract a yearly range of water levels, to replace the anachronistic `LG3` and `HG3` calculations.
1149+
With plain application of EMD (i.e. no pre-processing of the data), the envelope seems to be a promising aspect for further inspection.
1150+
1151+
The reason that water levels turned out to be non-ideal for EMD application is that they lack regular oscillations.
1152+
They are not symmetric (winter wet plateau; summer dry dip), and not necessarily regular (e.g. "wet summer").
1153+
Generally, water level measurements such as the first example above might be more usefully approached with [wavelets](https://docs.scipy.org/doc/scipy-1.12.0/reference/signal.html#wavelets) to find the summer minima.
1154+
However, this is just my [almost-uneducated guess](http://mielke-bio.info/falk/posts/27.cycle_extraction/#orgac5a9c6), based on the visual form of the curves.
1155+
CWT (Continuous Wavelet Transform, [e.g. in R](https://www.rdocumentation.org/packages/Rwave/versions/2.6-5/topics/cwt)) is a great subject for another tutorial.
1156+
1157+
1158+
As a reminder, there are many tools to consider for signal analysis.
1159+
I hope this tutorial could help to bring EMD to your personal repertoire.
1160+
1161+
1162+
Thank you for reading!
1163+
As always, feedback and suggestions are welcome.
9141164

0 commit comments

Comments
 (0)