
Commit 9221e3f

committed
update
1 parent 42ce0a5 commit 9221e3f

96 files changed

Lines changed: 10458 additions & 6854 deletions

File tree

3.08 KB
Binary file not shown.
21.1 KB
Binary file not shown.

doc/LectureNotes/_build/html/_sources/week47.ipynb

Lines changed: 342 additions & 145 deletions
Large diffs are not rendered by default.

doc/LectureNotes/_build/html/searchindex.js

Lines changed: 1 addition & 1 deletion
Some generated files are not rendered by default.

doc/LectureNotes/_build/html/week47.html

Lines changed: 117 additions & 0 deletions
@@ -448,6 +448,9 @@ <h2> Contents </h2>
 <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#gating-mechanism-long-short-term-memory-lstm">Gating mechanism: Long Short Term Memory (LSTM)</a></li>
 <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#implementing-a-memory-cell-in-a-neural-network">Implementing a memory cell in a neural network</a></li>
 <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#lstm-details">LSTM details</a></li>
+<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#lstm-cell-and-gates">LSTM Cell and Gates</a></li>
+<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#core-lstm-equations">Core LSTM Equations</a></li>
+<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#gate-intuition-and-dynamics">Gate Intuition and Dynamics</a></li>
 <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#basic-layout-all-figures-from-raschka-et-al">Basic layout (All figures from Raschka <em>et al.,</em>)</a></li>
 <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#id1">LSTM details</a></li>
 <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#comparing-with-a-standard-rnn">Comparing with a standard RNN</a></li>
@@ -462,6 +465,11 @@ <h2> Contents </h2>
 <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#forget-and-input">Forget and input</a></li>
 <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#id2">Basic layout</a></li>
 <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#output-gate">Output gate</a></li>
+<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#lstm-implementation-code-example">LSTM Implementation (Code Example)</a></li>
+<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#example-modeling-dynamical-systems">Example: Modeling Dynamical Systems</a></li>
+<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#example-biological-sequences">Example: Biological Sequences</a></li>
+<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#training-tips-and-variants">Training Tips and Variants</a></li>
+<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#lstm-summary">LSTM Summary</a></li>
 <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#summary-of-lstm">Summary of LSTM</a></li>
 <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#lstm-implementation-using-tensorflow">LSTM implementation using TensorFlow</a></li>
 <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#and-the-corresponding-one-with-pytorch">And the corresponding one with PyTorch</a></li>
@@ -728,6 +736,7 @@ <h2>PyTorch: Defining a Simple RNN, using Tensorflow<a class="headerlink" href="
 <p>This recurrent neural network uses the TensorFlow/Keras SimpleRNN, which is the counterpart to PyTorch’s nn.RNN.
 In this code we have used</p>
 <ol class="arabic simple">
+<li><p>sequence<span class="math notranslate nohighlight">\(\_\)</span>length is the number of time steps in each input sequence fed into the recurrent neural network, that is, the number of ordered observations in each sample of our dataset.</p></li>
 <li><p>return_sequences=False makes it output only the last hidden state, which is fed to the classifier. Also, we have</p></li>
 <li><p>from_logits=True matches the PyTorch CrossEntropyLoss.</p></li>
 </ol>
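To make these parameters concrete, here is a minimal NumPy sketch of the SimpleRNN forward pass with randomly initialized stand-in weights (not a trained layer, and not the lecture's code), showing what return_sequences=True versus False returns:

```python
import numpy as np

rng = np.random.default_rng(0)

def simple_rnn_forward(X, W, U, b, return_sequences=False):
    """Manual SimpleRNN forward pass: h_t = tanh(x_t W + h_{t-1} U + b).

    X has shape (batch, sequence_length, features); the weights are
    illustrative stand-ins for a trained layer."""
    batch, seq_len, _ = X.shape
    units = b.shape[0]
    h = np.zeros((batch, units))
    states = []
    for t in range(seq_len):
        h = np.tanh(X[:, t, :] @ W + h @ U + b)
        states.append(h)
    # return_sequences=True yields every hidden state, False only the last one
    return np.stack(states, axis=1) if return_sequences else h

X = rng.normal(size=(4, 10, 3))   # 4 samples, sequence_length 10, 3 features
W = rng.normal(size=(3, 8))
U = rng.normal(size=(8, 8))
b = np.zeros(8)

print(simple_rnn_forward(X, W, U, b, return_sequences=True).shape)   # (4, 10, 8)
print(simple_rnn_forward(X, W, U, b, return_sequences=False).shape)  # (4, 8)
```

With return_sequences=False the classifier sees only the final hidden state, which is exactly the behavior the list above describes.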
@@ -1083,6 +1092,44 @@ <h2>LSTM details<a class="headerlink" href="#lstm-details" title="Link to this h
 long-term memory, and a hidden state <span class="math notranslate nohighlight">\(h\)</span> which can be thought of as
 the short-term memory.</p>
 </section>
+<section id="lstm-cell-and-gates">
+<h2>LSTM Cell and Gates<a class="headerlink" href="#lstm-cell-and-gates" title="Link to this heading">#</a></h2>
+<ol class="arabic simple">
+<li><p>Each LSTM cell contains a memory cell <span class="math notranslate nohighlight">\(C_t\)</span> and three gates (forget <span class="math notranslate nohighlight">\(f_t\)</span>, input <span class="math notranslate nohighlight">\(i_t\)</span>, output <span class="math notranslate nohighlight">\(o_t\)</span>) that control information flow.</p></li>
+<li><p><strong>Forget gate</strong> (<span class="math notranslate nohighlight">\(f_t\)</span>): chooses which information to erase from the previous cell state <span class="math notranslate nohighlight">\(C_{t-1}\)</span>.</p></li>
+<li><p><strong>Input gate</strong> (<span class="math notranslate nohighlight">\(i_t\)</span>): decides which new information <span class="math notranslate nohighlight">\(\tilde{C}_t\)</span> to add to the cell state.</p></li>
+<li><p><strong>Output gate</strong> (<span class="math notranslate nohighlight">\(o_t\)</span>): controls which parts of the cell state become the output <span class="math notranslate nohighlight">\(h_t\)</span>.</p></li>
+<li><p>The cell state update is <span class="math notranslate nohighlight">\(C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t\)</span>.</p></li>
+</ol>
+</section>
+<section id="core-lstm-equations">
+<h2>Core LSTM Equations<a class="headerlink" href="#core-lstm-equations" title="Link to this heading">#</a></h2>
+<p><strong>The gate computations and state updates are given by:</strong></p>
+<div class="math notranslate nohighlight">
+\[\begin{split}
+\begin{align*}
+f_t &amp;= \sigma(W_f [h_{t-1}, x_t] + b_f), \\
+i_t &amp;= \sigma(W_i [h_{t-1}, x_t] + b_i), \\
+\tilde{C}_t &amp;= \tanh(W_C [h_{t-1}, x_t] + b_C), \\
+C_t &amp;= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t, \\
+o_t &amp;= \sigma(W_o [h_{t-1}, x_t] + b_o), \\
+h_t &amp;= o_t \odot \tanh(C_t).
+\end{align*}
+\end{split}\]</div>
+<ol class="arabic simple">
+<li><p><span class="math notranslate nohighlight">\(\sigma\)</span> is the sigmoid function and <span class="math notranslate nohighlight">\(\odot\)</span> is the elementwise product (see <a class="reference external" href="https://jaketae.github.io/study/dissecting-lstm/">jaketae.github.io</a>).</p></li>
+<li><p>These equations define how the LSTM retains and updates memory and produces outputs.</p></li>
+</ol>
+</section>
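The six equations above can be sketched directly in NumPy. This is a minimal single-step implementation with illustrative random weights acting on the concatenation [h_{t-1}, x_t]; the names and sizes are our assumptions, not the lecture's implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W_f, b_f, W_i, b_i, W_C, b_C, W_o, b_o):
    """One LSTM time step following the core equations; each W_* acts on
    the concatenation [h_{t-1}, x_t]."""
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W_f @ z + b_f)          # forget gate
    i_t = sigmoid(W_i @ z + b_i)          # input gate
    C_tilde = np.tanh(W_C @ z + b_C)      # candidate memory
    C_t = f_t * C_prev + i_t * C_tilde    # cell-state update
    o_t = sigmoid(W_o @ z + b_o)          # output gate
    h_t = o_t * np.tanh(C_t)              # new hidden state
    return h_t, C_t

rng = np.random.default_rng(1)
n_h, n_x = 4, 3
# Alternating weight matrices and zero biases: W_f, b_f, W_i, b_i, W_C, b_C, W_o, b_o
params = [rng.normal(size=(n_h, n_h + n_x)) if k % 2 == 0 else np.zeros(n_h)
          for k in range(8)]

h, C = np.zeros(n_h), np.zeros(n_h)
for x_t in rng.normal(size=(5, n_x)):     # run five time steps
    h, C = lstm_step(x_t, h, C, *params)
print(h.shape, C.shape)  # (4,) (4,)
```

Note that the hidden state is bounded, since it is a sigmoid-gated tanh of the cell state, while the cell state itself can grow through repeated additive updates.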
+<section id="gate-intuition-and-dynamics">
+<h2>Gate Intuition and Dynamics<a class="headerlink" href="#gate-intuition-and-dynamics" title="Link to this heading">#</a></h2>
+<ol class="arabic simple">
+<li><p>Forget gate <span class="math notranslate nohighlight">\(f_t\)</span> acts as a soft “erase” signal: <span class="math notranslate nohighlight">\(f_t \approx 0\)</span> forgets, <span class="math notranslate nohighlight">\(f_t \approx 1\)</span> retains previous memory.</p></li>
+<li><p>Input gate <span class="math notranslate nohighlight">\(i_t\)</span> scales how much new candidate memory <span class="math notranslate nohighlight">\(\tilde{C}_t\)</span> is written.</p></li>
+<li><p>Output gate <span class="math notranslate nohighlight">\(o_t\)</span> determines how much of the cell’s memory flows into the hidden state <span class="math notranslate nohighlight">\(h_t\)</span>.</p></li>
+<li><p>By controlling these gates, the LSTM keeps long-term information exactly when it is needed.</p></li>
+</ol>
+</section>
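Because the gates act elementwise, different components of the cell state can be retained, erased, or rewritten independently. A toy numerical illustration, using hand-picked hard 0/1 gate values in place of sigmoid outputs:

```python
import numpy as np

# Toy cell-state update: component 0 is retained (f=1, i=0), component 1 is
# erased and rewritten (f=0, i=1), component 2 is retained unchanged (f=1, i=0).
C_prev  = np.array([2.0, -1.0, 0.5])
C_tilde = np.array([0.3,  0.9, 0.7])
f_t = np.array([1.0, 0.0, 1.0])
i_t = np.array([0.0, 1.0, 0.0])

C_t = f_t * C_prev + i_t * C_tilde
print(C_t)  # [2.  0.9 0.5]
```

In a trained network the gate values are of course continuous sigmoid outputs, so retention and rewriting blend smoothly rather than switching on and off.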
 <section id="basic-layout-all-figures-from-raschka-et-al">
 <h2>Basic layout (All figures from Raschka <em>et al.,</em>)<a class="headerlink" href="#basic-layout-all-figures-from-raschka-et-al" title="Link to this heading">#</a></h2>
 <!-- dom:FIGURE: [figslides/LSTM1.png, width=700 frac=1.0] -->
@@ -1213,6 +1260,68 @@ <h2>Output gate<a class="headerlink" href="#output-gate" title="Link to this hea
 \end{split}\]</div>
 <p>where <span class="math notranslate nohighlight">\(\mathbf{W_o,U_o}\)</span> are the weights of the output gate and <span class="math notranslate nohighlight">\(\mathbf{b_o}\)</span> is the bias of the output gate.</p>
 </section>
+<section id="lstm-implementation-code-example">
+<h2>LSTM Implementation (Code Example)<a class="headerlink" href="#lstm-implementation-code-example" title="Link to this heading">#</a></h2>
+<ol class="arabic simple">
+<li><p>Using high-level libraries (Keras, PyTorch) simplifies LSTM usage.</p></li>
+<li><p>The following defines and trains a Keras LSTM on a univariate time series:</p></li>
+</ol>
+<div class="cell docutils container">
+<div class="cell_input docutils container">
+<div class="highlight-none notranslate"><div class="highlight"><pre><span></span>from tensorflow.keras.models import Sequential
+from tensorflow.keras.layers import LSTM, Dense
+
+# X_train shape: (samples, timesteps, 1)
+model = Sequential([
+    LSTM(32, input_shape=(None, 1)),
+    Dense(1)
+])
+model.compile(optimizer=&#39;adam&#39;, loss=&#39;mse&#39;)
+model.fit(X_train, y_train, epochs=20, batch_size=16)
+</pre></div>
+</div>
+</div>
+</div>
+<p>The model learns to map sequences to outputs; input sequences can be constructed via sliding windows.</p>
+</section>
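The sliding-window construction mentioned above can be sketched as follows; the sine series, window length, and helper name are illustrative assumptions, not part of the lecture:

```python
import numpy as np

def make_windows(series, sequence_length):
    """Build overlapping (samples, timesteps, 1) inputs and next-step targets."""
    X = np.stack([series[i:i + sequence_length]
                  for i in range(len(series) - sequence_length)])
    y = series[sequence_length:]            # target: the value after each window
    return X[..., np.newaxis], y            # add the trailing feature axis

series = np.sin(0.1 * np.arange(200))       # toy univariate series
X_train, y_train = make_windows(series, sequence_length=20)
print(X_train.shape, y_train.shape)  # (180, 20, 1) (180,)
```

Arrays of this shape can be passed straight to the model.fit call in the Keras example above.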
+<section id="example-modeling-dynamical-systems">
+<h2>Example: Modeling Dynamical Systems<a class="headerlink" href="#example-modeling-dynamical-systems" title="Link to this heading">#</a></h2>
+<ol class="arabic simple">
+<li><p>LSTMs can learn complex time evolution of physical systems (e.g. the Lorenz attractor, fluid dynamics) from data.</p></li>
+<li><p>They serve as data-driven surrogates for ODE/PDE solvers, trained on RK4-generated time series.</p></li>
+<li><p>For example, an LSTM surrogate has accurately forecast 36-hour lake hydrodynamics (velocity, temperature) with <span class="math notranslate nohighlight">\(&lt;6\%\)</span> error.</p></li>
+<li><p>Such models dramatically speed up predictions compared to full numerical simulation.</p></li>
+</ol>
+</section>
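As a sketch of how such RK4-generated training series arise, here is a classic fourth-order Runge-Kutta integrator applied to a harmonic oscillator (a toy stand-in for the dynamical systems named above); the resulting trajectory array is the kind of data an LSTM surrogate would be trained on:

```python
import numpy as np

def rk4_step(f, y, t, h):
    """Classic fourth-order Runge-Kutta step for y' = f(t, y)."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

# Harmonic oscillator: dx/dt = v, dv/dt = -x, with exact solution x(t) = cos(t).
f = lambda t, y: np.array([y[1], -y[0]])
h, n = 0.01, 1000
traj = np.empty((n + 1, 2))
traj[0] = [1.0, 0.0]
for k in range(n):
    traj[k + 1] = rk4_step(f, traj[k], k * h, h)

# RK4 tracks the exact solution closely over t in [0, 10]
print(np.abs(traj[-1, 0] - np.cos(n * h)) < 1e-6)  # True
```

Sliding windows over such a trajectory give the sequence-to-next-state pairs on which the surrogate is fit.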
+<section id="example-biological-sequences">
+<h2>Example: Biological Sequences<a class="headerlink" href="#example-biological-sequences" title="Link to this heading">#</a></h2>
+<ol class="arabic simple">
+<li><p>Biological sequences (DNA/RNA/proteins) are effectively categorical time series.</p></li>
+<li><p>LSTMs capture sequence motifs and long-range dependencies (akin to language models).</p></li>
+<li><p>They are widely used in genomics and proteomics (e.g., protein function, gene expression).</p></li>
+<li><p>They naturally handle variable-length input by processing one element at a time.</p></li>
+</ol>
+</section>
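A common way to feed such categorical sequences to an LSTM is one-hot encoding, turning each symbol into a channel per time step. A minimal sketch for DNA (the helper name is ours, not from the lecture):

```python
import numpy as np

def one_hot_dna(seq, alphabet="ACGT"):
    """Encode a DNA string as a (timesteps, 4) array, one channel per base."""
    idx = {c: k for k, c in enumerate(alphabet)}
    X = np.zeros((len(seq), len(alphabet)))
    X[np.arange(len(seq)), [idx[c] for c in seq]] = 1.0
    return X

X = one_hot_dna("ACCTG")
print(X.shape)  # (5, 4)
print(X[1])     # [0. 1. 0. 0.]  ('C' is channel 1)
```

Stacking such arrays (with padding for unequal lengths) gives the (samples, timesteps, features) batches that recurrent layers expect.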
+<section id="training-tips-and-variants">
+<h2>Training Tips and Variants<a class="headerlink" href="#training-tips-and-variants" title="Link to this heading">#</a></h2>
+<ol class="arabic simple">
+<li><p>Preprocess time series (normalize features, windowing); handle variable lengths with padding/truncation.</p></li>
+<li><p>Experiment with network depth, hidden units, and regularization (dropout) to avoid overfitting.</p></li>
+<li><p>Consider bidirectional LSTMs or stacking multiple LSTM layers for complex patterns.</p></li>
+<li><p>The GRU is a simpler gated RNN that combines the forget/input gates into one update gate.</p></li>
+<li><p>Monitor gradients during training; use gradient clipping to stabilize learning if needed.</p></li>
+</ol>
+</section>
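Gradient clipping by global norm, mentioned in the last point, can be sketched in NumPy; the max_norm value and toy gradients are illustrative:

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    """Rescale a list of gradient arrays so their joint L2 norm is at most max_norm."""
    total = np.sqrt(sum(np.sum(g**2) for g in grads))
    scale = min(1.0, max_norm / total) if total > 0 else 1.0
    return [g * scale for g in grads], total

grads = [np.full(3, 4.0), np.full(4, 3.0)]   # toy gradients; global norm = sqrt(48 + 36)
clipped, norm = clip_by_global_norm(grads, max_norm=1.0)
print(round(norm, 3))                                         # 9.165
print(round(np.sqrt(sum(np.sum(g**2) for g in clipped)), 3))  # 1.0
```

Scaling all gradients by a common factor preserves their direction, which is why global-norm clipping is usually preferred over clipping each value independently.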
+<section id="lstm-summary">
+<h2>LSTM Summary<a class="headerlink" href="#lstm-summary" title="Link to this heading">#</a></h2>
+<ol class="arabic simple">
+<li><p>LSTMs extend RNNs with gated cells to remember long-term context, addressing RNN gradient issues.</p></li>
+<li><p>Core update: <span class="math notranslate nohighlight">\(C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t\)</span>, output <span class="math notranslate nohighlight">\(h_t = o_t \odot \tanh(C_t)\)</span>.</p></li>
+<li><p>Implementation is straightforward in libraries like Keras/PyTorch with few lines of code.</p></li>
+<li><p>Applications span science and engineering: forecasting dynamical systems, analyzing DNA/proteins, etc.</p></li>
+<li><p>For more details, see Goodfellow et al. (2016), <em>Deep Learning</em>, chapter 10.</p></li>
+</ol>
+</section>
 <section id="summary-of-lstm">
 <h2>Summary of LSTM<a class="headerlink" href="#summary-of-lstm" title="Link to this heading">#</a></h2>
 <p>LSTMs provide a basic approach for modeling long-range dependencies in sequences.
@@ -2391,6 +2500,9 @@ <h2>Dimensionality reduction<a class="headerlink" href="#dimensionality-reductio
 <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#gating-mechanism-long-short-term-memory-lstm">Gating mechanism: Long Short Term Memory (LSTM)</a></li>
 <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#implementing-a-memory-cell-in-a-neural-network">Implementing a memory cell in a neural network</a></li>
 <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#lstm-details">LSTM details</a></li>
+<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#lstm-cell-and-gates">LSTM Cell and Gates</a></li>
+<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#core-lstm-equations">Core LSTM Equations</a></li>
+<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#gate-intuition-and-dynamics">Gate Intuition and Dynamics</a></li>
 <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#basic-layout-all-figures-from-raschka-et-al">Basic layout (All figures from Raschka <em>et al.,</em>)</a></li>
 <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#id1">LSTM details</a></li>
 <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#comparing-with-a-standard-rnn">Comparing with a standard RNN</a></li>
@@ -2405,6 +2517,11 @@ <h2>Dimensionality reduction<a class="headerlink" href="#dimensionality-reductio
 <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#forget-and-input">Forget and input</a></li>
 <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#id2">Basic layout</a></li>
 <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#output-gate">Output gate</a></li>
+<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#lstm-implementation-code-example">LSTM Implementation (Code Example)</a></li>
+<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#example-modeling-dynamical-systems">Example: Modeling Dynamical Systems</a></li>
+<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#example-biological-sequences">Example: Biological Sequences</a></li>
+<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#training-tips-and-variants">Training Tips and Variants</a></li>
+<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#lstm-summary">LSTM Summary</a></li>
 <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#summary-of-lstm">Summary of LSTM</a></li>
 <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#lstm-implementation-using-tensorflow">LSTM implementation using TensorFlow</a></li>
 <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#and-the-corresponding-one-with-pytorch">And the corresponding one with PyTorch</a></li>
