
Commit 505efbb

committed: updating week 13
1 parent f7c6966 commit 505efbb

8 files changed

Lines changed: 123 additions & 1764 deletions

File tree

doc/pub/week13/html/week13-bs.html: 14 additions & 294 deletions (large diff not rendered by default)
doc/pub/week13/html/week13-reveal.html: 11 additions & 271 deletions (large diff not rendered by default)
doc/pub/week13/html/week13-solarized.html: 13 additions & 284 deletions (large diff not rendered by default)
doc/pub/week13/html/week13.html: 13 additions & 284 deletions (large diff not rendered by default)
-19.4 KB: binary file not shown
doc/pub/week13/ipynb/week13.ipynb: 64 additions & 406 deletions (large diff not rendered by default)
doc/pub/week13/pdf/week13.pdf: -105 KB, binary file not shown

doc/src/week13/week13.do.txt

Lines changed: 8 additions & 225 deletions
@@ -1,247 +1,30 @@
TITLE: Advanced machine learning and data analysis for the physical sciences
-AUTHOR: Morten Hjorth-Jensen {copyright, 1999-present|CC BY-NC} at Department of Physics and Center for Computing in Science Education, University of Oslo, Norway & Department of Physics and Astronomy and Facility for Rare Isotope Beams, Michigan State University, East Lansing, Michigan, USA
-DATE: April 16, 2024
+AUTHOR: Morten Hjorth-Jensen {copyright, 1999-present|CC BY-NC} at Department of Physics and Center for Computing in Science Education, University of Oslo, Norway
+DATE: April 24, 2025


!split
-===== Plans for the week April 15-19, 2024 =====
+===== Plans for the week April 21-25, 2025 =====

!bblock Deep generative models
-o Finalizing discussion of Boltzmann machines, implementations using TensorFlow and PyTorch
-o Discussion of other energy-based models and Langevin sampling
-o Variational Autoencoders (VAE), mathematics
-o "Video of lecture":"https://youtu.be/rw-NBN293o4"
-o "Whiteboard notes":"https://github.com/CompPhysics/AdvancedMachineLearning/blob/main/doc/HandwrittenNotes/2024/NotesApril16.pdf"
+o Variational Autoencoders (VAE), basic mathematics
+o Writing our own codes for VAEs
+#o "Video of lecture":"https://youtu.be/rw-NBN293o4"
+#o "Whiteboard notes":"https://github.com/CompPhysics/AdvancedMachineLearning/blob/main/doc/HandwrittenNotes/2024/NotesApril16.pdf"
!eblock

!split
===== Readings =====
!bblock
-o Reading recommendation: Goodfellow et al, for VAEs see sections 20.10-20.11
-o To create a Boltzmann machine using Keras, see Babcock and Bali chapter 4, see URL:"https://github.com/PacktPublishing/Hands-On-Generative-AI-with-Python-and-TensorFlow-2/blob/master/Chapter_4/models/rbm.py"
-o See Foster, chapter 7 on energy-based models at URL:"https://github.com/davidADSP/Generative_Deep_Learning_2nd_Edition/tree/main/notebooks/07_ebm/01_ebm"
+o Add VAE material
!eblock

#todo: add about Langevin sampling, see https://www.lyndonduong.com/sgmcmc/
# code for VAEs applied to MNIST and CIFAR perhaps

-!split
-===== Reminder from last week and layout of lecture this week =====
-
-o We will first present a reminder from last week, see for example the jupyter-notebook at URL:"https://github.com/CompPhysics/AdvancedMachineLearning/blob/main/doc/pub/week12/ipynb/week12.ipynb"
-o We will then discuss codes as well as other energy-based models and Langevin sampling instead of Gibbs or Metropolis sampling.
-o Thereafter we start our discussion of Variational autoencoders and Generative adversarial networks
-
-!split
-===== Code for RBMs using PyTorch for a binary-binary RBM =====
-
-!bc pycod
-import numpy as np
-import torch
-import torch.utils.data
-import torch.nn as nn
-import torch.nn.functional as F
-import torch.optim as optim
-from torch.autograd import Variable
-from torchvision import datasets, transforms
-from torchvision.utils import make_grid, save_image
-import matplotlib.pyplot as plt
-
-# MNIST data loaders (images are binarized further down)
-batch_size = 64
-train_loader = torch.utils.data.DataLoader(
-    datasets.MNIST('./data',
-                   train=True,
-                   download=True,
-                   transform=transforms.Compose([transforms.ToTensor()])),
-    batch_size=batch_size)
-
-test_loader = torch.utils.data.DataLoader(
-    datasets.MNIST('./data',
-                   train=False,
-                   transform=transforms.Compose([transforms.ToTensor()])),
-    batch_size=batch_size)
-
-
-class RBM(nn.Module):
-    def __init__(self, n_vis=784, n_hin=500, k=5):
-        super(RBM, self).__init__()
-        self.W = nn.Parameter(torch.randn(n_hin, n_vis) * 1e-2)
-        self.v_bias = nn.Parameter(torch.zeros(n_vis))
-        self.h_bias = nn.Parameter(torch.zeros(n_hin))
-        self.k = k
-
-    def sample_from_p(self, p):
-        # Bernoulli sample: 1 where p exceeds a uniform random number, else 0
-        return F.relu(torch.sign(p - Variable(torch.rand(p.size()))))
-
-    def v_to_h(self, v):
-        p_h = torch.sigmoid(F.linear(v, self.W, self.h_bias))
-        sample_h = self.sample_from_p(p_h)
-        return p_h, sample_h
-
-    def h_to_v(self, h):
-        p_v = torch.sigmoid(F.linear(h, self.W.t(), self.v_bias))
-        sample_v = self.sample_from_p(p_v)
-        return p_v, sample_v
-
-    def forward(self, v):
-        # k steps of block Gibbs sampling (CD-k)
-        pre_h1, h1 = self.v_to_h(v)
-        h_ = h1
-        for _ in range(self.k):
-            pre_v_, v_ = self.h_to_v(h_)
-            pre_h_, h_ = self.v_to_h(v_)
-        return v, v_
-
-    def free_energy(self, v):
-        # F(v) = -v.v_bias - sum_j log(1 + exp(W v + h_bias)_j)
-        vbias_term = v.mv(self.v_bias)
-        wx_b = F.linear(v, self.W, self.h_bias)
-        hidden_term = wx_b.exp().add(1).log().sum(1)
-        return (-hidden_term - vbias_term).mean()
-
-
-rbm = RBM(k=1)
-train_op = optim.SGD(rbm.parameters(), 0.1)
-
-for epoch in range(10):
-    loss_ = []
-    for _, (data, target) in enumerate(train_loader):
-        data = Variable(data.view(-1, 784))
-        sample_data = data.bernoulli()
-
-        v, v1 = rbm(sample_data)
-        # Approximate negative log-likelihood gradient:
-        # difference of free energies of data and reconstruction
-        loss = rbm.free_energy(v) - rbm.free_energy(v1)
-        loss_.append(loss.item())
-        train_op.zero_grad()
-        loss.backward()
-        train_op.step()
-
-    print("Training loss for {} epoch: {}".format(epoch, np.mean(loss_)))
-
-
-def show_and_save(file_name, img):
-    npimg = np.transpose(img.numpy(), (1, 2, 0))
-    f = "./%s.png" % file_name
-    plt.imshow(npimg)
-    plt.imsave(f, npimg)
-
-# The last mini-batch of the MNIST training set contains 32 images
-show_and_save("real", make_grid(v.view(32, 1, 28, 28).data))
-show_and_save("generate", make_grid(v1.view(32, 1, 28, 28).data))
-!ec
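
A possible follow-up sketch (added here for illustration, not part of the original notes): once the RBM above is trained, new digits can be generated by running a longer block-Gibbs chain that starts from a random visible configuration rather than from a training image. The number of chains and Gibbs sweeps below are arbitrary choices.

!bc pycod
# Sketch: draw "fantasy" digits from the trained RBM via a block-Gibbs chain
n_samples, n_steps = 32, 200
v_chain = torch.rand(n_samples, 784).bernoulli()   # random binary starting point
with torch.no_grad():
    for _ in range(n_steps):
        _, h_chain = rbm.v_to_h(v_chain)           # sample hidden units given visible
        p_v, v_chain = rbm.h_to_v(h_chain)         # sample visible units given hidden
show_and_save("fantasy", make_grid(p_v.view(n_samples, 1, 28, 28)))
!ec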
-
-!split
-===== RBM using TensorFlow and Keras =====
-
-o To create a Boltzmann machine using Keras, see Babcock and Bali chapter 4, see URL:"https://github.com/PacktPublishing/Hands-On-Generative-AI-with-Python-and-TensorFlow-2/blob/master/Chapter_4/models/rbm.py"
-
-!split
-===== Codes for Energy-based models =====
-
-See the discussion in Foster, chapter 7 on energy-based models at URL:"https://github.com/davidADSP/Generative_Deep_Learning_2nd_Edition/tree/main/notebooks/07_ebm/01_ebm"
-
-That notebook is based on an article by Du and Mordatch, _Implicit generation and modeling with energy-based models_, see URL:"https://arxiv.org/pdf/1903.08689.pdf".
-
-!split
-===== Langevin sampling =====
-
-Langevin sampling, also called stochastic gradient Langevin dynamics (SGLD), is a sampling technique that combines characteristics of stochastic gradient descent (SGD) with Langevin dynamics, a mathematical extension of the Langevin equation. SGLD is an iterative algorithm which uses minibatching to create a stochastic gradient estimator, as used in SGD to optimize a differentiable objective function.
-
-Unlike traditional SGD, SGLD can be used for Bayesian learning as a sampling method. SGLD may be viewed as Langevin dynamics applied to posterior distributions, but the key difference is that the likelihood gradient terms are minibatched, as in SGD. Like Langevin dynamics, SGLD produces samples from a posterior distribution of parameters based on the available data.
-
-!split
-===== More on the SGLD =====
-
-The SGLD works with the probability $p(\theta)$ (note that we limit ourselves to a single variable $\theta$) and uses the gradient of the _log_ of this probability. The parameter is initialized from some random prior distribution, normally just a uniform distribution over $\theta\in [-1,1]$.
-
-The update is given by
-!bt
-\[
-\theta_{i+1}=\theta_{i}+\eta \nabla_{\theta} \log{p(\theta_{i})}+\sqrt{\eta}w_i,
-\]
-!et
-where $w_i\sim N(0,1)$ are normally distributed with mean zero and variance one and $i=0,1,\dots,k$, with $k$ the final number of iterations. The parameter $\eta$ is the learning rate. The term $\sqrt{\eta}w_i$ introduces _noise_ into the equation.
-
-!split
-===== Code example of Langevin Sampling =====
-
-In our calculations the gradient is computed from the model we have for the probability distribution. For an energy-based model this gives a derivative which involves the so-called positive and negative phases discussed last week.
-
-Read more about Langevin sampling at for example URL:"https://www.lyndonduong.com/sgmcmc/". This site contains a nice example of a PyTorch code which implements Langevin sampling.
-
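
Since the notes only link to an external example, here is a minimal, self-contained sketch (not from the original notes) of the update rule above, applied to a toy one-dimensional case where $\log{p(\theta)}$ is that of a standard normal distribution. The learning rate, number of parallel chains and number of iterations are arbitrary illustrative choices, and the gradient is obtained with automatic differentiation, as one would do for a learned energy-based model.

!bc pycod
import torch

# Toy unnormalized log-probability: log p(theta) = -theta**2/2 + constant
def log_p(theta):
    return -0.5 * theta**2

eta = 1e-2                                # learning rate
n_iter = 5000                             # number of Langevin iterations
theta = 2.0 * torch.rand(1000) - 1.0      # uniform prior on [-1, 1], 1000 parallel chains

for _ in range(n_iter):
    theta = theta.detach().requires_grad_(True)
    grad = torch.autograd.grad(log_p(theta).sum(), theta)[0]
    # theta_{i+1} = theta_i + eta * grad log p(theta_i) + sqrt(eta) * w_i,  w_i ~ N(0, 1)
    theta = theta.detach() + eta * grad + eta**0.5 * torch.randn_like(theta)

# The samples should cluster around the mode of p(theta), here theta = 0
print(theta.mean().item(), theta.std().item())
!ec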
!split
===== Theory of Variational Autoencoders =====

-Let us remind ourselves about what an autoencoder is, see the jupyter-notebook at URL:"https://github.com/CompPhysics/AdvancedMachineLearning/blob/main/doc/pub/week10/ipynb/week10.ipynb".
-
-!split
-===== The Autoencoder again =====
-
-Autoencoders are neural networks whose outputs are their own inputs. They are split into an _encoder part_, which maps the input $\bm{x}$ via a function $f(\bm{x},\bm{W})$ to a _so-called code part_ (or intermediate part) with the result $\bm{h}$,
-!bt
-\[
-\bm{h} = f(\bm{x},\bm{W}),
-\]
-!et
-where $\bm{W}$ are the weights to be determined. The _decoder_ part maps, via its own parameters (weights given by the matrix $\bm{V}$ and its own biases), to the final output
-!bt
-\[
-\tilde{\bm{x}} = g(\bm{h},\bm{V}).
-\]
-!et
-
-The goal is to minimize the reconstruction error, often done by optimizing the mean squared error.
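
As a concrete illustration of the encoder-decoder structure above, here is a minimal sketch (not from the original notes) of a fully connected autoencoder in PyTorch, trained with a mean squared error reconstruction loss on random stand-in data. The layer sizes, activations and training settings are arbitrary choices.

!bc pycod
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, n_inputs=784, n_hidden=32):
        super().__init__()
        # Encoder: h = f(x, W)
        self.encoder = nn.Sequential(nn.Linear(n_inputs, n_hidden), nn.ReLU())
        # Decoder: x_tilde = g(h, V)
        self.decoder = nn.Sequential(nn.Linear(n_hidden, n_inputs), nn.Sigmoid())

    def forward(self, x):
        h = self.encoder(x)
        return self.decoder(h)

model = Autoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()                 # reconstruction error

x = torch.rand(64, 784)                # stand-in for a mini-batch of flattened images
for epoch in range(100):
    x_tilde = model(x)
    loss = loss_fn(x_tilde, x)         # minimize the mean squared reconstruction error
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
!ec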
-
-!split
-===== Schematic image of an Autoencoder =====
-
-FIGURE: [figures/ae1.png, width=700 frac=1.0]
-

!split
===== Mathematics of Variational Autoencoders =====