|
1 | 1 | TITLE: Advanced machine learning and data analysis for the physical sciences |
2 | | -AUTHOR: Morten Hjorth-Jensen {copyright, 1999-present|CC BY-NC} at Department of Physics and Center for Computing in Science Education, University of Oslo, Norway & Department of Physics and Astronomy and Facility for Rare Isotope Beams, Michigan State University, East Lansing, Michigan, USA |
3 | | -DATE: April 16, 2024 |
| 2 | +AUTHOR: Morten Hjorth-Jensen {copyright, 1999-present|CC BY-NC} at Department of Physics and Center for Computing in Science Education, University of Oslo, Norway |
| 3 | +DATE: April 24, 2025 |
4 | 4 |
|
5 | 5 |
|
6 | 6 | !split |
7 | | -===== Plans for the week April 15-19, 2024 ===== |
| 7 | +===== Plans for the week April 21-25, 2025 ===== |
8 | 8 |
|
9 | 9 | !bblock Deep generative models |
10 | | -o Finalizing discussion of Boltzmann machines, implementations using TensorFlow and Pytorch |
11 | | -o Discussion of other energy-based models and Langevin sampling |
12 | | -o Variational Autoencoders (VAE), mathematics |
13 | | -o "Video of lecture":"https://youtu.be/rw-NBN293o4" |
14 | | -o "Whiteboard notes":"https://github.com/CompPhysics/AdvancedMachineLearning/blob/main/doc/HandwrittenNotes/2024/NotesApril16.pdf" |
| 10 | +o Variational Autoencoders (VAE), basic mathematics
| 11 | +o Writing our own codes for VAEs |
| 12 | +#o "Video of lecture":"https://youtu.be/rw-NBN293o4" |
| 13 | +#o "Whiteboard notes":"https://github.com/CompPhysics/AdvancedMachineLearning/blob/main/doc/HandwrittenNotes/2024/NotesApril16.pdf" |
15 | 14 | !eblock |
16 | 15 |
|
17 | 16 | !split |
18 | 17 | ===== Readings ===== |
19 | 18 | !bblock |
20 | | -o Reading recommendation: Goodfellow et al, for VAEs see sections 20.10-20.11 |
21 | | -o To create a Boltzmann machine using Keras, see Babcock and Bali, chapter 4, at URL:"https://github.com/PacktPublishing/Hands-On-Generative-AI-with-Python-and-TensorFlow-2/blob/master/Chapter_4/models/rbm.py"
22 | | -o See Foster, chapter 7 on energy-based models at URL:"https://github.com/davidADSP/Generative_Deep_Learning_2nd_Edition/tree/main/notebooks/07_ebm/01_ebm" |
| 19 | +o Reading recommendation: Goodfellow et al.; for VAEs see sections 20.10-20.11
23 | 20 | !eblock |
24 | 21 |
|
25 | 22 | #todo: add about Langevin sampling, see https://www.lyndonduong.com/sgmcmc/ |
26 | 23 | # code for VAEs applied to MNIST and CIFAR perhaps |
27 | 24 |
|
28 | | -!split |
29 | | -===== Reminder from last week and layout of lecture this week ===== |
30 | | - |
31 | | -o We will first present a reminder from last week; see for example the Jupyter notebook at URL:"https://github.com/CompPhysics/AdvancedMachineLearning/blob/main/doc/pub/week12/ipynb/week12.ipynb"
32 | | -o We will then discuss the codes as well as other energy-based models, with Langevin sampling as an alternative to Gibbs or Metropolis sampling.
33 | | -o Thereafter we start our discussion of variational autoencoders and generative adversarial networks
34 | | - |
35 | | -!split |
36 | | -===== PyTorch code for a binary-binary RBM =====
37 | | - |
38 | | -!bc pycod |
39 | | -import numpy as np |
40 | | -import torch |
41 | | -import torch.utils.data |
42 | | -import torch.nn as nn |
43 | | -import torch.nn.functional as F |
44 | | -import torch.optim as optim |
46 | | -from torchvision import datasets, transforms |
47 | | -from torchvision.utils import make_grid , save_image |
48 | | -import matplotlib.pyplot as plt |
49 | | - |
50 | | - |
51 | | -batch_size = 64 |
52 | | -train_loader = torch.utils.data.DataLoader( |
53 | | -datasets.MNIST('./data', |
54 | | - train=True, |
55 | | - download = True, |
56 | | - transform = transforms.Compose( |
57 | | - [transforms.ToTensor()]) |
58 | | - ), |
59 | | - batch_size=batch_size |
60 | | -) |
61 | | - |
62 | | -test_loader = torch.utils.data.DataLoader( |
63 | | -datasets.MNIST('./data', |
64 | | - train=False, |
65 | | - transform=transforms.Compose( |
66 | | - [transforms.ToTensor()]) |
67 | | - ), |
68 | | - batch_size=batch_size) |
69 | | - |
70 | | - |
71 | | -class RBM(nn.Module): |
72 | | - def __init__(self, |
73 | | - n_vis=784, |
74 | | - n_hin=500, |
75 | | - k=5): |
76 | | - super(RBM, self).__init__() |
77 | | - self.W = nn.Parameter(torch.randn(n_hin,n_vis)*1e-2) |
78 | | - self.v_bias = nn.Parameter(torch.zeros(n_vis)) |
79 | | - self.h_bias = nn.Parameter(torch.zeros(n_hin)) |
80 | | - self.k = k |
81 | | - |
82 | | - def sample_from_p(self,p): |
83 | | -        return F.relu(torch.sign(p - torch.rand(p.size())))  # binary sample: 1 with probability p, else 0
84 | | - |
85 | | - def v_to_h(self,v): |
86 | | -        p_h = torch.sigmoid(F.linear(v,self.W,self.h_bias))  # p(h=1|v)
87 | | - sample_h = self.sample_from_p(p_h) |
88 | | - return p_h,sample_h |
89 | | - |
90 | | - def h_to_v(self,h): |
91 | | -        p_v = torch.sigmoid(F.linear(h,self.W.t(),self.v_bias))  # p(v=1|h)
92 | | - sample_v = self.sample_from_p(p_v) |
93 | | - return p_v,sample_v |
94 | | - |
95 | | - def forward(self,v): |
96 | | - pre_h1,h1 = self.v_to_h(v) |
97 | | - |
98 | | - h_ = h1 |
99 | | - for _ in range(self.k): |
100 | | - pre_v_,v_ = self.h_to_v(h_) |
101 | | - pre_h_,h_ = self.v_to_h(v_) |
102 | | - |
103 | | - return v,v_ |
104 | | - |
105 | | - def free_energy(self,v): |
106 | | - vbias_term = v.mv(self.v_bias) |
107 | | - wx_b = F.linear(v,self.W,self.h_bias) |
108 | | - hidden_term = wx_b.exp().add(1).log().sum(1) |
109 | | - return (-hidden_term - vbias_term).mean() |
110 | | - |
111 | | - |
112 | | - |
113 | | - |
114 | | -rbm = RBM(k=1) |
115 | | -train_op = optim.SGD(rbm.parameters(), lr=0.1)
116 | | - |
117 | | -for epoch in range(10): |
118 | | - loss_ = [] |
119 | | - for _, (data,target) in enumerate(train_loader): |
120 | | -        data = data.view(-1,784)  # flatten 28x28 images to vectors
121 | | - sample_data = data.bernoulli() |
122 | | - |
123 | | - v,v1 = rbm(sample_data) |
124 | | -        loss = rbm.free_energy(v) - rbm.free_energy(v1)  # positive minus negative phase
125 | | -        loss_.append(loss.item())
126 | | - train_op.zero_grad() |
127 | | - loss.backward() |
128 | | - train_op.step() |
129 | | - |
130 | | -    print("Training loss for epoch {}: {}".format(epoch, np.mean(loss_)))
131 | | - |
132 | | - |
133 | | -def show_and_save(file_name,img):
134 | | - npimg = np.transpose(img.numpy(),(1,2,0)) |
135 | | - f = "./%s.png" % file_name |
136 | | - plt.imshow(npimg) |
137 | | - plt.imsave(f,npimg) |
138 | | - |
139 | | -show_and_save("real",make_grid(v.view(-1,1,28,28).data))
140 | | -show_and_save("generate",make_grid(v1.view(-1,1,28,28).data))
141 | | - |
142 | | -!ec |
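
Note that the loss above is the difference between the free energy of the data (the positive phase) and that of the samples produced by $k$ Gibbs steps (the negative phase); its gradient with respect to the RBM parameters reproduces the contrastive divergence (CD-$k$) update discussed last week.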
143 | | - |
144 | | -!split |
145 | | -===== RBM using TensorFlow and Keras ===== |
146 | | - |
147 | | - |
148 | | -o To create a Boltzmann machine using Keras, see Babcock and Bali, chapter 4, at URL:"https://github.com/PacktPublishing/Hands-On-Generative-AI-with-Python-and-TensorFlow-2/blob/master/Chapter_4/models/rbm.py"; a minimal sketch follows below.
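
For a self-contained illustration, here is a minimal sketch of a binary-binary RBM trained with CD-1 in TensorFlow. This is our own sketch, not the Babcock and Bali implementation; the layer sizes and learning rate simply mirror the PyTorch example above.

!bc pycod
# Minimal binary-binary RBM with CD-1 in TensorFlow (illustrative sketch only)
import tensorflow as tf

class RBM(tf.Module):
    def __init__(self, n_vis=784, n_hid=500):
        super().__init__()
        self.W = tf.Variable(tf.random.normal([n_vis, n_hid], stddev=0.01))
        self.v_bias = tf.Variable(tf.zeros([n_vis]))
        self.h_bias = tf.Variable(tf.zeros([n_hid]))

    def sample(self, p):
        # Binary Bernoulli samples from probabilities p
        return tf.cast(tf.random.uniform(tf.shape(p)) < p, tf.float32)

    def cd1_step(self, v0, lr=0.1):
        ph0 = tf.sigmoid(v0 @ self.W + self.h_bias)                 # p(h=1|v0)
        h0 = self.sample(ph0)
        pv1 = tf.sigmoid(h0 @ tf.transpose(self.W) + self.v_bias)   # p(v=1|h0)
        v1 = self.sample(pv1)
        ph1 = tf.sigmoid(v1 @ self.W + self.h_bias)
        batch = tf.cast(tf.shape(v0)[0], tf.float32)
        # Positive minus negative phase statistics
        self.W.assign_add(lr * (tf.transpose(v0) @ ph0 - tf.transpose(v1) @ ph1) / batch)
        self.v_bias.assign_add(lr * tf.reduce_mean(v0 - v1, axis=0))
        self.h_bias.assign_add(lr * tf.reduce_mean(ph0 - ph1, axis=0))
!ec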
149 | | - |
150 | | - |
151 | | - |
152 | | -!split |
153 | | -===== Codes for Energy-based models ===== |
154 | | - |
155 | | -See discussions in Foster, chapter 7 on energy-based models at URL:"https://github.com/davidADSP/Generative_Deep_Learning_2nd_Edition/tree/main/notebooks/07_ebm/01_ebm" |
156 | | - |
157 | | -That notebook is based on the article by Du and Mordatch, _Implicit generation and modeling with energy-based models_, see URL:"https://arxiv.org/pdf/1903.08689.pdf".
158 | | - |
159 | | -!split |
160 | | -===== Langevin sampling ===== |
161 | | - |
162 | | -Langevin sampling, also called stochastic gradient Langevin dynamics
163 | | -(SGLD), is a sampling technique that combines characteristics of
164 | | -stochastic gradient descent (SGD) with Langevin dynamics, a
165 | | -mathematical extension of the Langevin equation. SGLD is an iterative
166 | | -optimization algorithm which uses minibatching to create a stochastic
167 | | -gradient estimator, as used in SGD to optimize a differentiable
168 | | -objective function.
169 | | - |
170 | | -Unlike traditional SGD, SGLD can be used for |
171 | | -Bayesian learning as a sampling method. SGLD may be viewed as Langevin |
172 | | -dynamics applied to posterior distributions, but the key difference is |
173 | | -that the likelihood gradient terms are minibatched, like in SGD. SGLD, |
174 | | -like Langevin dynamics, produces samples from a posterior distribution |
175 | | -of parameters based on available data. |
176 | | - |
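To make the minibatching explicit, here is a small sketch of SGLD for a toy Bayesian problem. The Gaussian model, step size and batch size are our own assumptions; the essential point is that the full-data likelihood gradient is estimated from a minibatch and rescaled by $N/n$.

!bc pycod
import numpy as np

rng = np.random.default_rng(0)
N = 1000
data = rng.normal(1.0, 1.0, size=N)       # toy data x_i ~ N(1,1)

def grad_log_prior(theta):
    return -theta                          # standard normal prior

def grad_log_lik(theta, batch):
    return np.sum(batch - theta)           # Gaussian likelihood, unit variance

eta, n = 1.0e-4, 32                        # step size and minibatch size
theta = rng.uniform(-1, 1)
for _ in range(5000):
    batch = rng.choice(data, size=n, replace=False)
    # Minibatch estimator of the full-data gradient of log p(theta|data)
    grad = grad_log_prior(theta) + (N / n) * grad_log_lik(theta, batch)
    theta += eta * grad + np.sqrt(2 * eta) * rng.standard_normal()
print(theta)   # after burn-in, theta is (approximately) a posterior sample
!ec
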
177 | | -!split |
178 | | -===== More on the SGLD ===== |
179 | | - |
180 | | -The SGLD uses the gradient of the _log_ of the probability
181 | | -$p(\theta)$ (note that we limit ourselves to a single variable
182 | | -$\theta$). The variable $\theta$ is initialized by drawing from some
183 | | -prior distribution, normally just a uniform distribution taking
184 | | -values $\theta\in [-1,1]$.
185 | | - |
186 | | -The update is given by |
187 | | -!bt |
188 | | -\[ |
189 | | -\theta_{i+1}=\theta_{i}+\eta \nabla_{\theta} \log{p(\theta_{i})}+\sqrt{2\eta}w_i,
190 | | -\] |
191 | | -!et |
192 | | -where the $w_i\sim N(0,1)$ are normally distributed with mean zero and
193 | | -variance one and $i=0,1,\dots,k$, with $k$ the final number of
194 | | -iterations. The parameter $\eta$ is the learning rate. The term
195 | | -$\sqrt{2\eta}w_i$ introduces _noise_; the factor $\sqrt{2\eta}$ follows from the discretization of the underlying Langevin equation.
196 | | - |
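As a minimal sketch of this update in plain Python, consider sampling from a standard normal target, for which $\nabla_{\theta} \log{p(\theta)}=-\theta$; the step size and number of iterations are arbitrary illustrative choices.

!bc pycod
import numpy as np

rng = np.random.default_rng(42)

def grad_log_p(theta):
    return -theta                  # target p = N(0,1), so grad log p = -theta

eta, k = 0.01, 100000              # learning rate and number of iterations
theta = rng.uniform(-1.0, 1.0)     # initialization from the uniform prior
samples = np.empty(k)
for i in range(k):
    w = rng.standard_normal()      # w_i ~ N(0,1)
    theta = theta + eta * grad_log_p(theta) + np.sqrt(2 * eta) * w
    samples[i] = theta

# The histogram of samples should approach the target density
print(samples.mean(), samples.var())
!ec
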
197 | | -!split |
198 | | -===== Code example of Langevin Sampling ===== |
199 | | - |
200 | | -In our calculations the gradient is calculated using the model we have |
201 | | -for the probability distribution. For an energy-based model this gives |
202 | | -us a derivative which involves the so-called positive and negative |
203 | | -phases discussed last week. |
204 | | - |
205 | | -Read more about Langevin sampling at for example |
206 | | -URL:"https://www.lyndonduong.com/sgmcmc/". This site contains a nice |
207 | | -example of a PyTorch code which implements Langevin sampling. |
208 | | - |
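As a complement, here is a short self-contained PyTorch sketch (our own construction, not the code from the link above) which uses automatic differentiation to perform Langevin sampling on a toy energy $E(x)$, with $p(x)\propto \exp{(-E(x))}$; the double-well energy and all parameter values are assumptions.

!bc pycod
import torch

def energy(x):
    # Toy double-well energy; p(x) is proportional to exp(-E(x))
    return (x**2 - 1.0)**2

def langevin_sample(n_steps=1000, eta=1.0e-2, n_chains=512):
    x = torch.randn(n_chains, requires_grad=True)
    for _ in range(n_steps):
        # grad log p(x) = -grad E(x), obtained by automatic differentiation
        grad = torch.autograd.grad(energy(x).sum(), x)[0]
        with torch.no_grad():
            x = x - eta * grad + (2.0 * eta)**0.5 * torch.randn_like(x)
        x.requires_grad_(True)
    return x.detach()

samples = langevin_sample()
print(samples.mean().item(), samples.std().item())
!ec
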
209 | 25 | !split |
210 | 26 | ===== Theory of Variational Autoencoders ===== |
211 | 27 |
|
212 | | -Let us remind ourselves about what an autoencoder is; see the Jupyter notebook at URL:"https://github.com/CompPhysics/AdvancedMachineLearning/blob/main/doc/pub/week10/ipynb/week10.ipynb".
213 | | - |
214 | | - |
215 | | -!split |
216 | | -===== The Autoencoder again ===== |
217 | | - |
218 | | - |
219 | | -Autoencoders are neural networks whose outputs are their own
220 | | -inputs. They are split into an _encoder part_,
221 | | -which maps the input $\bm{x}$ via a function $f(\bm{x},\bm{W})$
222 | | -to a _so-called code part_ (or intermediate part)
223 | | -with the result $\bm{h}$
224 | | - |
225 | | -!bt |
226 | | -\[ |
227 | | -\bm{h} = f(\bm{x},\bm{W}),
228 | | -\] |
229 | | -!et |
230 | | -where $\bm{W}$ are the weights to be determined. The _decoder_ part maps the code $\bm{h}$, via its own parameters (weights given by the matrix $\bm{V}$ and its own biases), to
231 | | -the final output
232 | | -!bt |
233 | | -\[ |
234 | | -\tilde{\bm{x}} = g(\bm{h},\bm{V}).
235 | | -\] |
236 | | -!et |
237 | | - |
238 | | -The goal is to minimize the reconstruction error, often done by minimizing the mean squared error.
239 | | - |
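To make the encoder-decoder structure concrete, here is a minimal PyTorch sketch of such an autoencoder; the layer sizes and the single linear layer in each part are our own illustrative choices.

!bc pycod
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, n_inputs=784, n_code=32):
        super().__init__()
        # Encoder f(x,W): input x -> code h
        self.encoder = nn.Sequential(nn.Linear(n_inputs, n_code), nn.ReLU())
        # Decoder g(h,V): code h -> reconstruction of x
        self.decoder = nn.Sequential(nn.Linear(n_code, n_inputs), nn.Sigmoid())

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Autoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1.0e-3)
loss_fn = nn.MSELoss()             # reconstruction (mean squared) error

x = torch.rand(64, 784)            # dummy batch standing in for, e.g., MNIST
loss = loss_fn(model(x), x)
optimizer.zero_grad()
loss.backward()
optimizer.step()
!ec
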
240 | | -!split |
241 | | -===== Schematic image of an Autoencoder ===== |
242 | | - |
243 | | -FIGURE: [figures/ae1.png, width=700 frac=1.0] |
244 | | - |
245 | 28 |
|
246 | 29 | !split |
247 | 30 | ===== Mathematics of Variational Autoencoders ===== |
|