Commit 24234ed — added rbms
1 parent 297e3be

2 files changed

Lines changed: 452 additions & 0 deletions

Restricted Boltzmann Machine (RBM) Implementation
Overview
A Restricted Boltzmann Machine (RBM) is a two-layer generative neural network with a layer of visible units and a layer of hidden units. In an RBM, each visible unit is connected to each hidden unit (forming a bipartite graph), and there are no connections among units within the same layer. This implementation focuses on a binary-binary RBM (both visible and hidden units take values 0 or 1); the standard energy-based formulation behind the model is sketched just after the feature list below. We use object-oriented design to encapsulate the RBM behavior in a class. Key features of the implementation include:
Configurable layer sizes: The numbers of visible and hidden units are set during initialization.
Binary units: Both visible and hidden units are Bernoulli (binary 0 or 1) random variables.
Gibbs sampling methods: Functions to sample hidden states given visible units and vice versa, enabling Gibbs sampling in the network.
Training via Contrastive Divergence (CD-k): Uses Gibbs sampling (typically with k=1) to perform one-step reconstruction and update weights to learn the data distribution.
Output methods: Functions to retrieve the learned weight matrix and biases, to reconstruct visible data from hidden representations, and to generate new samples from the trained model.
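
For reference, the sampling rules used throughout the class follow from the standard energy-based definition of a binary RBM. Writing a for the visible biases, b for the hidden biases, and W for the weight matrix (corresponding to visible_bias, hidden_bias, and weights in the code below):

E(v, h) = -\sum_i a_i v_i - \sum_j b_j h_j - \sum_{i,j} v_i W_{ij} h_j,
\qquad p(v, h) = \frac{e^{-E(v, h)}}{Z}

p(h_j = 1 \mid v) = \sigma\Big(b_j + \sum_i v_i W_{ij}\Big),
\qquad
p(v_i = 1 \mid h) = \sigma\Big(a_i + \sum_j W_{ij} h_j\Big),
\qquad
\sigma(x) = \frac{1}{1 + e^{-x}}

These two conditional distributions are exactly what the sample_hidden and sample_visible methods compute.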
Training with Contrastive Divergence (CD): In each training iteration, the algorithm performs the following steps:
Positive phase: Compute hidden probabilities given the data, and sample hidden states.
Negative phase (reconstruction): Starting from those hidden states, sample a reconstruction of the visible units, then recompute hidden probabilities from this reconstruction (this completes one Gibbs sampling step, i.e. CD-1). For CD-k, this hidden→visible→hidden sampling step is repeated k times.
Weight update: Adjust weights and biases using the difference between the observed data correlations (visible × hidden from the positive phase) and the model's reconstructed correlations (visible × hidden from the negative phase); the update rule is written out just below this list. This gradient update moves the model toward producing reconstructions that better match the input data.
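
In symbols, CD-k approximates the log-likelihood gradient by the difference between data-driven and reconstruction-driven statistics (η is the learning rate and the angle brackets denote averages over the mini-batch):

\Delta W = \eta \big( \langle v h^{\top} \rangle_{\text{data}} - \langle v h^{\top} \rangle_{\text{recon}} \big),
\qquad
\Delta a = \eta \big( \langle v \rangle_{\text{data}} - \langle v \rangle_{\text{recon}} \big),
\qquad
\Delta b = \eta \big( \langle h \rangle_{\text{data}} - \langle h \rangle_{\text{recon}} \big)

In the train() method below, the data-side hidden statistics use the probabilities p(h=1|v) computed from the batch, and the reconstruction-side statistics use the sampled visible states and hidden probabilities obtained after k Gibbs steps.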
Below is the RBM class implementation in Python, with detailed comments explaining each part:
RBM Class Implementation
import numpy as np

class RBM:
    def __init__(self, n_visible, n_hidden, learning_rate=0.1):
        """
        Initialize the RBM with given number of visible and hidden units.
        Weights are initialized to small random values, biases to zero.
        """
        self.n_visible = n_visible   # number of visible units
        self.n_hidden = n_hidden     # number of hidden units
        self.learning_rate = learning_rate

        # Initialize weight matrix (dimensions: n_visible x n_hidden) with small random values
        # Using a normal distribution with mean 0 and standard deviation 0.1 for initialization
        self.weights = np.random.normal(loc=0.0, scale=0.1, size=(n_visible, n_hidden))
        # Initialize biases for visible and hidden units as zero vectors
        self.visible_bias = np.zeros(n_visible)
        self.hidden_bias = np.zeros(n_hidden)

    def _sigmoid(self, x):
        """Sigmoid activation function to convert energies to probabilities (0 to 1)."""
        return 1.0 / (1.0 + np.exp(-x))

    def sample_hidden(self, visible):
        """
        Sample hidden state given a visible state.
        - visible: array-like of shape (n_visible,) with binary values (0 or 1).
        - Returns: numpy array of shape (n_hidden,) with sampled binary hidden states.
        """
        # Ensure input is a numpy array
        visible = np.array(visible)
        # Compute activations of hidden units: W^T * visible + hidden_bias
        hidden_activations = visible.dot(self.weights) + self.hidden_bias   # shape (n_hidden,)
        # Convert activations to probabilities via sigmoid
        hidden_probs = self._sigmoid(hidden_activations)
        # Sample each hidden unit (1 with probability = hidden_probs)
        hidden_states = (np.random.rand(self.n_hidden) < hidden_probs).astype(np.int_)
        return hidden_states

    def sample_visible(self, hidden):
        """
        Sample visible state given a hidden state.
        - hidden: array-like of shape (n_hidden,) with binary values (0 or 1).
        - Returns: numpy array of shape (n_visible,) with sampled binary visible states.
        """
        hidden = np.array(hidden)
        # Compute activations of visible units: W * hidden + visible_bias
        visible_activations = hidden.dot(self.weights.T) + self.visible_bias   # shape (n_visible,)
        # Convert activations to probabilities via sigmoid
        visible_probs = self._sigmoid(visible_activations)
        # Sample each visible unit (1 with probability = visible_probs)
        visible_states = (np.random.rand(self.n_visible) < visible_probs).astype(np.int_)
        return visible_states

    def train(self, data, epochs=10, batch_size=1, k=1):
        """
        Train the RBM using Contrastive Divergence (CD-k).

        Parameters:
        - data: array-like of shape (num_samples, n_visible), containing the training data (values 0 or 1).
        - epochs: number of epochs (full passes through the dataset) to train.
        - batch_size: number of samples per mini-batch for updates (default 1 for stochastic CD).
        - k: number of Gibbs sampling steps for CD-k (default 1, which is the standard CD-1).
        """
        data = np.array(data)
        num_samples = data.shape[0]
        batch_size = min(batch_size, num_samples)   # ensure batch_size is valid

        for epoch in range(epochs):
            # Shuffle the training data at the start of each epoch
            indices = np.random.permutation(num_samples)
            data_shuffled = data[indices]

            # Loop over batches of the training data
            for i in range(0, num_samples, batch_size):
                batch = data_shuffled[i:i+batch_size]

                # === Positive phase ===
                # Compute hidden probabilities for the batch (forward pass)
                pos_hidden_activations = batch.dot(self.weights) + self.hidden_bias   # shape: (batch_size, n_hidden)
                pos_hidden_probs = self._sigmoid(pos_hidden_activations)              # p(h_j=1 | v)
                # Sample hidden states from these probabilities
                pos_hidden_states = (np.random.rand(batch.shape[0], self.n_hidden) < pos_hidden_probs).astype(np.int_)
                # Compute positive associations (correlation between v and h) as batch.T * pos_hidden_probs
                pos_associations = batch.T.dot(pos_hidden_probs)   # shape: (n_visible, n_hidden)

                # === Negative phase (reconstruction) ===
                # Initialize the negative phase using the sampled hidden states from the positive phase
                neg_hidden_states = pos_hidden_states
                neg_hidden_probs = None
                neg_visible_states = None

                # Perform k full Gibbs sampling steps (hidden->visible->hidden repeated k times)
                for step in range(k):
                    # Visible probabilities given hidden
                    neg_visible_activations = neg_hidden_states.dot(self.weights.T) + self.visible_bias   # (batch_size, n_visible)
                    neg_visible_probs = self._sigmoid(neg_visible_activations)   # p(v_i=1 | h)
                    # Sample visible states from these probabilities
                    neg_visible_states = (np.random.rand(batch.shape[0], self.n_visible) < neg_visible_probs).astype(np.int_)

                    # Hidden probabilities given the reconstructed visible
                    neg_hidden_activations = neg_visible_states.dot(self.weights) + self.hidden_bias   # (batch_size, n_hidden)
                    neg_hidden_probs = self._sigmoid(neg_hidden_activations)   # p(h_j=1 | v_recon)
                    # Sample hidden states for the next step (if more Gibbs steps remain)
                    if step < k - 1:
                        neg_hidden_states = (np.random.rand(batch.shape[0], self.n_hidden) < neg_hidden_probs).astype(np.int_)

                # After k steps, we have neg_visible_states (sampled reconstruction) and neg_hidden_probs (probabilities at hidden layer)

                # === Update weights and biases ===
                # Compute negative associations (using reconstructed visible states and hidden probabilities)
                neg_associations = neg_visible_states.T.dot(neg_hidden_probs)   # shape: (n_visible, n_hidden)
                # Update weights: increase by learning_rate * (positive associations - negative associations) normalized by batch size
                self.weights += self.learning_rate * (pos_associations - neg_associations) / batch.shape[0]
                # Update biases: adjust towards the data (positive phase) and away from reconstructions (negative phase)
                self.visible_bias += self.learning_rate * np.mean(batch - neg_visible_states, axis=0)
                self.hidden_bias += self.learning_rate * np.mean(pos_hidden_probs - neg_hidden_probs, axis=0)
                # (Optionally, one could compute reconstruction error here to monitor training progress)

    def reconstruct(self, visible):
        """
        Reconstruct a given visible vector through one forward-pass to hidden and one backward-pass to visible.
        - visible: array-like of shape (n_visible,) with binary values.
        - Returns: tuple (reconstructed_visible_probabilities, reconstructed_visible_state).
        """
        visible = np.array(visible)
        # Forward pass: visible -> hidden
        hidden_probs = self._sigmoid(visible.dot(self.weights) + self.hidden_bias)
        hidden_state = (np.random.rand(self.n_hidden) < hidden_probs).astype(np.int_)
        # Backward pass: hidden -> visible
        recon_visible_activations = hidden_state.dot(self.weights.T) + self.visible_bias
        recon_visible_probs = self._sigmoid(recon_visible_activations)
        recon_visible_state = (np.random.rand(self.n_visible) < recon_visible_probs).astype(np.int_)
        return recon_visible_probs, recon_visible_state

    def get_parameters(self):
        """
        Get the learned parameters of the RBM.
        Returns a tuple: (weight_matrix, visible_bias_vector, hidden_bias_vector).
        """
        return self.weights, self.visible_bias, self.hidden_bias

    def gibbs_sample(self, num_steps=1, initial_visible=None):
        """
        Generate a sample from the learned RBM by running Gibbs sampling.
        - num_steps: number of Gibbs sampling iterations to perform.
        - initial_visible: optional initial visible state to start the chain (if None, a random visible state is used).
        - Returns: tuple (visible_state, hidden_state) after the Gibbs sampling steps.
        """
        # If no initial visible state provided, start with a random binary visible vector
        if initial_visible is None:
            visible_state = (np.random.rand(self.n_visible) < 0.5).astype(np.int_)
        else:
            visible_state = np.array(initial_visible).astype(np.int_)

        hidden_state = None
        # Iterate Gibbs sampling for the specified number of steps
        for step in range(num_steps):
            # Sample hidden state given current visible state
            hidden_probs = self._sigmoid(visible_state.dot(self.weights) + self.hidden_bias)
            hidden_state = (np.random.rand(self.n_hidden) < hidden_probs).astype(np.int_)
            # Sample visible state given current hidden state
            visible_probs = self._sigmoid(hidden_state.dot(self.weights.T) + self.visible_bias)
            visible_state = (np.random.rand(self.n_visible) < visible_probs).astype(np.int_)
        # After the final step, we have a new visible_state; sample a final hidden_state for completeness
        hidden_probs = self._sigmoid(visible_state.dot(self.weights) + self.hidden_bias)
        hidden_state = (np.random.rand(self.n_hidden) < hidden_probs).astype(np.int_)
        return visible_state, hidden_state

Explanation: The RBM class encapsulates the entire RBM functionality:
sample_hidden(visible) and sample_visible(hidden) perform one-step Gibbs sampling, drawing a binary sample of the hidden (or visible) layer given the opposite layer.
train(data, epochs, batch_size, k) implements the Contrastive Divergence training procedure (CD-k). For each mini-batch, it runs a positive phase to get pos_hidden_probs, then runs a Gibbs chain of length k to produce a reconstructed neg_visible_states and corresponding neg_hidden_probs. The weights and biases are updated using the difference between positive associations (v × h from the data) and negative associations (v × h from the reconstruction); biases are adjusted by the difference between the data and reconstruction averages. (A sketch of an optional reconstruction-error check follows this list.)
reconstruct(visible) performs a one-step reconstruction: it returns the probabilities of the visible units after a round-trip (visible -> hidden -> visible) and also a sampled binary reconstruction.
get_parameters() simply returns the learned weights and biases for inspection or use elsewhere.
gibbs_sample(num_steps, initial_visible) allows generating a sample from the model by running a Gibbs sampler for a given number of steps. This can be used after training to sample new visible configurations (and their hidden states) from the learned distribution.
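
As the comment at the end of train() suggests, reconstruction error is a convenient quantity to monitor during training. Below is a minimal sketch of such a check, written as a standalone helper (the name reconstruction_error is not part of the class above and is only illustrative):

import numpy as np

def reconstruction_error(rbm, data):
    """Mean squared difference between the data and its one-step reconstruction probabilities."""
    data = np.array(data)
    total = 0.0
    for v in data:
        recon_probs, _ = rbm.reconstruct(v)        # one round trip v -> h -> v
        total += np.mean((v - recon_probs) ** 2)   # note: reconstruct() samples h, so this estimate is stochastic
    return total / data.shape[0]

# After training (using the rbm and data objects from the Example Usage section below):
# print("Reconstruction error:", reconstruction_error(rbm, data))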
Example Usage
Below is an example demonstrating how to use the RBM class. We create a small synthetic dataset of binary patterns, train the RBM on it, and then show the learned parameters, reconstruction of an input, and a new sample generated from the model:
# Example usage of the RBM class

# 1. Create a synthetic dataset (binary patterns)
# Here we use patterns of length 4. For example:
# Pattern A = [1, 1, 0, 0] and Pattern B = [0, 0, 1, 1], repeated multiple times.
data = [[1, 1, 0, 0]] * 50 + [[0, 0, 1, 1]] * 50   # 100 samples total (50 of each pattern)

# 2. Initialize the RBM with 4 visible units and 2 hidden units
rbm = RBM(n_visible=4, n_hidden=2, learning_rate=0.1)

# 3. Train the RBM on the dataset for a certain number of epochs
rbm.train(data, epochs=50, batch_size=10, k=1)   # using CD-1 (k=1) and mini-batches of size 10

# 4. After training, retrieve the learned weights and biases
weights, vis_bias, hid_bias = rbm.get_parameters()
print("Learned weight matrix:\n", weights)
print("Learned visible biases:", vis_bias)
print("Learned hidden biases:", hid_bias)

# 5. Reconstruct an example from the training data using the trained RBM
test_visible = np.array([1, 1, 0, 0])   # input pattern A
recon_probs, recon_state = rbm.reconstruct(test_visible)
print("\nOriginal visible input:", test_visible)
print("Reconstructed visible probabilities:", recon_probs)
print("Reconstructed visible sample:", recon_state)

# 6. Generate a new sample from the model using Gibbs sampling
visible_sample, hidden_sample = rbm.gibbs_sample(num_steps=5)   # start from a random visible state
print("\nNew sampled visible state (after Gibbs sampling):", visible_sample)
print("Corresponding hidden state:", hidden_sample)

Output Explained: After sufficient training, the RBM’s weights and biases should reflect the correlations in the data. In this example, the model learns that the first two visible units are usually 1 while the last two are 0 (Pattern A), or vice versa for Pattern B. A possible learned weight matrix (4 visible × 2 hidden) might look like:
[[-4.2,  3.8],   # vis1: strong negative connection to hidden1, strong positive to hidden2
 [-4.2,  3.9],   # vis2: similar to vis1
 [ 4.2, -3.7],   # vis3: strong positive connection to hidden1, negative to hidden2
 [ 4.2, -3.8]]   # vis4: similar to vis3
Such weights indicate that one hidden unit (hidden1) is activated by the pattern [0,0,1,1] and the other (hidden2) by [1,1,0,0]. The reconstruction for an input like [1,1,0,0] yields high probabilities (~0.95) for the first two units being 1 and low probabilities (~0.05) for the last two being 1, meaning the RBM correctly reconstructs the original pattern. Conversely, an input of [0,0,1,1] would be reconstructed with the last two units highly likely to be 1. Using gibbs_sample, we can obtain a random visible state from the model; in this trained example, the sampled visible state will tend to be either [1,1,0,0] or [0,0,1,1] (or a close variant), reflecting the patterns the RBM has learned.
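
To check which hidden unit responds to which pattern, one can also probe the trained model directly with sample_hidden (an illustrative snippet; the 0/1 outputs are stochastic, and which hidden unit specializes on which pattern can differ between training runs):

# Probe the hidden layer with each training pattern (results vary from run to run)
print("Hidden sample for pattern A [1,1,0,0]:", rbm.sample_hidden([1, 1, 0, 0]))
print("Hidden sample for pattern B [0,0,1,1]:", rbm.sample_hidden([0, 0, 1, 1]))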
