Restricted Boltzmann Machine (RBM) Implementation


Overview

A Restricted Boltzmann Machine (RBM) is a two-layer generative neural network with a layer of visible units and a layer of hidden units. Each visible unit is connected to every hidden unit (forming a bipartite graph), and there are no connections among units within the same layer. This implementation focuses on a binary-binary RBM (both visible and hidden units take values 0 or 1). We use object-oriented design to encapsulate the RBM behavior in a class. Key features of the implementation include:

Configurable layer sizes: The numbers of visible and hidden units are set at initialization.
Binary units: Both visible and hidden units are Bernoulli (binary 0 or 1) random variables.
Gibbs sampling methods: Functions to sample hidden states given visible units and vice versa, enabling Gibbs sampling in the network (the conditional distributions these samples are drawn from are recalled just after this list).
Training via Contrastive Divergence (CD-k): Uses Gibbs sampling (typically with k=1) to perform a one-step reconstruction and update the weights to learn the data distribution.
Output methods: Functions to retrieve the learned weight matrix and biases, to reconstruct visible data from hidden representations, and to generate new samples from the trained model.
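
For reference, the probability model this implementation realizes is the standard binary RBM (well-known background, not something specific to the code below). With visible biases a, hidden biases b, and weight matrix W, each joint configuration (v, h) is assigned the energy

E(v, h) = -(a · v) - (b · h) - v^T W h,   with   p(v, h) ∝ exp(-E(v, h))

Because the connectivity is bipartite, the conditional distributions factorize over units, which is exactly what makes layer-wise Gibbs sampling cheap:

p(h_j = 1 | v) = sigmoid(b_j + Σ_i v_i W_ij)
p(v_i = 1 | h) = sigmoid(a_i + Σ_j W_ij h_j)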

Training with Contrastive Divergence (CD): In each training iteration, the algorithm performs the following steps:

Positive phase: Compute hidden probabilities given the data, and sample hidden states from them.
Negative phase (reconstruction): Starting from those hidden states, sample a reconstruction of the visible units, then recompute hidden probabilities from this reconstruction (this completes one Gibbs sampling step, i.e. CD-1). For CD-k, this hidden -> visible -> hidden sampling step is repeated k times.
Weight update: Adjust the weights and biases using the difference between the observed data correlations (visible × hidden from the positive phase) and the model's reconstructed correlations (visible × hidden from the negative phase). This gradient update moves the model toward producing reconstructions that better match the input data; the exact update rule is written out below.
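
Written out explicitly, with learning rate η, a data vector v with positive-phase hidden probabilities p(h | v), and a sampled reconstruction v' with negative-phase hidden probabilities p(h | v'), the CD-k updates (averaged over a mini-batch) are:

ΔW = η ( v p(h | v)^T − v' p(h | v')^T )
Δa (visible biases) = η ( v − v' )
Δb (hidden biases) = η ( p(h | v) − p(h | v') )

This is exactly the rule the train method below applies.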

Below is the RBM class implementation in Python, with detailed comments explaining each part:


RBM Class Implementation

import numpy as np

class RBM:
    def __init__(self, n_visible, n_hidden, learning_rate=0.1):
        """
        Initialize the RBM with given number of visible and hidden units.
        Weights are initialized to small random values, biases to zero.
        """
        self.n_visible = n_visible  # number of visible units
        self.n_hidden = n_hidden    # number of hidden units
        self.learning_rate = learning_rate

        # Initialize weight matrix (dimensions: n_visible x n_hidden) with small random values
        # Using a normal distribution with mean 0 and standard deviation 0.1 for initialization
        self.weights = np.random.normal(loc=0.0, scale=0.1, size=(n_visible, n_hidden))
        # Initialize biases for visible and hidden units as zero vectors
        self.visible_bias = np.zeros(n_visible)
        self.hidden_bias = np.zeros(n_hidden)

    def _sigmoid(self, x):
        """Sigmoid activation function to convert energies to probabilities (0 to 1)."""
        return 1.0 / (1.0 + np.exp(-x))

    def sample_hidden(self, visible):
        """
        Sample hidden state given a visible state.
        - visible: array-like of shape (n_visible,) with binary values (0 or 1).
        - Returns: numpy array of shape (n_hidden,) with sampled binary hidden states.
        """
        # Ensure input is a numpy array
        visible = np.array(visible)
        # Compute activations of hidden units: W^T * visible + hidden_bias
        hidden_activations = visible.dot(self.weights) + self.hidden_bias  # shape (n_hidden,)
        # Convert activations to probabilities via sigmoid
        hidden_probs = self._sigmoid(hidden_activations)
        # Sample each hidden unit (1 with probability = hidden_probs)
        hidden_states = (np.random.rand(self.n_hidden) < hidden_probs).astype(np.int_)
        return hidden_states

    def sample_visible(self, hidden):
        """
        Sample visible state given a hidden state.
        - hidden: array-like of shape (n_hidden,) with binary values (0 or 1).
        - Returns: numpy array of shape (n_visible,) with sampled binary visible states.
        """
        hidden = np.array(hidden)
        # Compute activations of visible units: W * hidden + visible_bias
        visible_activations = hidden.dot(self.weights.T) + self.visible_bias  # shape (n_visible,)
        # Convert activations to probabilities via sigmoid
        visible_probs = self._sigmoid(visible_activations)
        # Sample each visible unit (1 with probability = visible_probs)
        visible_states = (np.random.rand(self.n_visible) < visible_probs).astype(np.int_)
        return visible_states

    def train(self, data, epochs=10, batch_size=1, k=1):
        """
        Train the RBM using Contrastive Divergence (CD-k).

        Parameters:
        - data: array-like of shape (num_samples, n_visible), containing the training data (values 0 or 1).
        - epochs: number of epochs (full passes through the dataset) to train.
        - batch_size: number of samples per mini-batch for updates (default 1 for stochastic CD).
        - k: number of Gibbs sampling steps for CD-k (default 1, which is the standard CD-1).
        """
        data = np.array(data)
        num_samples = data.shape[0]
        batch_size = min(batch_size, num_samples)  # ensure batch_size is valid

        for epoch in range(epochs):
            # Shuffle the training data at the start of each epoch
            indices = np.random.permutation(num_samples)
            data_shuffled = data[indices]

            # Loop over batches of the training data
            for i in range(0, num_samples, batch_size):
                batch = data_shuffled[i:i+batch_size]

                # === Positive phase ===
                # Compute hidden probabilities for the batch (forward pass)
                pos_hidden_activations = batch.dot(self.weights) + self.hidden_bias  # shape: (batch_size, n_hidden)
                pos_hidden_probs = self._sigmoid(pos_hidden_activations)  # p(h_j=1 | v)
                # Sample hidden states from these probabilities
                pos_hidden_states = (np.random.rand(batch.shape[0], self.n_hidden) < pos_hidden_probs).astype(np.int_)
                # Compute positive associations (correlation between v and h) as batch.T * pos_hidden_probs
                pos_associations = batch.T.dot(pos_hidden_probs)  # shape: (n_visible, n_hidden)

                # === Negative phase (reconstruction) ===
                # Initialize the negative phase using the sampled hidden states from the positive phase
                neg_hidden_states = pos_hidden_states
                neg_hidden_probs = None
                neg_visible_states = None

                # Perform k full Gibbs sampling steps (hidden->visible->hidden repeated k times)
                for step in range(k):
                    # Visible probabilities given hidden
                    neg_visible_activations = neg_hidden_states.dot(self.weights.T) + self.visible_bias  # (batch_size, n_visible)
                    neg_visible_probs = self._sigmoid(neg_visible_activations)  # p(v_i=1 | h)
                    # Sample visible states from these probabilities
                    neg_visible_states = (np.random.rand(batch.shape[0], self.n_visible) < neg_visible_probs).astype(np.int_)

                    # Hidden probabilities given the reconstructed visible
                    neg_hidden_activations = neg_visible_states.dot(self.weights) + self.hidden_bias  # (batch_size, n_hidden)
                    neg_hidden_probs = self._sigmoid(neg_hidden_activations)  # p(h_j=1 | v_recon)
                    # Sample hidden states for the next step (if more Gibbs steps remain)
                    if step < k - 1:
                        neg_hidden_states = (np.random.rand(batch.shape[0], self.n_hidden) < neg_hidden_probs).astype(np.int_)

                # After k steps, we have neg_visible_states (sampled reconstruction) and neg_hidden_probs (probabilities at hidden layer)

                # === Update weights and biases ===
                # Compute negative associations (using reconstructed visible states and hidden probabilities)
                neg_associations = neg_visible_states.T.dot(neg_hidden_probs)  # shape: (n_visible, n_hidden)
                # Update weights: increase by learning_rate * (positive associations - negative associations) normalized by batch size
                self.weights += self.learning_rate * (pos_associations - neg_associations) / batch.shape[0]
                # Update biases: adjust towards the data (positive phase) and away from reconstructions (negative phase)
                self.visible_bias += self.learning_rate * np.mean(batch - neg_visible_states, axis=0)
                self.hidden_bias += self.learning_rate * np.mean(pos_hidden_probs - neg_hidden_probs, axis=0)
                # (Optionally, one could compute reconstruction error here to monitor training progress)

    def reconstruct(self, visible):
        """
        Reconstruct a given visible vector through one forward-pass to hidden and one backward-pass to visible.
        - visible: array-like of shape (n_visible,) with binary values.
        - Returns: tuple (reconstructed_visible_probabilities, reconstructed_visible_state).
        """
        visible = np.array(visible)
        # Forward pass: visible -> hidden
        hidden_probs = self._sigmoid(visible.dot(self.weights) + self.hidden_bias)
        hidden_state = (np.random.rand(self.n_hidden) < hidden_probs).astype(np.int_)
        # Backward pass: hidden -> visible
        recon_visible_activations = hidden_state.dot(self.weights.T) + self.visible_bias
        recon_visible_probs = self._sigmoid(recon_visible_activations)
        recon_visible_state = (np.random.rand(self.n_visible) < recon_visible_probs).astype(np.int_)
        return recon_visible_probs, recon_visible_state

    def get_parameters(self):
        """
        Get the learned parameters of the RBM.
        Returns a tuple: (weight_matrix, visible_bias_vector, hidden_bias_vector).
        """
        return self.weights, self.visible_bias, self.hidden_bias

    def gibbs_sample(self, num_steps=1, initial_visible=None):
        """
        Generate a sample from the learned RBM by running Gibbs sampling.
        - num_steps: number of Gibbs sampling iterations to perform.
        - initial_visible: optional initial visible state to start the chain (if None, a random visible state is used).
        - Returns: tuple (visible_state, hidden_state) after the Gibbs sampling steps.
        """
        # If no initial visible state provided, start with a random binary visible vector
        if initial_visible is None:
            visible_state = (np.random.rand(self.n_visible) < 0.5).astype(np.int_)
        else:
            visible_state = np.array(initial_visible).astype(np.int_)

        hidden_state = None
        # Iterate Gibbs sampling for the specified number of steps
        for step in range(num_steps):
            # Sample hidden state given current visible state
            hidden_probs = self._sigmoid(visible_state.dot(self.weights) + self.hidden_bias)
            hidden_state = (np.random.rand(self.n_hidden) < hidden_probs).astype(np.int_)
            # Sample visible state given current hidden state
            visible_probs = self._sigmoid(hidden_state.dot(self.weights.T) + self.visible_bias)
            visible_state = (np.random.rand(self.n_visible) < visible_probs).astype(np.int_)
        # After the final step, we have a new visible_state; sample a final hidden_state for completeness
        hidden_probs = self._sigmoid(visible_state.dot(self.weights) + self.hidden_bias)
        hidden_state = (np.random.rand(self.n_hidden) < hidden_probs).astype(np.int_)
        return visible_state, hidden_state

Explanation: The RBM class encapsulates the entire RBM functionality:

sample_hidden(visible) and sample_visible(hidden) perform one-step Gibbs sampling, drawing a binary sample of the hidden (or visible) layer given the opposite layer.
train(data, epochs, batch_size, k) implements the Contrastive Divergence training procedure (CD-k). For each mini-batch, it runs a positive phase to get pos_hidden_probs, then runs a Gibbs chain of length k to produce a reconstructed neg_visible_states and the corresponding neg_hidden_probs. The weights are updated using the difference between positive associations (v × h from the data) and negative associations (v × h from the reconstruction). Biases are adjusted by the difference between the data and reconstruction averages. (A small sketch of how reconstruction error could be monitored follows this list.)
reconstruct(visible) performs a one-step reconstruction: it returns the probabilities of the visible units after a round trip (visible -> hidden -> visible) and also a sampled binary reconstruction.
get_parameters() simply returns the learned weights and biases for inspection or use elsewhere.
gibbs_sample(num_steps, initial_visible) generates a sample from the model by running a Gibbs sampler for a given number of steps. This can be used after training to sample new visible configurations (and their hidden states) from the learned distribution.
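
The final comment in train() notes that reconstruction error could be computed to monitor progress. Below is a minimal sketch of one way to do that from outside the class; the helper name reconstruction_error is hypothetical (not part of the class above) and only relies on the weights, visible_bias, and hidden_bias attributes defined there:

import numpy as np

def reconstruction_error(rbm, data):
    # Hypothetical helper: mean squared error between the data and its
    # one-step reconstruction probabilities (visible -> hidden -> visible).
    data = np.array(data, dtype=float)
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    hidden_probs = sigmoid(data.dot(rbm.weights) + rbm.hidden_bias)              # p(h=1 | v) per sample
    visible_probs = sigmoid(hidden_probs.dot(rbm.weights.T) + rbm.visible_bias)  # p(v=1 | h) round trip
    return np.mean((data - visible_probs) ** 2)

# Example (assuming an RBM instance `rbm` and training data `data`):
# train in short bursts and print the error after each burst.
# for burst in range(10):
#     rbm.train(data, epochs=5, batch_size=10, k=1)
#     print("reconstruction error:", reconstruction_error(rbm, data))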


Example Usage

Below is an example demonstrating how to use the RBM class. We create a small synthetic dataset of binary patterns, train the RBM on it, and then show the learned parameters, the reconstruction of an input, and a new sample generated from the model:

# Example usage of the RBM class

# 1. Create a synthetic dataset (binary patterns)
#    Here we use patterns of length 4. For example:
#    Pattern A = [1, 1, 0, 0] and Pattern B = [0, 0, 1, 1], repeated multiple times.
data = [[1, 1, 0, 0]] * 50 + [[0, 0, 1, 1]] * 50  # 100 samples total (50 of each pattern)

# 2. Initialize the RBM with 4 visible units and 2 hidden units
rbm = RBM(n_visible=4, n_hidden=2, learning_rate=0.1)

# 3. Train the RBM on the dataset for a certain number of epochs
rbm.train(data, epochs=50, batch_size=10, k=1)  # using CD-1 (k=1) and mini-batches of size 10

# 4. After training, retrieve the learned weights and biases
weights, vis_bias, hid_bias = rbm.get_parameters()
print("Learned weight matrix:\n", weights)
print("Learned visible biases:", vis_bias)
print("Learned hidden biases:", hid_bias)

# 5. Reconstruct an example from the training data using the trained RBM
test_visible = np.array([1, 1, 0, 0])  # input pattern A
recon_probs, recon_state = rbm.reconstruct(test_visible)
print("\nOriginal visible input:", test_visible)
print("Reconstructed visible probabilities:", recon_probs)
print("Reconstructed visible sample:", recon_state)

# 6. Generate a new sample from the model using Gibbs sampling
visible_sample, hidden_sample = rbm.gibbs_sample(num_steps=5)  # start from a random visible state
print("\nNew sampled visible state (after Gibbs sampling):", visible_sample)
print("Corresponding hidden state:", hidden_sample)

Output Explained: After sufficient training, the RBM's weights and biases should reflect the correlations in the data. In this example, the model learns that the first two visible units are usually 1 while the last two are 0 (Pattern A), or vice versa for Pattern B. A possible learned weight matrix (4 visible × 2 hidden) might look like:

[[-4.2,  3.8],   # vis1: strong negative weight to hidden1, strong positive weight to hidden2
 [-4.2,  3.9],   # vis2 similar to vis1
 [ 4.2, -3.7],   # vis3: strong positive weight to hidden1, negative weight to hidden2
 [ 4.2, -3.8]]   # vis4 similar to vis3

Such weights indicate that one hidden unit (hidden1) is activated by the pattern [0,0,1,1] and the other (hidden2) by [1,1,0,0]. The reconstruction for an input like [1,1,0,0] yields high probabilities (~0.95) for the first two units being 1 and low probabilities (~0.05) for the last two being 1, meaning the RBM correctly reconstructs the original pattern. Conversely, an input of [0,0,1,1] would be reconstructed with the last two units highly likely to be 1. Using gibbs_sample, we can obtain a random visible state from the model; in this trained example, the sampled visible state will tend to be either [1,1,0,0] or [0,0,1,1] (or a close variant), reflecting the patterns the RBM has learned, which can be checked with the short sketch below.
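
To check that last claim empirically, one could draw a batch of Gibbs samples from the trained model and tally the visible states that come out. A rough sketch, reusing the rbm object trained above (exact counts will vary from run to run):

from collections import Counter

counts = Counter()
for _ in range(200):
    v, _h = rbm.gibbs_sample(num_steps=20)  # longer chains mix better than a single step
    counts[tuple(v)] += 1

# Expect the two training patterns, (1, 1, 0, 0) and (0, 0, 1, 1), to dominate the tally
print(counts.most_common())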