Important Disclaimer

The purpose of this blog is purely to serve as a compilation of good technical material for my students. No financial or other motives are involved. Most of the content in this blog has been reproduced from other sources. I have made every attempt to mention the source link at the beginning of each blog. All readers are requested to kindly acknowledge that source and not this blog, in case you find the post helpful. However, I have not been able to trace the source links for some of my older posts. I wish to emphasize that this is not intentional and any help in this regard would be appreciated.

Mar 7, 2007

The Boltzmann Machine


Introduction

The Boltzmann machine (named by its inventors in honour of the 19th-century Austrian physicist Ludwig Boltzmann) has the following features:

1. Processing units have binary states (±1)

2. Connections between units are symmetric

3. Units are picked at random and one at a time for updating

4. Units have no self-feedback.

5. The Boltzmann machine permits the use of hidden neurons.

6. The Boltzmann machine uses stochastic neurons with a probabilistic firing mechanism (a minimal sketch of such a firing rule follows this list).

7. The Boltzmann machine may also be trained by a probabilistic form of supervision.
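As an illustration of points 1, 4 and 6, here is a minimal sketch of a stochastic firing rule for a single binary unit. This is my own illustration, not part of the original material: the function name and the NumPy implementation are assumptions, but the sigmoid-at-temperature probability is the standard formulation for units with states +1 and -1.

import numpy as np

def stochastic_update(s, w, j, T):
    """Update unit j of the state vector s (entries +1/-1) at temperature T.
    w is a symmetric weight matrix with a zero diagonal (no self-feedback)."""
    v_j = w[j] @ s                                   # net input to unit j (w[j, j] == 0)
    p_fire = 1.0 / (1.0 + np.exp(-2.0 * v_j / T))    # probability of firing to +1
    s[j] = 1 if np.random.rand() < p_fire else -1
    return s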

Structure of Boltzmann Machine




The stochastic neurons of a Boltzmann machine fall into two groups: visible and hidden. Visible neurons provide an interface between the network and its environment. During the training phase, the visible neurons are clamped onto states determined by the environment; the hidden neurons always operate freely and are used to explain underlying constraints in the environmental input vectors.

The hidden units do this by capturing higher-order correlations between the clamping vectors.

The learning may be viewed as an unsupervised procedure for modelling a distribution that is specified by the clamping patterns.

The network can perform pattern completion: when a vector bearing part of the information is clamped onto a subset of the visible neurons, the network completes the pattern on the remaining visible neurons (if it has learnt the underlying distribution properly).
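To make the completion idea concrete, the following rough sketch (again my own, with hypothetical names) clamps part of the visible layer and lets the remaining units settle, assuming a trained weight matrix w and the stochastic_update helper above:

def complete_pattern(w, clamped_idx, clamped_vals, n_units, T=1.0, steps=1000):
    """Clamp some visible units to known values and let the rest of the network settle."""
    s = np.random.choice([-1, 1], size=n_units)       # random initial states
    s[clamped_idx] = clamped_vals                      # clamp the known visible units
    free = [j for j in range(n_units) if j not in set(clamped_idx)]
    for _ in range(steps):
        j = free[np.random.randint(len(free))]         # pick a free unit at random
        stochastic_update(s, w, j, T)
    return s                                           # completed pattern on the visible units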

Boltzmann Machine Learning

The goal of Boltzmann learning is to produce a neural network that categorizes input patterns according to a Boltzmann distribution. Two assumptions are made:

• Each environmental vector persists long enough for the network to reach thermal equilibrium;


• There is no structure in the sequence in which environmental vectors are clamped to the visible units of the network.


Energy Minimization

The Boltzmann machine works by picking a hidden unit at random, say unit j, and flipping the state of neuron j from s_j to -s_j at temperature T (during the annealing cycle) with probability

P(s_j -> -s_j) = 1 / (1 + exp(ΔE_j / T))

where ΔE_j is the energy change resulting from such a flip. We define the energy function of the Boltzmann machine as

E = -(1/2) Σ_j Σ_{i ≠ j} w_ji s_j s_i

• This summation runs over both visible and hidden units. The condition j ≠ i implies no self-feedback. w_ji is the weight from unit i to unit j. An external threshold θ_j applied to unit j is handled as usual (as a weight of -θ_j from a unit with a fixed output of +1).

• If the flipping procedure is applied repeatedly to the units, the net will reach thermal equilibrium. At thermal equilibrium, the units will change state, but the probability of finding the network in any particular state remains constant and obeys the Boltzmann distribution.

• To find a stable configuration that is suited to the problem at hand, Boltzmann learning proceeds by first operating the net at a high temperature and gradually lowering it until the net reaches thermal equilibrium at a series of temperatures, as prescribed by the simulated annealing procedure (a small code sketch of this follows).
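The following is a minimal sketch of the energy computation and the annealed flipping loop described above. It is my own illustration: the schedule argument, the helper names, and the decision to fold the thresholds θ_j into the weight matrix via a clamped bias unit are all assumptions; the energy and flip-probability expressions follow the formulas given earlier.

def energy(s, w):
    """E = -1/2 * sum over j and i != j of w_ji * s_j * s_i (diagonal of w is zero)."""
    return -0.5 * s @ w @ s

def anneal(s, w, schedule, flips_per_T=200, free_idx=None):
    """Run the stochastic flipping dynamics over a decreasing sequence of temperatures."""
    if free_idx is None:
        free_idx = list(range(len(s)))                # by default, all units run freely
    for T in schedule:                                # e.g. schedule = [10.0, 5.0, 2.0, 1.0]
        for _ in range(flips_per_T):
            j = free_idx[np.random.randint(len(free_idx))]
            dE = 2.0 * s[j] * (w[j] @ s)              # energy change if s_j were flipped
            if np.random.rand() < 1.0 / (1.0 + np.exp(dE / T)):
                s[j] = -s[j]                          # accept the flip with this probability
    return s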



Summary of the Boltzmann Machine Learning Procedure

1. Initialization: Set the weights to random numbers in [-1, 1].

2. Clamping Phase: Present the net with the mapping it is supposed to learn by clamping input and output units to patterns. For each pattern, perform simulated annealing on the hidden units at a sequence T0, T1, ..., Tfinal of temperatures. At the final temperature, collect statistics to estimate the clamped correlations ρ+_ji = <s_j s_i>+ (the average of s_j s_i with the visible units clamped).






3. Free-Running Phase: Repeat the calculations performed in step 2, but this time clamp only the input units. Hence, at the final temperature, estimate the free-running correlations ρ-_ji = <s_j s_i>-.




4. Updating of Weights: Update the weights using the learning rule

Δw_ji = η (ρ+_ji - ρ-_ji)

where η is a learning-rate parameter.


5. Iterate until Convergence: Iterate steps 2 to 4 until the learning procedure converges, with no more changes taking place in the synaptic weights w_ji for all j, i. (A rough sketch of the complete loop follows.)
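Putting the five steps together, here is a rough outline of the learning loop. This is my own sketch, not the original post's code: it reuses the hypothetical stochastic_update and anneal helpers above, estimates the correlations by simple sample averages at the final temperature, and, for brevity, runs the free phase with nothing clamped (the fully unsupervised variant) rather than keeping the input units clamped as described in step 3.

def boltzmann_learn(patterns, n_units, visible_idx, hidden_idx,
                    schedule, eta=0.05, samples=50, epochs=100):
    w = np.random.uniform(-1.0, 1.0, (n_units, n_units))   # step 1: random weights in [-1, 1]
    w = 0.5 * (w + w.T)                                     # make the connections symmetric
    np.fill_diagonal(w, 0.0)                                # no self-feedback

    for _ in range(epochs):
        rho_plus = np.zeros_like(w)
        rho_minus = np.zeros_like(w)
        for p in patterns:
            # Step 2, clamping phase: visible units fixed to the pattern, hidden units annealed.
            s = np.random.choice([-1, 1], size=n_units)
            s[visible_idx] = p
            s = anneal(s, w, schedule, free_idx=hidden_idx)
            for _ in range(samples):
                j = hidden_idx[np.random.randint(len(hidden_idx))]
                stochastic_update(s, w, j, schedule[-1])
                rho_plus += np.outer(s, s)
            # Step 3, free-running phase: here all units run freely (see the note above).
            s = np.random.choice([-1, 1], size=n_units)
            s = anneal(s, w, schedule)
            for _ in range(samples):
                stochastic_update(s, w, np.random.randint(n_units), schedule[-1])
                rho_minus += np.outer(s, s)
        # Step 4: Boltzmann learning rule, delta w_ji = eta * (rho+_ji - rho-_ji).
        norm = len(patterns) * samples
        w += eta * (rho_plus - rho_minus) / norm
        np.fill_diagonal(w, 0.0)
    return w                                                # step 5: stop when the weights no longer change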
