

Random Neural Networks (RNN)

The Random Neural Network (RNN) was proposed by Erol Gelenbe in 1989 [45]. RNNs have been extensively studied, evaluated and analyzed in [49,89,48,45,101,51], and analog hardware implementations of RNN are detailed in [3,25]. Low bit rate video compression using RNN is presented in [30]. A list of RNN applications can be found in [2,4,6,7,50,99,123,128,148,10].

Gelenbe's idea can be described as a merge between the classical Artificial Neural Network (ANN) model and queuing networks. Since this tool is a novel one, let us describe its main characteristics here. RNNs are, like ANNs, composed of a set of interconnected neurons. These neurons exchange signals that travel instantaneously from neuron to neuron, and they send and receive signals to and from the environment. Each neuron has a potential associated with it, which is an integer (random) variable. The potential of neuron $i$ at time $t$ is denoted by $q_i(t)$. If the potential of neuron $i$ is strictly positive, the neuron is excited; in that state, it randomly sends signals (to other neurons or to the environment), according to a Poisson process with rate $r_i$. Signals can be positive or negative. The probability that a signal sent by neuron $i$ goes to neuron $j$ as a positive one is denoted by $p^+_{i,j}$, and as a negative one by $p^-_{i,j}$; the signal goes to the environment (that is, it leaves the network) with probability $d_i$. So, if $N$ is the number of neurons, we must have, for all $i = 1,\cdots,N$,

\begin{displaymath}d_i + \sum_{j=1}^N \left( p^+_{i,j}+ p^-_{i,j}\right) = 1. \end{displaymath}
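
To fix ideas, here is a minimal numerical sketch (the parameter values and variable names are ours, purely illustrative and not taken from the references) of the routing parameters of a small 3-neuron RNN, together with a numerical check of the above normalization constraint:

\begin{verbatim}
# Illustrative sketch (our own example) of RNN routing parameters for a
# 3-neuron network, with a check of d_i + sum_j (p+_{i,j} + p-_{i,j}) = 1.
import numpy as np

r = np.array([2.0, 2.0, 2.0])            # firing rates r_i of excited neurons
p_plus = np.array([[0.0, 0.4, 0.0],      # p+_{i,j}: i -> j carries a positive signal
                   [0.0, 0.0, 0.4],
                   [0.0, 0.0, 0.0]])
p_minus = np.array([[0.0, 0.1, 0.0],     # p-_{i,j}: i -> j carries a negative signal
                    [0.0, 0.0, 0.1],
                    [0.0, 0.0, 0.0]])
d = np.array([0.5, 0.5, 1.0])            # d_i: probability that the signal leaves the network

# every row must satisfy the normalization constraint displayed above
assert np.allclose(d + (p_plus + p_minus).sum(axis=1), 1.0)
\end{verbatim}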

When a neuron receives a positive signal, either from another neuron or from the environment, its potential is increased by 1; if it receives a negative one, its potential decreases by 1 if it was strictly positive and does not change if its value was 0. In the same way, when a neuron sends a signal, positive or negative, its potential is decreased by one unit (it was necessarily strictly positive, since only excited neurons send signals). The flow of positive (resp. negative) signals arriving from the environment to neuron $i$ (if any) is a Poisson process whose rate is denoted by $\lambda^+_{i}$ (resp. $\lambda^-_{i}$). It is possible to have $\lambda^+_i = 0$ and/or $\lambda^-_i = 0$ for some neuron $i$, but to deal with an ``alive'' network, we need $\sum_{i=1}^N \lambda^+_i > 0$. Finally, we make the usual independence assumptions between these arrival processes, the processes composed of the signals sent by each neuron, etc. Gelenbe's key discovery is that this model has a product form stationary solution, similar to Jackson's classical result on open networks of queues. If the process $\vec{q}\,(t) = (q_1(t),\cdots,q_N(t))$ is ergodic (we will say that the network is stable), Gelenbe proved that
\begin{displaymath}
\lim_{t \rightarrow \infty} \Pr(\vec{q}\,(t) = (n_1,\cdots,n_N)) =
\prod_{i=1}^N (1 - \varrho_i) \varrho_i^{n_i}
\end{displaymath} (3.1)

where the $\varrho_i$s satisfy the following non-linear system of equations:
\begin{displaymath}
\mbox{for each node $i$, } \quad
\varrho_i = \frac{T^+_i}{r_i + T^-_i},
\end{displaymath} (3.2)


\begin{displaymath}
\mbox{for each node $i$, } \quad T^+_i = \lambda^+_i + \sum_{j=1}^N \varrho_j r_j p^+_{j,i},
\end{displaymath} (3.3)

and
\begin{displaymath}
\mbox{for each node $i$, } \quad T^-_i = \lambda^-_i + \sum_{j=1}^N \varrho_j r_j p^-_{j,i}.
\end{displaymath} (3.4)

Relation (3.1) tells us that $\varrho_i$ is the probability that, in equilibrium, neuron $i$ is excited, that is,

\begin{displaymath}\varrho_i = \lim_{t \rightarrow \infty} \Pr( q_i(t) > 0). \end{displaymath}
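
The non-linear system (3.2), (3.3) and (3.4) can be solved numerically; the following sketch (our own illustration, not necessarily the scheme used in the references) applies a plain fixed-point iteration to the 3-neuron example introduced above, reusing its parameters \verb|r|, \verb|p_plus| and \verb|p_minus|, and assuming the resulting network is stable:

\begin{verbatim}
# Hedged sketch: fixed-point iteration on the system (3.2)-(3.4); the text
# guarantees the solution is unique, and we assume stability (all rho_i < 1).
import numpy as np

lam_plus = np.array([1.0, 0.0, 0.0])   # lambda+_i: external positive arrival rates
lam_minus = np.zeros(3)                # lambda-_i: set to 0, as done later in the text

rho = np.zeros(3)
for _ in range(1000):
    T_plus = lam_plus + (rho * r) @ p_plus     # eq. (3.3)
    T_minus = lam_minus + (rho * r) @ p_minus  # eq. (3.4)
    new_rho = T_plus / (r + T_minus)           # eq. (3.2)
    if np.max(np.abs(new_rho - rho)) < 1e-12:
        break
    rho = new_rho

print(rho)   # rho_i = P(neuron i is excited) in equilibrium, about [0.50, 0.19, 0.07]
\end{verbatim}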

Observe that the non-linear system composed of equations (3.2), (3.3) and (3.4) has $3N$ equations and $3N$ unknowns (the $\varrho_i$s, the $T^+_i$s and the $T^-_i$s). Relations (3.3) and (3.4) tell us that $T^+_i$ is the mean throughput of positive signals arriving at neuron $i$ and that $T^-_i$ is the corresponding mean throughput of negative signals (always in equilibrium). Finally, Gelenbe proved, first, that this non-linear system has a unique solution, and, second, that the stability condition of the network is equivalent to having $\varrho_i < 1$ for every node $i$. Let us now describe the use of this model in statistical learning. Following previous applications of RNN, we fix the $\lambda^-_{i}$s to 0 (so, there is no negative signal arriving from outside). As a learning tool, the RNN will be seen as a black box having $N$ inputs and $N$ outputs. The inputs are the rates of the incoming flows of positive signals arriving from outside, i.e. the $\lambda^+_{i}$s. The output values are the $\varrho_i$s. In fact, in applications, most of the time some neurons do not receive signals from outside, which simply corresponds to fixing some $\lambda^+_{i}$s to 0; in the same way, only a subset of the $\varrho_i$s is often used as output. At this point, let us assume that the number of neurons has been chosen and that the topology of the network has been selected; this means that we have selected the pairs of neurons that will exchange signals, without fixing the values of the rates $r_i$ and the branching probabilities $p^+_{i,j}$ and $p^-_{i,j}$. Our learning data is then composed of a set of $K$ input-output pairs, which we will denote here by $\{ ( \vec{x}^{(k)}, \vec{y}^{\;(k)}), \; k = 1,\cdots,K \}$, where $\vec{x}^{(k)}= ( x^{(k)}_1 , \cdots, x^{(k)}_N )$ and $\vec{y}^{\;(k)}= ( y^{(k)}_1 , \cdots, y^{(k)}_N )$. The goal of the learning process is to obtain values for the remaining parameters of the RNN (the rates $r_i$ and the branching probabilities $p^+_{i,j}$ and $p^-_{i,j}$) such that if, in the resulting RNN, we set $\lambda^+_{i}= x^{(k)}_i$ for all $i$ (and $\lambda^-_{i}= 0$), then, for all $i$, the steady-state occupation probability $\varrho_i$ is close to $y^{(k)}_i$. This must hold for any value of $k \in \{1,\cdots,K\}$. To obtain this result, first of all, instead of working with rates and branching probabilities, the following variables are used:

\begin{displaymath}w^+_{i,j}= r_i p^+_{i,j}\quad \mbox{and} \quad w^-_{i,j}= r_i p^-_{i,j}. \end{displaymath}

This means that $w^+_{i,j}$ (resp. $w^-_{i,j}$) is the mean throughput in equilibrium of positive (resp. negative) signals going from neuron $i$ to neuron $j$. They are called weights by analogy with standard ANN. The learning algorithm then proceeds formally as follows. Each weight in the network's topology is initialized to some arbitrary positive value, and then $K$ iterations are performed which modify them. Let us call $w^{+ \, (0)}_{i,j}$ and $w^{- \, (0)}_{i,j}$ the initial weights for the connection between $i$ and $j$. Then, for $k = 1,\cdots,K$, the set of weights at step $k$ is computed from the set of weights at step $k-1$ using a learning scheme, as usual with neural networks. More specifically, denote by ${\cal R}^{(k-1)}$ the network obtained after step $k-1$, defined by the weights $w^{+ \, (k-1)}_{i,j}$ and $w^{- \, (k-1)}_{i,j}$. When we set the input rates (external positive signals) in ${\cal R}^{(k-1)}$ to the values $x^{(k)}_i$, we obtain the steady-state occupation probabilities $\varrho^{(k)}_i$ (assuming stability). The weights at step $k$ are then defined by
\begin{displaymath}
w^{+ \, (k)}_{i,j}= w^{+ \, (k-1)}_{i,j}- \eta \sum_{l=1}^{N} c_l \left( \varrho^{(k)}_l - y^{(k)}_l \right)
\frac{\partial \varrho_l}{\partial w^+_{i,j}},
\end{displaymath} (3.5)


\begin{displaymath}
w^{- \, (k)}_{i,j}= w^{- \, (k-1)}_{i,j}- \eta \sum_{l=1}^{N} c_l \left( \varrho^{(k)}_l - y^{(k)}_l \right)
\frac{\partial \varrho_l}{\partial w^-_{i,j}},
\end{displaymath} (3.6)

where the partial derivatives $\partial\varrho_h/{\partial w^{*}_{m,n}}$ ($*$ being `$+$' or `$-$') are evaluated at $\varrho_h = \varrho^{(k)}_h$ and $w^{*}_{m,n} = w^{* \,(k-1)}_{m,n}$. The factor $c_l$ is a positive cost term allowing different importance to be given to different output neurons. If some neuron $l$ must not be considered in the output, we simply set $c_l = 0$. In our case, we have only one output neuron, so this factor is not relevant; we keep it in (3.5) and (3.6) just to give the general form of the equations. This is a gradient descent algorithm, corresponding to the minimization of the cost function

\begin{displaymath}\frac{1}{2}\sum_{l=1}^{N} c_l (\varrho^{(k)}_l - y^{(k)} _l)^2. \end{displaymath}
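
To make the update concrete, the following sketch performs one application of (3.5) and (3.6) for a single training pair. It is only an illustration: for brevity the partial derivatives are approximated by finite differences, whereas the actual algorithm obtains them exactly by solving a linear system [47], as discussed below; the helper \verb|solve_rho(w_plus, w_minus, lam_plus)|, returning the vector of the $\varrho_i$s for given weights and inputs, is assumed and not taken from the text. The clipping of negative weights anticipates the rule discussed below.

\begin{verbatim}
# Hedged sketch of one weight update of (3.5)-(3.6); derivatives are
# approximated by finite differences instead of the linear system of [47].
import numpy as np

def update_weights(w_plus, w_minus, x_k, y_k, c, eta, solve_rho, h=1e-6):
    """One application of (3.5)-(3.6) for the training pair (x_k, y_k);
    c is the vector of cost factors c_l and eta the learning rate."""
    rho = solve_rho(w_plus, w_minus, x_k)          # rho^(k) of the current network
    err = c * (rho - y_k)                          # c_l (rho_l^(k) - y_l^(k))
    grads = []
    for W in (w_plus, w_minus):                    # same treatment for w+ and w-
        grad = np.zeros_like(W)
        for (i, j), w_ij in np.ndenumerate(W):
            if w_ij == 0.0:                        # weights clipped to 0 stay frozen
                continue
            W[i, j] = w_ij + h                     # finite-difference probe of w_{i,j}
            grad[i, j] = err @ (solve_rho(w_plus, w_minus, x_k) - rho) / h
            W[i, j] = w_ij                         # restore the original value
        grads.append(grad)
    w_plus = np.clip(w_plus - eta * grads[0], 0.0, None)    # eq. (3.5), negatives -> 0
    w_minus = np.clip(w_minus - eta * grads[1], 0.0, None)  # eq. (3.6), negatives -> 0
    return w_plus, w_minus
\end{verbatim}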

Once again, the relations between the output and the input parameters in the product form result allow a calculation scheme for the partial derivatives to be derived explicitly. Instead of solving a non-linear system such as (3.2), (3.3) and (3.4), it is shown in [47] that only a linear system has to be solved here. When relations (3.5) and (3.6) are applied, it may happen that some value $w^{+ \, (k)}_{i,j}$ or $w^{- \, (k)}_{i,j}$ becomes negative. Since this is not allowed in the model, the weight is set to 0 and it is no longer modified by the updating process (another possibility is to modify the coefficient $\eta$ and apply the relation again; previous studies used the first solution, which we also adopt). Once the $K$ learning values have been used, the whole process is repeated several times, until some convergence conditions are satisfied. Remember that, ideally, we want to obtain a network able to give output $\vec{y}^{\;(k)}$ when the input is $\vec{x}^{(k)}$, for $k = 1,\cdots,K$.

As in most applications of ANN for learning purposes, we use a 3-level network structure: the set of neurons $\{1,\cdots,N\}$ is partitioned into 3 subsets: the set of input nodes, the set of intermediate or hidden nodes, and the set of output nodes. The input nodes receive (positive) signals from outside and do not send signals outside (that is, for each input node $i$, $\lambda^+_i > 0$ and $d_i = 0$). For output nodes, the situation is the opposite: $\lambda^+_i = 0$ and $d_i > 0$. The intermediate nodes are not directly connected to the environment; that is, for any hidden neuron $i$, we have $\lambda^+_{i}= \lambda^-_{i}= d_i = 0$. Moreover, there are no connections between nodes within the same level. Last, input neurons are only connected to hidden ones, and hidden neurons are only connected to output ones. This is a typical structure for neural networks used as a learning tool. Moreover, the mathematical analysis of the RNN (that is, solving the non-linear and linear systems) is considerably simplified in this case. In particular, it can easily be shown that the network is always stable. See [47] or [46] and Chapter 10 for the details.
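
For this 3-level structure, the occupation probabilities can indeed be computed level by level, since signals only flow forward. The sketch below is our own illustration, under two stated assumptions: for any node with $d_i = 0$, the rate is recovered from the weights as $r_i = \sum_j (w^+_{i,j} + w^-_{i,j})$ (which follows from the normalization constraint and $w^{\pm}_{i,j} = r_i p^{\pm}_{i,j}$), and the rates of the output neurons (where $d_i > 0$) are supplied separately.

\begin{verbatim}
# Hedged sketch of the forward pass of a 3-level feedforward RNN
# (input -> hidden -> output); no fixed point has to be solved here.
import numpy as np

def feedforward_rnn(x, w_plus_ih, w_minus_ih, w_plus_ho, w_minus_ho, r_out):
    """x: rates lambda+ of the input neurons; returns rho of the output neurons."""
    # input level: lambda- = 0 and d = 0, so r_i is the total outgoing weight
    r_in = (w_plus_ih + w_minus_ih).sum(axis=1)
    rho_in = x / r_in                              # T+ = lambda+, T- = 0 in (3.2)

    # hidden level: d = 0 again, so r_h is the total weight towards the outputs
    r_hid = (w_plus_ho + w_minus_ho).sum(axis=1)
    t_plus_h = rho_in @ w_plus_ih                  # eq. (3.3) restricted to input -> hidden
    t_minus_h = rho_in @ w_minus_ih                # eq. (3.4) restricted to input -> hidden
    rho_hid = t_plus_h / (r_hid + t_minus_h)       # eq. (3.2)

    # output level: lambda+ = 0, d > 0, rate r_out given as a free parameter
    t_plus_o = rho_hid @ w_plus_ho
    t_minus_o = rho_hid @ w_minus_ho
    return t_plus_o / (r_out + t_minus_o)
\end{verbatim}

Because the occupations are obtained by three direct evaluations of (3.2), this also illustrates why the analysis is considerably simplified for the feedforward topology.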