Artificial neural network

An artificial neural network (ANN), usually called "neural network" (NN), is a mathematical model or computational model that tries to simulate the structure and/or functional aspects of biological neural networks. It consists of an interconnected group of artificial neurons and processes information using a connectionist approach to computation. In most cases an ANN is an adaptive system that changes its structure based on external or internal information that flows through the network during the learning phase. Neural networks are non-linear statistical data modeling tools. They can be used to model complex relationships between inputs and outputs or to find patterns in data.

Background
There is no precise agreed-upon definition among researchers as to what a neural network is, but most would agree that it involves a network of simple processing elements (neurons), which can exhibit complex global behavior, determined by the connections between the processing elements and element parameters. The original inspiration for the technique came from examination of the central nervous system and the neurons (and their axons, dendrites and synapses) which constitute one of its most significant information processing elements (see Neuroscience). In a neural network model, simple nodes (called variously "neurons", "neurodes", "PEs" ("processing elements") or "units") are connected together to form a network of nodes, hence the term "neural network." While a neural network does not have to be adaptive per se, its practical use comes with algorithms designed to alter the strength (weights) of the connections in the network to produce a desired signal flow.
These networks are also similar to biological neural networks in the sense that functions are performed collectively and in parallel by the units, rather than there being a clear delineation of subtasks to which various units are assigned (see also connectionism). Currently, the term Artificial Neural Network (ANN) tends to refer mostly to neural network models employed in statistics, cognitive psychology and artificial intelligence. Neural network models designed with emulation of the central nervous system (CNS) in mind are a subject of theoretical neuroscience (computational neuroscience).
In modern software implementations of artificial neural networks the approach inspired by biology has for the most part been abandoned for a more practical approach based on statistics and signal processing. In some of these systems, neural networks or parts of neural networks (such as artificial neurons) are used as components in larger systems that combine both adaptive and non-adaptive elements. While the more general approach of such adaptive systems is more suitable for real-world problem solving, it has far less to do with the traditional artificial intelligence connectionist models. What they do have in common, however, is the principle of non-linear, distributed, parallel and local processing and adaptation.

Models
Neural network models in artificial intelligence are usually referred to as artificial neural networks (ANNs); these are essentially simple mathematical models defining a function f : X → Y. Each type of ANN model corresponds to a class of such functions.

Employing artificial neural networks
Perhaps the greatest advantage of ANNs is their ability to be used as an arbitrary function approximation mechanism which 'learns' from observed data. However, using them is not so straightforward, and a relatively good understanding of the underlying theory is essential.
- Choice of model: This will depend on the data representation and the application. Overly complex models tend to lead to problems with learning.
- Learning algorithm: There are numerous tradeoffs between learning algorithms. Almost any algorithm will work well with the correct hyperparameters for training on a particular fixed dataset. However, selecting and tuning an algorithm for training on unseen data requires a significant amount of experimentation.
- Robustness: If the model, cost function and learning algorithm are selected appropriately, the resulting ANN can be extremely robust.
With the correct implementation ANNs can be used naturally in online learning and large dataset applications. Their simple implementation and the existence of mostly local dependencies exhibited in the structure allow for fast, parallel implementations in hardware.

Applications
The utility of artificial neural network models lies in the fact that they can be used to infer a function from observations. This is particularly useful in applications where the complexity of the data or task makes the design of such a function by hand impractical.

Real life applications
The tasks to which artificial neural networks are applied tend to fall within the following broad categories:
- Function approximation, or regression analysis, including time series prediction, fitness approximation and modeling.
- Classification, including pattern and sequence recognition, novelty detection and sequential decision making.
- Data processing, including filtering, clustering, blind source separation and compression.
- Robotics, including directing manipulators, Computer numerical control.
Application areas include system identification and control (vehicle control, process control), quantum chemistry, game-playing and decision making (backgammon, chess, racing), pattern recognition (radar systems, face identification, object recognition and more), sequence recognition (gesture, speech, handwritten text recognition), medical diagnosis, financial applications (automated trading systems), data mining (or knowledge discovery in databases, "KDD"), visualization and e-mail spam filtering.

Neural network software
Neural network software is used to simulate, research, develop and apply artificial neural networks, biological neural networks and in some cases a wider array of adaptive systems. See also logistic regression.

Types of neural networks

Feedforward neural network
The feedforward neural network was the first and arguably simplest type of artificial neural network devised. In this network, the information moves in only one direction, forward, from the input nodes, through the hidden nodes (if any) and to the output nodes. There are no cycles or loops in the network.

Radial basis function (RBF) network
Radial Basis Functions are powerful techniques for interpolation in multidimensional space. An RBF is a function which has a built-in distance criterion with respect to a center. Radial basis functions have been applied in the area of neural networks, where they may be used as a replacement for the sigmoidal hidden layer transfer characteristic in Multi-Layer Perceptrons. RBF networks have two layers of processing: In the first, input is mapped onto each RBF in the 'hidden' layer. The RBF chosen is usually a Gaussian. In regression problems the output layer is then a linear combination of hidden layer values representing mean predicted output. The interpretation of this output layer value is the same as a regression model in statistics. In classification problems the output layer is typically a sigmoid function of a linear combination of hidden layer values, representing a posterior probability. Performance in both cases is often improved by shrinkage techniques, known as ridge regression in classical statistics and known to correspond to a prior belief in small parameter values (and therefore smooth output functions) in a Bayesian framework.
RBF networks have the advantage of not suffering from local minima in the same way as Multi-Layer Perceptrons. This is because the only parameters that are adjusted in the learning process are the linear mapping from hidden layer to output layer. Linearity ensures that the error surface is quadratic and therefore has a single easily found minimum. In regression problems this can be found in one matrix operation. In classification problems the fixed non-linearity introduced by the sigmoid output function is most efficiently dealt with using iteratively re-weighted least squares.
RBF networks have the disadvantage of requiring good coverage of the input space by radial basis functions. RBF centres are determined with reference to the distribution of the input data, but without reference to the prediction task. As a result, representational resources may be wasted on areas of the input space that are irrelevant to the learning task. A common solution is to associate each data point with its own centre, although this can make the linear system to be solved in the final layer rather large, and requires shrinkage techniques to avoid overfitting.
Associating each input datum with an RBF leads naturally to kernel methods such as Support Vector Machines and Gaussian Processes (the RBF is the kernel function). All three approaches use a non-linear kernel function to project the input data into a space where the learning problem can be solved using a linear model. Like Gaussian Processes, and unlike SVMs, RBF networks are typically trained in a Maximum Likelihood framework by maximizing the probability (minimizing the error) of the data under the model. SVMs take a different approach to avoiding overfitting by maximizing instead a margin. RBF networks are outperformed in most classification applications by SVMs. In regression applications they can be competitive when the dimensionality of the input space is relatively small.

Kohonen self-organizing network
The self-organizing map (SOM) invented by Teuvo Kohonen performs a form of unsupervised learning. A set of artificial neurons learn to map points in an input space to coordinates in an output space. The input space can have different dimensions and topology from the output space, and the SOM will attempt to preserve these.

Recurrent network
Contrary to feedforward networks, recurrent neural networks (RNs) are models with bi-directional data flow. While a feedforward network propagates data linearly from input to output, RNs also propagate data from later processing stages to earlier stages.

Simple recurrent network
A simple recurrent network (SRN) is a variation on the Multi-Layer Perceptron, sometimes called an "Elman network" due to its invention by Jeff Elman. A three-layer network is used, with the addition of a set of "context units" in the input layer. There are connections from the middle (hidden) layer to these context units fixed with a weight of one. At each time step, the input is propagated in a standard feed-forward fashion, and then a learning rule (usually back-propagation) is applied. The fixed back connections result in the context units always maintaining a copy of the previous values of the hidden units (since they propagate over the connections before the learning rule is applied). Thus the network can maintain a sort of state, allowing it to perform such tasks as sequence-prediction that are beyond the power of a standard Multi-Layer Perceptron.
In a fully recurrent network, every neuron receives inputs from every other neuron in the network. These networks are not arranged in layers. Usually only a subset of the neurons receive external inputs in addition to the inputs from all the other neurons, and another disjoint subset of neurons report their output externally as well as sending it to all the neurons. These distinctive inputs and outputs perform the function of the input and output layers of a feed-forward or simple recurrent network, and also join all the other neurons in the recurrent processing.

Hopfield network
The Hopfield network is a recurrent neural network in which all connections are symmetric. Invented by John Hopfield in 1982, this network guarantees that its dynamics will converge. If the connections are trained using Hebbian learning then the Hopfield network can perform as robust content-addressable (or associative) memory, resistant to connection alteration.

Echo state network
The echo state network (ESN) is a recurrent neural network with a sparsely connected random hidden layer. The weights of output neurons are the only part of the network that can change and be learned. ESNs are good at (re)producing temporal patterns.

Long short term memory network
The Long short term memory is an artificial neural net structure that, unlike traditional RNNs, doesn't have the problem of vanishing gradients. It can therefore use long delays and can handle signals that have a mix of low and high frequency components.

Stochastic neural networks
A stochastic neural network differs from a typical neural network because it introduces random variations into the network. In a probabilistic view of neural networks, such random variations can be viewed as a form of statistical sampling, such as Monte Carlo sampling.

Boltzmann machine
The Boltzmann machine can be thought of as a noisy Hopfield network. Invented by Geoff Hinton and Terry Sejnowski in 1985, the Boltzmann machine is important because it is one of the first neural networks to demonstrate learning of latent variables (hidden units). Boltzmann machine learning was at first slow to simulate, but the contrastive divergence algorithm of Geoff Hinton (circa 2000) allows models such as Boltzmann machines and products of experts to be trained much faster.

Modular neural networks
Biological studies have shown that the human brain functions not as a single massive network, but as a collection of small networks. This realization gave birth to the concept of modular neural networks, in which several small networks cooperate or compete to solve problems.

Committee of machines
A committee of machines (CoM) is a collection of different neural networks that together "vote" on a given example. This generally gives a much better result compared to other neural network models. Because neural networks suffer from local minima, starting with the same architecture and training but using different initial random weights often gives vastly different networks. A CoM tends to stabilize the result.
The CoM is similar to the general machine learning bagging method, except that the necessary variety of machines in the committee is obtained by training from different random starting weights rather than training on different randomly selected subsets of the training data.

Associative neural network (ASNN)
The ASNN is an extension of the committee of machines that goes beyond a simple/weighted average of different models. ASNN represents a combination of an ensemble of feed-forward neural networks and the k-nearest neighbor technique (kNN). It uses the correlation between ensemble responses as a measure of distance between the analyzed cases for the kNN. This corrects the bias of the neural network ensemble. An associative neural network has a memory that can coincide with the training set. If new data become available, the network instantly improves its predictive ability and provides data approximation (self-learns the data) without a need to retrain the ensemble. Another important feature of ASNN is the possibility to interpret neural network results by analysis of correlations between data cases in the space of models. The method is demonstrated online, where it can either be used directly or downloaded.

Other types of networks
These special networks do not fit in any of the previous categories.

Holographic associative memory
Holographic associative memory represents a family of analog, correlation-based, associative, stimulus-response memories, where information is mapped onto the phase orientation of complex numbers.

Instantaneously trained networks
Instantaneously trained neural networks (ITNNs) were inspired by the phenomenon of short-term learning that seems to occur instantaneously. In these networks the weights of the hidden and the output layers are mapped directly from the training vector data. Ordinarily, they work on binary data, but versions for continuous data that require small additional processing are also available.

Spiking neural networks
Spiking neural networks (SNNs) are models which explicitly take into account the timing of inputs. The network input and output are usually represented as series of spikes (delta function or more complex shapes). SNNs have the advantage of being able to process information in the time domain (signals that vary over time). They are often implemented as recurrent networks. SNNs are also a form of pulse computer.
Spiking neural networks with axonal conduction delays exhibit polychronization, and hence could have a very large memory capacity.
Networks of spiking neurons, and the temporal correlations of neural assemblies in such networks, have been used to model figure/ground separation and region linking in the visual system (see, for example, Reitboeck et al. in Haken and Stadler: Synergetics of the Brain. Berlin, 1989).
In June 2005 IBM announced construction of a Blue Gene supercomputer dedicated to the simulation of a large recurrent spiking neural network.
Gerstner and Kistler have a freely available online textbook on Spiking Neuron Models.

Dynamic neural networks
Dynamic neural networks not only deal with nonlinear multivariate behaviour, but also include (learning of) time-dependent behaviour such as various transient phenomena and delay effects.

Cascading neural networks
Cascade-Correlation is an architecture and supervised learning algorithm developed by Scott Fahlman and Christian Lebiere. Instead of just adjusting the weights in a network of fixed topology, Cascade-Correlation begins with a minimal network, then automatically trains and adds new hidden units one by one, creating a multi-layer structure. Once a new hidden unit has been added to the network, its input-side weights are frozen. This unit then becomes a permanent feature-detector in the network, available for producing outputs or for creating other, more complex feature detectors. The Cascade-Correlation architecture has several advantages over existing algorithms: it learns very quickly, the network determines its own size and topology, it retains the structures it has built even if the training set changes, and it requires no back-propagation of error signals through the connections of the network. See: Cascade correlation algorithm.

Neuro-fuzzy networks
A neuro-fuzzy network is a fuzzy inference system (FIS) in the body of an artificial neural network. Depending on the FIS type, there are several layers that simulate the processes involved in a fuzzy inference, like fuzzification, inference, aggregation and defuzzification. Embedding an FIS in a general structure of an ANN has the benefit of using available ANN training methods to find the parameters of a fuzzy system.

Compositional pattern-producing networks
Compositional pattern-producing networks (CPPNs) are a variation of ANNs which differ in their set of activation functions and how they are applied. While typical ANNs often contain only sigmoid functions (and sometimes Gaussian functions), CPPNs can include both types of functions and many others. Furthermore, unlike typical ANNs, CPPNs are applied across the entire space of possible inputs so that they can represent a complete image. Since they are compositions of functions, CPPNs in effect encode images at infinite resolution and can be sampled for a particular display at whatever resolution is optimal.

One-shot associative memory
This type of network can add new patterns without the need for re-training. It is done by creating a specific memory structure, which assigns each new pattern to an orthogonal plane using adjacently connected hierarchical arrays. The network offers real-time pattern recognition and high scalability; however, it requires parallel processing and is thus best suited for platforms such as Wireless sensor networks (WSN), Grid computing, and GPGPUs.

Theoretical properties

Computational power
The multi-layer perceptron (MLP) is a universal function approximator, as proven by the Cybenko theorem. However, the proof is not constructive regarding the number of neurons required or the settings of the weights.
Work by Hava Siegelmann and Eduardo D. Sontag has provided a proof that a specific recurrent architecture with rational valued weights (as opposed to the commonly used floating point approximations) has the full power of a Universal Turing Machine using a finite number of neurons and standard linear connections. They have further shown that the use of irrational values for weights results in a machine with super-Turing power.

Capacity
Artificial neural network models have a property called 'capacity', which roughly corresponds to their ability to model any given function. It is related to the amount of information that can be stored in the network and to the notion of complexity.

Convergence
Nothing can be said in general about convergence since it depends on a number of factors. Firstly, there may exist many local minima. This depends on the cost function and the model. Secondly, the optimization method used might not be guaranteed to converge when far away from a local minimum. Thirdly, for a very large amount of data or parameters, some methods become impractical. In general, it has been found that theoretical guarantees regarding convergence are an unreliable guide to practical application.

Generalisation and statistics
In applications where the goal is to create a system that generalises well to unseen examples, the problem of overtraining has emerged. This arises in overcomplex or overspecified systems when the capacity of the network significantly exceeds the needed free parameters. There are two schools of thought for avoiding this problem: The first is to use cross-validation and similar techniques to check for the presence of overtraining and optimally select hyperparameters so as to minimize the generalisation error. The second is to use some form of regularisation. This is a concept that emerges naturally in a probabilistic (Bayesian) framework, where the regularisation can be performed by selecting a larger prior probability over simpler models; but also in statistical learning theory, where the goal is to minimize over two quantities: the 'empirical risk' and the 'structural risk', which roughly correspond to the error over the training set and the predicted error in unseen data due to overfitting.

Confidence analysis of a neural network
Supervised neural networks that use an MSE cost function can use formal statistical methods to determine the confidence of the trained model. The MSE on a validation set can be used as an estimate for variance. This value can then be used to calculate the confidence interval of the output of the network, assuming a normal distribution. A confidence analysis made this way is statistically valid as long as the output probability distribution stays the same and the network is not modified.
By assigning a softmax activation function on the output layer of the neural network (or a softmax component in a component-based neural network) for categorical target variables, the outputs can be interpreted as posterior probabilities. This is very useful in classification as it gives a certainty measure on classifications.
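
As an illustration of the softmax output described above, the following is a minimal sketch in Python rather than a reference implementation. The function name `softmax`, the example scores and the three-class setup are assumptions made for this example; they do not come from any particular library or network.

```python
import math

def softmax(scores):
    """Map raw output-layer scores to positive values that sum to 1."""
    # Subtracting the maximum score keeps exp() from overflowing;
    # it cancels out in the ratio and does not change the result.
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw outputs of a network for a three-class problem.
scores = [2.0, 1.0, 0.1]
probs = softmax(scores)

# The predicted class is the one with the largest probability, and that
# probability serves as the certainty measure for the classification.
predicted = max(range(len(probs)), key=probs.__getitem__)
certainty = probs[predicted]
print(predicted, round(certainty, 3))
```

Because the outputs are positive and sum to one, a threshold on `certainty` could, for example, be used to flag low-confidence classifications for review.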