Climbing/Descending an unknown multi-dimensional landscape.
Never being shown the actual shape of the landscape, only deducing its shape
from a finite number of sample points on its surface.
No method that works this way can ever guarantee
finding the global maximum/minimum.
It may require an infinite amount of exploration to hit
(by luck) the global optimum,
if no slope in the landscape leads up/down towards it.
To avoid getting stuck in local optima,
we do not do strict descent/ascent. We can sometimes make moves
in the opposite direction.
The probability of this "noise"
is high at the start and declines as we go on.
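A minimal sketch of this idea, under stated assumptions: a 1-D landscape, a linear decay of the noise probability, and uniform random local moves. The names noisy_descent, bumpy and p_start are hypothetical, chosen for illustration only. It descends, but accepts a worsening move with a probability that starts high and falls to zero.

```python
import math
import random

def noisy_descent(f, x0, steps=5000, step_size=0.1, p_start=0.5, seed=0):
    """Minimise f by random local moves, sometimes accepting a worse move."""
    rng = random.Random(seed)
    x = x0
    best_x, best_f = x, f(x)
    for t in range(steps):
        # Assumed schedule: noise probability decays linearly from p_start to 0.
        p_noise = p_start * (1.0 - t / steps)
        candidate = x + rng.uniform(-step_size, step_size)
        # Accept any improvement; accept a worsening move with probability p_noise.
        if f(candidate) < f(x) or rng.random() < p_noise:
            x = candidate
        # Remember the best point seen, since noise may carry us away from it.
        if f(x) < best_f:
            best_x, best_f = x, f(x)
    return best_x, best_f

def bumpy(x):
    # Hypothetical landscape with several local minima.
    return x * x + 3.0 * math.sin(5.0 * x)

best_x, best_f = noisy_descent(bumpy, x0=4.0)
```

Even with noise, nothing here guarantees that the point returned is the global minimum. It is only the best of the finite number of points sampled.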
For Neural Nets only:
It is given well-chosen, representative exemplars.
E is known (distance from correct answer).
dE/dw is known (how the error changes as we change the parameters w).
We can make a directed move.
We start off by climbing/descending multiple landscapes at once.
Eventually, each weight specialises on climbing/descending
a particular family of similar landscapes.
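A directed move can be sketched as follows, assuming the simplest possible "network": a single weight w, a model output of w * x, and E taken as the mean squared error over the exemplars. The function names, the learning rate and the exemplar values are illustrative assumptions, not part of these notes. Because both E and its slope with respect to w can be computed, each step goes directly downhill.

```python
def error(w, exemplars):
    # E: mean squared distance between the output w * x and the correct answer y.
    return sum((w * x - y) ** 2 for x, y in exemplars) / len(exemplars)

def d_error_dw(w, exemplars):
    # Slope of E with respect to w, derived analytically from error() above.
    return sum(2.0 * (w * x - y) * x for x, y in exemplars) / len(exemplars)

def gradient_descent(exemplars, w=0.0, learning_rate=0.05, steps=200):
    for _ in range(steps):
        # Directed move: step against the slope, so E decreases.
        w -= learning_rate * d_error_dw(w, exemplars)
    return w

# Hypothetical exemplars generated by the rule y = 2 * x.
exemplars = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w = gradient_descent(exemplars)
```

This toy landscape has a single minimum, so descent alone is enough. On a landscape with local minima the same directed move can still get stuck.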
For GAs only:
It has to make up its own exemplars.
E is not known. We do not know how good the "correct" answer might be.
We get a fitness score for our attempt, certainly, but we do not know how good
an attempt could possibly be.
dE/dw (how the error changes as we change the parameters) is not known.
We can only make a random move.
We use a population of multiple climbers/descenders on the same landscape.
We follow the best performers.
So this may mean we are following a number of trails at once.
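The population idea can be sketched as below. This is a mutation-only simplification with no crossover, so it is not a full GA. The fitness function, the population size, the Gaussian mutation and the name evolve are all assumptions made for illustration. Only fitness scores are used: no slope, and no knowledge of how good the best possible score is.

```python
import random

def evolve(fitness, pop_size=30, generations=100, mutation_scale=0.3, seed=0):
    rng = random.Random(seed)
    # A population of climbers scattered at random over the same landscape.
    population = [rng.uniform(-10.0, 10.0) for _ in range(pop_size)]
    for _ in range(generations):
        # Follow the best performers: keep the fitter half.
        ranked = sorted(population, key=fitness, reverse=True)
        survivors = ranked[: pop_size // 2]
        # Random (undirected) moves: each survivor leaves one mutated child.
        children = [p + rng.gauss(0.0, mutation_scale) for p in survivors]
        population = survivors + children
    return max(population, key=fitness)

def fitness(x):
    # Hypothetical fitness landscape with its peak at x = 3.
    return -(x - 3.0) ** 2

best = evolve(fitness)
```

Keeping several survivors rather than one is what lets the search follow a number of trails at once.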