Shun-ichi Amari
RIKEN Brain Science Institute
Learning takes place in neural networks by
modifying their connection weights, that
is, in the parameter space consisting of
the connection weights. The parameter space
is identified with the set of all networks
and is therefore called a neuromanifold.
neuromanifold is not a flat Euclidean space
but is a Riemannian space whose geometry
represents the architecture of neural networks.
Information geometry is used to elucidate
this geometric structure.
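For concreteness, when the network with
parameter vector θ is regarded as a statistical
model p(x; θ) of its input-output behavior (a
notational convention introduced here only for
illustration), the Riemannian metric in question
is the Fisher information metric:
\[
  ds^2 = \sum_{i,j} g_{ij}(\theta)\, d\theta_i\, d\theta_j,
  \qquad
  g_{ij}(\theta) = \mathrm{E}\!\left[
    \frac{\partial \log p(x;\theta)}{\partial \theta_i}\,
    \frac{\partial \log p(x;\theta)}{\partial \theta_j}
  \right].
\]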
It has recently been remarked that a
neuromanifold includes many algebraic
singularities due to its symmetric
hierarchical structure. Moreover, such
singular points affect the trajectories of
the learning dynamics; in particular, they
cause the so-called plateaus. Ordinary
backpropagation learning becomes extremely
slow because of these plateaus.
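As one illustrative example (the particular
model and symbols below are chosen only for
this sketch), consider a perceptron with k
hidden units,
\[
  y = \sum_{j=1}^{k} v_j\, \varphi(\boldsymbol{w}_j \cdot \boldsymbol{x}) + \varepsilon .
\]
If two hidden units share the same weight
vector, w_1 = w_2, only the sum v_1 + v_2 is
identifiable; if some v_j = 0, the corresponding
w_j is not identifiable at all. On such parameter
subsets the Fisher metric degenerates, and these
are the singularities in question.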
The present talk analyzes these singularities
and shows how they give rise to plateaus.
The natural gradient learning algorithm is
shown to avoid such plateau phenomena. We
use a number of simple models to elucidate
statistical and learning aspects of the
parameters in the presence of singularities.
The maximum likelihood estimator is no longer
efficient, and the Cramér-Rao paradigm does
not hold in this case, because the central
limit theorem does not hold either. This
opens a new direction in statistics and
learning theory.
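For reference, the natural gradient update
mentioned above replaces the ordinary gradient
of a loss L(θ) by its Riemannian correction
through the inverse Fisher metric G(θ) = (g_ij(θ)):
\[
  \theta_{t+1} = \theta_t - \eta\, G^{-1}(\theta_t)\, \nabla_{\theta} L(\theta_t),
\]
where η is the learning rate. Because the inverse
metric rescales the gradient according to the
local geometry of the neuromanifold, the learning
trajectory is far less attracted to the plateau
regions that surround the singularities.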
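Similarly, the Cramér-Rao paradigm referred to
above states that, for a regular model and n
independent observations, any unbiased estimator
θ̂ obeys
\[
  \mathrm{Var}(\hat{\theta}) \succeq \frac{1}{n}\, G^{-1}(\theta),
\]
with the maximum likelihood estimator attaining
the bound asymptotically. At a singularity G(θ)
is degenerate, G^{-1} does not exist, the usual
asymptotic normality of the estimator breaks
down, and this classical picture no longer applies.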