Neural network training is usually formulated as a problem in function minimization. More precisely, if W are the weights defining a network's architecture and e(W) is the weight-dependent error function, its gradient ∇e(W) is usually employed to arrive at the optimal weight set W*. There are several ways of exploiting this information; the simplest is plain gradient descent, which assumes a Euclidean structure in the underlying space of the weights W. Although very natural, this may sometimes result in quite slow network learning, both in batch and, especially, in on-line error minimization, where the global error function e(W) is replaced by an individual, pattern-dependent error function e(Z, W) for each pattern Z. Several remedies, such as adaptive learning rates or the addition of momentum terms, have been proposed [6]. A different approach is suggested by the fact that in some instances there may be metrics other than the Euclidean one better suited to describe weight space. This has been shown to be the case for a related problem, likelihood estimation for parametric probability models [1], [4], for which a Riemannian structure can be defined in weight space. The same reasoning can be applied to a concrete network model, the Multilayer Perceptron (MLP). When used in regression problems, that is, when the MLP tries to establish a relationship between an input X and an output y for each pattern Z = (X, y), a probability model p(Z; W) = p(X, y; W) can be defined in pattern space so that the on-line MLP error function e(Z, W) = e(X, y; W) = (y − F(X, W))²/2 is seen as minus the log-likelihood of p(Z; W) (up to additive constants); here F(X, W) denotes the network's transfer function. This allows one to recast network learning as likelihood estimation for a certain semi-parametric probability density p(X, y; W). In this setting, there is [2] a natural Riemannian metric on the space {p(X, y; W) : W} of these densities, determined by a metric tensor given by the matrix
$$
G(W) = E\left[ (\nabla_W \log p)(\nabla_W \log p)^t \right]
      = \int\!\!\int \frac{\partial \log p}{\partial W}
        \left( \frac{\partial \log p}{\partial W} \right)^{t} p(X, y; W)\, dX\, dy .
$$
G(W) is also known as the Fisher information matrix, as its inverse gives the Cramér-Rao bound on the variance of the optimal parameter estimator. This suggests using the "natural" gradient of the Riemannian setting, that is, G(W)⁻¹ ∇_W e(X, y; W), instead of the ordinary Euclidean gradient ∇_W e(X, y; W).
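The natural-gradient update just described can be made concrete with a small numerical sketch. The following Python/NumPy fragment is illustrative only and not taken from the paper: it assumes unit-variance Gaussian output noise (so that e(X, y; W) = (y − F(X, W))²/2 is minus the log-likelihood up to constants) and a standard-normal input distribution, estimates G(W) by Monte Carlo averaging of outer products of per-pattern gradients, and compares an ordinary gradient step with the natural-gradient step G(W)⁻¹ ∇_W e(X, y; W). All names (F, grad_e, fisher, network sizes) are hypothetical.

import numpy as np

rng = np.random.default_rng(0)

# Tiny MLP F(X, W) with one hidden layer of H tanh units; W packs all weights.
D, H = 3, 4                                # input dimension, hidden units
sizes = [(H, D), (H,), (1, H), (1,)]       # shapes of W1, b1, W2, b2

def unpack(W):
    parts, i = [], 0
    for s in sizes:
        n = int(np.prod(s))
        parts.append(W[i:i + n].reshape(s))
        i += n
    return parts

def F(X, W):
    W1, b1, W2, b2 = unpack(W)
    h = np.tanh(W1 @ X + b1)
    return (W2 @ h + b2).item()

def grad_e(X, y, W):
    # Gradient of the on-line error e(X, y; W) = (y - F(X, W))^2 / 2,
    # obtained by backpropagation through the two layers.
    W1, b1, W2, b2 = unpack(W)
    a = W1 @ X + b1
    h = np.tanh(a)
    out = (W2 @ h + b2).item()
    r = out - y                            # residual = -(y - F)
    dW2 = r * h
    db2 = np.array([r])
    da = (r * W2.ravel()) * (1.0 - h ** 2)
    dW1 = np.outer(da, X)
    db1 = da
    return np.concatenate([dW1.ravel(), db1, dW2, db2])

def fisher(W, n_samples=500):
    # Monte Carlo estimate of G(W) = E[(grad_W log p)(grad_W log p)^t],
    # sampling X from an assumed input distribution and y from p(y|X; W).
    n = sum(int(np.prod(s)) for s in sizes)
    G = np.zeros((n, n))
    for _ in range(n_samples):
        X = rng.standard_normal(D)
        y = F(X, W) + rng.standard_normal()   # unit-variance Gaussian noise
        g = grad_e(X, y, W)                   # = -grad_W log p(X, y; W)
        G += np.outer(g, g)
    return G / n_samples

# Natural-gradient step vs. ordinary gradient step for one pattern Z = (X, y).
n_params = sum(int(np.prod(s)) for s in sizes)
W = 0.1 * rng.standard_normal(n_params)
X, y = rng.standard_normal(D), 1.0
g = grad_e(X, y, W)
G = fisher(W) + 1e-3 * np.eye(n_params)       # small ridge keeps the estimate invertible
eta = 0.05
W_plain = W - eta * g                          # Euclidean gradient descent
W_natural = W - eta * np.linalg.solve(G, g)    # natural gradient: G(W)^{-1} grad e

The ridge term added to the estimated G(W) is only a numerical safeguard for this toy example; the discussion above does not depend on it.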