Learning hierarchical structure in natural images with multiple layers of L_p reconstruction neurons

Zhuo Wang and Alan A Stocker and Dan D Lee
Computational and Systems Neuroscience (CoSyNe) meeting, Salt Lake City, March 5-7, 2015. Poster presentation.

High-dimensional perceptual stimuli such as images or sounds are thought to be represented more efficiently in neural populations through redundancy reduction (Attneave 1954; Barlow 1961). Computational models of efficient coding optimize information-theoretic objective functions such as maximum mutual information (MMI). In particular, MMI has proven a promising principle for understanding V1 simple cells because it successfully predicts edge-like filters for natural images (Bell and Sejnowski 1997). However, it is more difficult to apply the MMI principle iteratively to train additional layers without substantial modifications (Karklin and Lewicki 2003; Shan, Zhang and Cottrell 2006). Our work investigates the general principle of minimizing L_p reconstruction error to model multiple layers of noisy linear-nonlinear neurons. We show that both MMI (L_0) and minimum mean squared error (MMSE, L_2) are special cases of this generalized principle, and that optimal analytic solutions can be derived when the stimuli follow an elliptical distribution. In particular, we find that the optimal representation does not immediately eliminate correlations, but rather reduces redundancy gradually across the layers. As an application, we consider small (8x8) patches of natural images (van Hateren 1998). We demonstrate that the distribution of pixel intensities in these patches is nearly elliptical, and we iteratively train multiple layers of MMSE neurons. We show detailed results for a two-layer model, in which the response properties of the first- and second-layer neurons qualitatively match features of simple and complex cells, respectively.
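The MMSE (L_2) special case of the reconstruction principle can be illustrated with a minimal numerical sketch: a single layer of noisy linear-nonlinear neurons, r = f(Wx) + noise, decoded linearly as x_hat = Vr, trained by gradient descent to minimize mean squared reconstruction error. This is a hypothetical toy setup on synthetic Gaussian data (a Gaussian being a special case of an elliptical distribution), not the authors' actual model or code; the layer width, noise level, and tanh nonlinearity are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "stimuli": correlated Gaussian samples, a special case of an
# elliptical distribution (a stand-in for whitened 8x8 image patches).
d, n = 8, 2000
A = rng.standard_normal((d, d))
cov = A @ A.T / d
X = rng.multivariate_normal(np.zeros(d), cov, size=n)   # (n, d)

# One layer of noisy linear-nonlinear neurons with a linear readout.
k = 16                      # number of neurons (assumed, for illustration)
sigma = 0.1                 # additive output-noise level (assumed)
W = 0.1 * rng.standard_normal((k, d))   # encoding filters
V = 0.1 * rng.standard_normal((d, k))   # linear decoder
lr = 0.05

def forward(X, W, V, noise):
    U = X @ W.T                  # linear drive of each neuron
    R = np.tanh(U)               # pointwise nonlinearity
    Xhat = (R + noise) @ V.T     # linear readout of noisy responses
    return U, R, Xhat

def mse(X, Xhat):
    # Mean squared L_2 reconstruction error per stimulus.
    return np.mean(np.sum((X - Xhat) ** 2, axis=1))

noise0 = sigma * rng.standard_normal((n, k))
_, _, Xhat0 = forward(X, W, V, noise0)
err_before = mse(X, Xhat0)

# Gradient descent on the MMSE objective, resampling the noise each step.
for step in range(300):
    noise = sigma * rng.standard_normal((n, k))
    U, R, Xhat = forward(X, W, V, noise)
    E = Xhat - X                          # (n, d) residuals
    gV = 2 * E.T @ (R + noise) / n        # d(mse)/dV
    gR = 2 * E @ V                        # backprop through the readout
    gU = gR * (1 - R ** 2)                # tanh derivative
    gW = gU.T @ X / n                     # d(mse)/dW
    V -= lr * gV
    W -= lr * gW

_, _, Xhat1 = forward(X, W, V, noise0)
err_after = mse(X, Xhat1)
print(err_before, err_after)
```

In the full model this optimization would be repeated layer by layer, with each new layer trained to reconstruct (in the L_p sense) the responses of the layer below it.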