PhD thesis - Alan A Stocker

Constraint Optimization Networks for Visual Motion Perception - Analysis and Synthesis

Alan A Stocker

Dissertation for the degree of Doctor of Natural Sciences (Dr. sc. nat.); Thesis No. 14360, Swiss Federal Institute of Technology (ETH Zurich), Switzerland, March 2002

The extraction of visual motion information is an example of sensory information processing that typically must cope with high data bandwidths and tight processing-time requirements. Nature provides many examples of visual motion processing systems that are far more efficient than any man-made system. Biological systems use a physical computational architecture consisting of networks of highly interconnected simple units that all operate in parallel. Such networks represent a generic form of information processing that applies not only to sensory processing (sub)systems but to any processing in neural structures such as the brain. The motivation to understand the computational aspects of these networks is therefore twofold. First, understanding how the brain works is one of the great challenges of our time, and studying small networks that solve particular tasks may reveal the basic computational principles of more complex systems. Second, the clear performance advantage of these networks makes it very appealing to transfer such computational architectures into technology, providing efficient solutions to a particular class of computational problems.

This thesis presents and analyzes simple network solutions for different motion perception problems. It postulates that the perception of visual motion must be understood as an optimization problem. Optimization is necessary because the visual information is ambiguous with respect to the motion that produced it: many motion interpretations can be consistent with the same input. The optimization selects the interpretation of visual motion that best fits the observed visual information according to the motion model built into the network. A non-hierarchical analog network architecture is proposed that continuously provides an optimal estimate of the local visual motion (the optical flow). It is rigorously shown that, in contrast to previously suggested solutions, this network always has a unique, well-defined optimal state regardless of its visual input. Simulations of the network with realistic visual input show plausible and robust estimates of the optical flow field. The characteristics of the estimate depend strongly on two global parameters, $\rho$ and $\sigma$, which determine, respectively, the isotropic connection strength between units in the network and the influence of an a priori assumption about the perceived visual motion. Depending on the values of these parameters, the estimated optical flow field varies continuously between a global motion estimate, in which case the aperture problem is solved for a single visual object, and a normal optical flow estimate, which results when the connections between units are completely disabled.
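To make the optimization view concrete, the sketch below assumes that the units jointly minimize a quadratic cost of the form $H = \sum_i (E_{x,i} u_i + E_{y,i} v_i + E_{t,i})^2 + \rho \sum_{i \sim j} \lVert \mathbf{v}_i - \mathbf{v}_j \rVert^2 + \sigma \sum_i \lVert \mathbf{v}_i - \mathbf{v}_{\mathrm{ref}} \rVert^2$. Here $\mathbf{v}_i = (u_i, v_i)$ is the local velocity and $E_x, E_y, E_t$ are brightness derivatives. The $\rho$ term stands for the isotropic lateral connections, and the $\sigma$ term stands for the a priori preference toward a reference motion $\mathbf{v}_{\mathrm{ref}}$. This cost, the function name estimate_flow, the parameter ref, and the Gauss-Seidel solver are illustrative assumptions rather than the thesis's exact formulation. The code also relaxes directly to the minimum instead of simulating the analog network's continuous-time dynamics. With $\sigma > 0$ the cost is strictly convex, so its minimum is unique for any input. The demo contrasts the two limits described above: $\rho = 0$ yields the normal flow at each unit, while a large $\rho$ pools two edge orientations into an estimate close to the true velocity.

```python
# Minimal sketch (ASSUMED cost, not the thesis code): each unit i holds a 2-D
# velocity (u_i, v_i) and all units jointly minimise a quadratic cost H.


def estimate_flow(ex, ey, et, rho, sigma, ref=(0.0, 0.0), sweeps=2000):
    """Minimise
        H = sum_i (Ex_i u_i + Ey_i v_i + Et_i)^2
            + rho   * sum_{i~j} [(u_i - u_j)^2 + (v_i - v_j)^2]
            + sigma * sum_i [(u_i - ur)^2 + (v_i - vr)^2]
    on a rectangular grid with 4-connected neighbours, using block
    Gauss-Seidel relaxation (one 2x2 solve per unit per sweep).

    ex, ey, et: 2-D lists (rows x cols) of brightness derivatives.
    ref: hypothetical reference motion (ur, vr) for the sigma term.
    Returns the velocity components (u, v) as 2-D lists.
    """
    rows, cols = len(ex), len(ex[0])
    u = [[0.0] * cols for _ in range(rows)]
    v = [[0.0] * cols for _ in range(rows)]
    ur, vr = ref
    for _ in range(sweeps):
        for r in range(rows):
            for c in range(cols):
                nbrs = [(r + dr, c + dc)
                        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                        if 0 <= r + dr < rows and 0 <= c + dc < cols]
                k = sigma + rho * len(nbrs)
                su = sum(u[i][j] for i, j in nbrs)
                sv = sum(v[i][j] for i, j in nbrs)
                gx, gy, gt = ex[r][c], ey[r][c], et[r][c]
                # Set dH/du_i = dH/dv_i = 0 with the neighbours held fixed:
                # [[gx^2 + k, gx*gy], [gx*gy, gy^2 + k]] @ [u, v] = [b1, b2]
                a11, a12, a22 = gx * gx + k, gx * gy, gy * gy + k
                b1 = -gx * gt + sigma * ur + rho * su
                b2 = -gy * gt + sigma * vr + rho * sv
                det = a11 * a22 - a12 * a12  # = k * (gx^2 + gy^2 + k) > 0 when k > 0
                u[r][c] = (a22 * b1 - a12 * b2) / det
                v[r][c] = (a11 * b2 - a12 * b1) / det
    return u, v


# Aperture-problem demo: a 4x4 patch translating with true velocity (1, 1).
# The left two columns see only a vertical edge (Ex = 1), the right two only a
# horizontal edge (Ey = 1); Et follows from brightness constancy.
ex = [[1.0, 1.0, 0.0, 0.0] for _ in range(4)]
ey = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
et = [[-(gx + gy) for gx, gy in zip(rx, ry)] for rx, ry in zip(ex, ey)]

u0, v0 = estimate_flow(ex, ey, et, rho=0.0, sigma=0.01)   # units disconnected
u1, v1 = estimate_flow(ex, ey, et, rho=10.0, sigma=0.01)  # strongly coupled
print(round(u0[0][0], 3), round(v0[0][0], 3))  # prints 0.99 0.0 (normal flow)
print(round(u1[0][0], 3), round(v1[0][0], 3))  # both components near 0.98
```

Because the $\sigma$ term pulls every estimate toward ref, the coupled result sits slightly below the true speed of 1. Larger $\sigma$ strengthens this bias.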

Two extended systems are introduced in which additional networks are recurrently connected to the basic optical flow network to dynamically control the local values of the parameters $\rho$ and $\sigma$. The first system, the motion segmentation network, finds motion discontinuities and restricts the collective estimation of the optical flow to those units that are considered to receive visual information from the same object in space. This network provides close-to-optimal solutions to the computationally hard problem of motion segmentation. The second system, the motion-selective network, perceives visual motion selectively according to a given motion preference, which provides a means for attentional control of visual motion perception.
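The segmentation idea restricts the collective estimate to units that see the same object. This can be illustrated by switching off the local smoothness weight wherever neighbouring estimates disagree. The sketch below is a toy, one-dimensional stand-in with several assumptions: scalar velocities, a simple data term, and a hypothetical threshold rule for cutting links. The names smooth, segment, links and threshold are ours. The thesis's segmentation network instead controls $\rho$ through a separate, recurrently connected network, and this alternation is not its algorithm. In the demo, the boundary between two objects moving in opposite directions is blurred while all links are on. Once the link at the discontinuity is cut, each object's estimate is recovered independently.

```python
# Toy 1-D sketch (ASSUMED model, not the thesis network): unit i holds a scalar
# velocity u[i], sees a local measurement d[i], and is coupled to its right-hand
# neighbour with strength rho only while links[i] is True.


def smooth(d, links, rho, sigma, sweeps=500):
    """Minimise sum_i [(u_i - d_i)^2 + sigma * u_i^2]
    + rho * sum_{i: links[i]} (u_i - u_{i+1})^2 by Gauss-Seidel sweeps."""
    n = len(d)
    u = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            nbrs = []
            if i > 0 and links[i - 1]:
                nbrs.append(u[i - 1])
            if i < n - 1 and links[i]:
                nbrs.append(u[i + 1])
            # Exact minimiser for unit i with its neighbours held fixed.
            u[i] = (d[i] + rho * sum(nbrs)) / (1.0 + sigma + rho * len(nbrs))
    return u


def segment(d, rho=2.0, sigma=0.01, threshold=0.5, rounds=5):
    """Alternate estimation with a hypothetical discontinuity rule: cut every
    link whose two units disagree by more than `threshold`, then re-estimate."""
    links = [True] * (len(d) - 1)
    for _ in range(rounds):
        u = smooth(d, links, rho, sigma)
        links = [abs(u[i] - u[i + 1]) <= threshold for i in range(len(d) - 1)]
    return smooth(d, links, rho, sigma), links


d = [1.0] * 4 + [-1.0] * 4  # two objects moving in opposite directions
blurred = smooth(d, [True] * 7, rho=2.0, sigma=0.01)  # all links on
u, links = segment(d)
print(links)  # [True, True, True, False, True, True, True]
```

Motion segmentation is computationally hard, so a greedy threshold rule like this one offers no optimality guarantee. It only shows why gating the local connection strength keeps the estimates of two objects from blurring into each other.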

In the second part of this thesis, the analysis-synthesis loop is closed by demonstrating how some of the proposed network architectures can be embedded in a physical substrate. Analog Very Large Scale Integrated (aVLSI) circuit implementations of the optical flow and motion segmentation networks are presented that are fully functional under real-world conditions. The two circuits are among the few examples of sensory systems that apply collective computation, and they are probably the most powerful and complex implementations of their kind. The measured characteristics of these circuits demonstrate that physical analog network architectures can efficiently solve a particular class of perceptual problems in practical applications.
