Visual speed information is optimally combined across different spatiotemporal frequency channels

Matjaz Jogan and Alan A Stocker
Computational and Systems Neuroscience (CoSyNe), Salt Lake City, UT, February 28 - March 3, 2013
Oral presentation.

Humans can optimally combine sensory cues across different perceptual modalities (Ernst & Banks, 2002). Here, we tested whether optimal cue combination also occurs within a single perceptual modality such as visual motion. Specifically, we studied how the human visual system computes the perceived speed of a translating intensity pattern that contains motion energy in multiple spatiotemporal frequency bands. We assume that this stimulus is encoded by a set of spatiotemporal frequency channels, where the response of each channel represents an individual cue. We formulate a Bayesian observer model that optimally combines the likelihood functions computed from the individual channel responses with a prior favoring slow speeds. To validate this model, we performed a visual speed discrimination experiment. Stimuli were either drifting sine-wave gratings of a single frequency, presented at various contrasts, or pairwise linear combinations of those gratings in two different phase configurations that yielded different overall pattern contrasts. The measured perceptual speed biases and discrimination thresholds show the expected Bayesian behavior, in which stimuli with larger thresholds were perceived to move more slowly (Stocker & Simoncelli, 2006). For the combined stimuli, discrimination thresholds were typically smaller than those measured for the individual components alone, which is a key feature of optimal cue combination. Finally, the two phase configurations of the combined stimuli did not produce a significant difference in either bias or threshold, supporting the notion of independent channels. Our observer model provided a good account of the data when jointly fit to all conditions; however, the fits were significantly better when we assumed that individual channel responses were normalized by the overall channel activity.
Our findings suggest that the perceived speed of more complex stimuli can be understood as the result of optimally combining the cues provided by individual spatiotemporal frequency channels.
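The cue-combination logic described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' fitted model: it assumes Gaussian channel likelihoods over linear speed, a zero-mean Gaussian prior standing in for the slow-speed prior, and no response normalization. The function names (`combine_channels`, `perceived_speed`, `discrimination_threshold`) and all parameter values are hypothetical. Under these assumptions, fusing channels by inverse-variance weighting narrows the combined likelihood, which both lowers the threshold and reduces the bias toward slow speeds.

```python
import numpy as np


def combine_channels(means, sigmas):
    """Optimally fuse independent Gaussian channel likelihoods (assumed Gaussian).

    Channel i contributes a likelihood N(means[i], sigmas[i]**2) over speed.
    The product is Gaussian with precision equal to the sum of the channel
    precisions, i.e. inverse-variance weighting of the channel means.
    """
    means = np.asarray(means, dtype=float)
    precisions = 1.0 / np.asarray(sigmas, dtype=float) ** 2
    var = 1.0 / precisions.sum()
    mean = var * np.sum(precisions * means)
    return mean, np.sqrt(var)


def perceived_speed(like_mean, like_sigma, prior_sigma):
    """Posterior mean under a zero-mean Gaussian prior (stand-in for the slow-speed prior).

    The weight w = prior_var / (prior_var + like_var) shrinks the estimate
    toward zero, so broader likelihoods yield slower perceived speeds.
    """
    w = prior_sigma**2 / (prior_sigma**2 + like_sigma**2)
    return w * like_mean


def discrimination_threshold(like_sigma, d_prime=1.0):
    """Threshold in stimulus units for this linear-Gaussian observer.

    The estimate is w * m with m ~ N(v, like_sigma**2): its standard deviation
    is w * like_sigma and its slope with respect to v is w, so their ratio,
    and hence the threshold, is like_sigma (scaled by the d' criterion).
    """
    return d_prime * like_sigma


if __name__ == "__main__":
    true_speed = 4.0             # hypothetical stimulus speed (arbitrary units)
    prior_sigma = 3.0            # hypothetical width of the slow-speed prior
    sigma_a, sigma_b = 1.0, 1.5  # hypothetical noise of two frequency channels

    # Noise-free measurements at the true speed isolate the mean bias.
    conditions = {
        "component A": combine_channels([true_speed], [sigma_a]),
        "component B": combine_channels([true_speed], [sigma_b]),
        "A + B": combine_channels([true_speed, true_speed], [sigma_a, sigma_b]),
    }
    for name, (m, s) in conditions.items():
        print(f"{name}: threshold={discrimination_threshold(s):.3f}, "
              f"perceived speed={perceived_speed(m, s, prior_sigma):.3f}")
```

With these illustrative values, the script reports thresholds of 1.000, 1.500 and 0.832 and perceived speeds of 3.600, 3.200 and 3.714 for component A, component B and their combination. The noisier component appears slower, and the combined stimulus has the smallest threshold. The abstract reports the same qualitative pattern, but these numbers are not fits to the data. A divisive-normalization stage, which the abstract found to improve the fits, is deliberately left out of this sketch.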