The visual system is tasked with extracting stimulus content (e.g. the identity of an object) from the spatiotemporal light pattern falling on the retina. However, visual information can be ambiguous with regard to content (e.g. an object viewed from far away), requiring the system to also consider contextual information. Additionally, visual information originating from the same content can differ (e.g. the same object viewed from different angles), requiring the system to extract content that is invariant to these differences. In this review, we explore these challenges from experimental and theoretical perspectives, and motivate the need to incorporate solutions for both ambiguity and invariance into hierarchical models of visual processing.