Your eyes capture light, but your brain builds the picture. Seeing the world is a multi-stage process that starts when photons hit the back of your eye and ends, roughly 150 milliseconds later, with a rich, three-dimensional, full-color experience assembled by your brain. What feels instant and effortless is actually one of the most complex feats in biology.
Light Hits the Retina
Vision begins when light passes through the cornea, pupil, and lens and lands on the retina, a thin layer of tissue lining the back of the eye. The retina contains two types of light-sensitive cells: rods and cones. Rods handle low-light and peripheral vision. Cones, concentrated in a tiny central pit called the fovea, handle color and fine detail. The fovea packs around 164,000 cones per square millimeter, but that density drops dramatically toward the edges of the retina, falling to roughly 5,000 to 7,000 cones per square millimeter at 30 degrees from center. This is why you can read small text only when you look directly at it.
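Those density figures translate into acuity in a simple way: since cones tile the retina, their center-to-center spacing scales as one over the square root of density. Here's a quick back-of-the-envelope sketch using the numbers above, with 6,000 cones per square millimeter taken as a midpoint for the peripheral figure:

```python
import math

# Cone spacing scales as 1 / sqrt(density), so the density figures above
# imply roughly how much coarser the peripheral sampling grid is.
FOVEA_DENSITY = 164_000     # cones per mm^2 at the fovea
PERIPHERY_DENSITY = 6_000   # cones per mm^2 around 30 degrees out (midpoint)

spacing_ratio = math.sqrt(FOVEA_DENSITY / PERIPHERY_DENSITY)
print(f"peripheral cones are ~{spacing_ratio:.1f}x farther apart")
# ~5x coarser sampling: by this estimate alone, text readable at the
# fovea must be several times larger to be read 30 degrees out.
```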
When a photon strikes a cone or rod, it’s absorbed by a light-sensitive molecule built from vitamin A and a protein called opsin. The photon flips the shape of the vitamin A component, which triggers a chain reaction inside the cell. That chain reaction lowers levels of a signaling molecule, which in turn closes tiny channels in the cell membrane. The closing of those channels changes the cell’s electrical state, converting a flash of light into a neural signal. This entire conversion, from photon to electrical impulse, is called phototransduction, and it happens continuously across millions of photoreceptors every moment your eyes are open.
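To make that chain of cause and effect concrete, here is a deliberately toy simulation of the cascade. Every constant in it is invented for illustration; the point is only the logic: photon absorptions deplete the signaling molecule, channel opening tracks its level, and the drop becomes the cell's output signal.

```python
# Toy model of phototransduction: photon absorptions deplete a signaling
# molecule (cGMP), which closes membrane channels and alters the cell's
# electrical output. All constants are invented for illustration.

def photoreceptor_response(photons_absorbed, steps=50):
    cgmp = 1.0          # normalized cGMP concentration (1.0 = dark level)
    resynthesis = 0.05  # fraction of the deficit restored per step
    depletion = 0.02    # cGMP consumed per absorbed photon per step

    signal = 0.0
    for _ in range(steps):
        cgmp -= depletion * photons_absorbed  # light lowers cGMP
        cgmp += resynthesis * (1.0 - cgmp)    # the cell restores it
        cgmp = max(cgmp, 0.0)
        open_channels = cgmp                  # channels track cGMP level
        # Fewer open channels -> the cell's electrical state shifts,
        # and that shift is the neural signal.
        signal = 1.0 - open_channels
    return signal

print(f"dark:   {photoreceptor_response(0):.2f}")   # ~0: no signal
print(f"dim:    {photoreceptor_response(1):.2f}")   # partial signal
print(f"bright: {photoreceptor_response(10):.2f}")  # saturated signal
```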
How You See Color
Human color vision relies on three types of cones, each tuned to a different slice of the light spectrum. Short-wavelength cones respond best to blue-violet light (around 450 nanometers). Medium-wavelength cones peak in the green range (around 530 nanometers). Long-wavelength cones peak in the yellow-green range (around 560 nanometers), though they respond more strongly to red light than the other two types do. Every color you perceive is the result of your brain comparing the relative activation levels of these three cone types. A lemon looks yellow not because “yellow cones” fire, but because the long and medium cones are both strongly stimulated while the short cones are not.
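A rough numerical sketch makes that comparison concrete. The Gaussian curves below are crude stand-ins for the real cone sensitivity functions: the peaks match the values above, but the shared 45-nanometer width is an assumption chosen purely for illustration.

```python
import math

# Crude Gaussian stand-ins for the three cone sensitivity curves.
# Peaks match the text; the 45 nm width is an illustrative assumption.
CONE_PEAKS = {"S": 450, "M": 530, "L": 560}

def cone_response(cone, wavelength_nm, width=45.0):
    peak = CONE_PEAKS[cone]
    return math.exp(-((wavelength_nm - peak) ** 2) / (2 * width ** 2))

def cone_triplet(wavelength_nm):
    return {c: round(cone_response(c, wavelength_nm), 2) for c in "SML"}

# Light reflected by a lemon is centered around roughly 580 nm:
print(cone_triplet(580))  # strong L and M, weak S -> seen as yellow
print(cone_triplet(450))  # strong S, weak L and M -> seen as blue
```

No single cone type "means" yellow; the pattern across all three does.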
This three-cone system, called trichromacy, gives humans the ability to distinguish roughly a million color variations. It likely evolved because it helped early primates spot ripe fruit against green foliage. People with inherited color vision differences typically have one cone type that’s altered or missing, which shifts or collapses part of that three-way comparison.
The Path From Eye to Brain
Signals from your photoreceptors pass through several layers of processing cells within the retina itself before exiting through the optic nerve. From there, most visual information travels to a relay station deep in the brain called the lateral geniculate nucleus (LGN). The LGN doesn’t just pass signals along passively. It receives far more input from higher brain regions than it does from the eyes, suggesting it actively filters and prioritizes visual information before forwarding it to the visual cortex at the back of the skull.
The primary visual cortex (known as V1) is the first major cortical stop. Neurons here respond to basic features: edges, orientations, simple contrasts. From V1, visual information fans out into increasingly specialized areas, each handling a different aspect of what you see.
Two Streams: “What” and “Where”
After the primary visual cortex, your brain splits visual processing into two broad pathways. The ventral stream runs along the lower part of the brain toward the temporal lobe and handles object recognition: shape, texture, identity. This is the pathway that lets you tell a coffee mug from a glass, or recognize a friend’s face in a crowd. The dorsal stream runs upward toward the parietal lobe and processes spatial information: where things are, how fast they’re moving, and how to reach for them.
These two streams aren’t entirely independent. Research using brain imaging shows that both the ventral and dorsal pathways contribute to shape perception. But location processing appears to be almost exclusively a dorsal stream function. The ventral stream, by contrast, is strongly biased toward its primary job of identifying objects and contributes little to spatial positioning. This division helps explain why certain types of brain damage can leave someone able to describe what an object looks like but unable to point to it, or vice versa.
How You See Motion
A specialized brain region called area MT (sometimes labeled V5), located in the temporal lobe, is dedicated to processing visual motion. Neurons in this area are selective for the direction and speed of moving objects. What makes MT neurons special is their stability: they continue to signal the same direction of motion even when the shape, size, or orientation of the moving object changes. This gives you a consistent sense of movement regardless of what’s moving.
MT neurons come in different functional types. Some respond to the movement of individual edges and contours. Others, called pattern cells, integrate those signals to represent the motion of whole surfaces. Pattern cells tend to prefer faster speeds and larger-scale visual features, which makes sense for tracking objects as they sweep across your field of view. The primary visual cortex (V1) can only detect motion perpendicular to an edge’s orientation, so area MT performs the critical work of combining those limited signals into the unified perception of a ball flying through the air or a car turning a corner.
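The combination step can be sketched as plain geometry. Each edge measurement constrains the true velocity to a line (the motion component along the edge is invisible, which is the so-called aperture problem), and two differently oriented edges pin the velocity down. This "intersection of constraints" idea is a standard textbook account; the code below is an idealized version of it, not a model of real MT circuitry.

```python
# Intersection of constraints: recover the true 2D velocity of a pattern
# from the normal-speed measurements of two differently oriented edges.
# Each measurement says: dot(velocity, edge_normal) == measured_speed.

def true_velocity(normal1, speed1, normal2, speed2):
    # Solve the 2x2 linear system [n1; n2] @ v = [s1, s2] by hand.
    (a, b), (c, d) = normal1, normal2
    det = a * d - b * c
    if abs(det) < 1e-9:
        raise ValueError("edges are parallel: motion is ambiguous")
    vx = (speed1 * d - speed2 * b) / det
    vy = (a * speed2 - c * speed1) / det
    return vx, vy

# A pattern moving right at 1 unit/step, seen through two tilted edges.
# Edge normals at 45 and 135 degrees each report only ~0.707 of it:
n1, n2 = (0.707, 0.707), (-0.707, 0.707)
print(true_velocity(n1, 0.707, n2, -0.707))  # -> (approx 1.0, 0.0)
```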
How You See Depth
The images on each retina are flat, yet you experience the world in three dimensions. Your brain achieves this by combining multiple depth cues, some requiring both eyes and others available to just one.
The most powerful depth cue is stereopsis, which exploits the fact that your two eyes are separated by a few centimeters. Each eye receives a slightly different view of the same scene. Your brain compares the two images and converts the small differences between them (called binocular disparity) into a sense of depth. This is the same principle behind 3D movies.
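Under an idealized pinhole-camera geometry, the computation reduces to a single formula: depth equals focal length times eye separation divided by disparity. The baseline and focal-length numbers below are rough illustrative values, not physiological measurements.

```python
# Stereopsis under idealized pinhole geometry:
#     depth = (focal_length * baseline) / disparity
# The baseline and focal length are rough illustrative values.

EYE_BASELINE_CM = 6.3   # distance between the two pupils
FOCAL_LENGTH_CM = 1.7   # nominal optical focal length of the eye

def depth_from_disparity(disparity_cm):
    """Distance to a point given the shift between its two retinal images."""
    return FOCAL_LENGTH_CM * EYE_BASELINE_CM / disparity_cm

for disparity in (0.107, 0.0107, 0.00107):  # retinal image shift in cm
    depth = depth_from_disparity(disparity)
    print(f"disparity {disparity:.5f} cm -> depth {depth:.0f} cm")
# Depth is inversely proportional to disparity: tiny differences between
# the two retinal images correspond to large differences in distance.
```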
But you can still judge distance with one eye closed, thanks to a set of monocular cues:
- Relative size: Familiar objects that appear smaller are interpreted as farther away.
- Interposition: When one object overlaps another, the overlapped object is perceived as more distant.
- Linear perspective: Parallel lines (roads, railroad tracks) appear to converge in the distance.
- Aerial perspective: The farther away an object is, the more atmospheric light scattering softens and blue-shifts it, which is why distant mountains look blue.
- Light and shade: Shadows and highlights reveal an object’s three-dimensional shape.
- Motion parallax: When you move your head, nearby objects shift more in your visual field than distant ones.
Your brain weighs all of these cues simultaneously, cross-checking them against each other to construct a stable three-dimensional scene.
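One standard way to formalize that weighing, borrowed from the perception literature rather than stated in this article, is reliability-weighted averaging: each cue contributes its own depth estimate, weighted by the inverse of its variance, so that noisier cues count for less. All the numbers below are hypothetical.

```python
# Reliability-weighted cue combination: fuse several depth estimates,
# weighting each by 1/variance so that noisier cues count for less.
# The cue names and numbers are hypothetical illustrative values.

def combine_cues(estimates):
    """estimates: list of (depth_in_meters, variance) pairs."""
    weights = [1.0 / var for _, var in estimates]
    total = sum(weights)
    return sum(w * depth for (depth, _), w in zip(estimates, weights)) / total

cues = [
    (4.0, 0.1),   # stereopsis: precise at close range -> low variance
    (5.0, 1.0),   # relative size: a rough guess -> high variance
    (4.5, 0.5),   # motion parallax: somewhere in between
]
print(f"combined depth estimate: {combine_cues(cues):.2f} m")
# The result (~4.15 m) sits close to the reliable stereoscopic estimate,
# nudged only slightly by the weaker cues.
```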
Why Forward-Facing Eyes Matter
Human eyes face forward, giving us a wide zone of binocular overlap. This arrangement costs us some peripheral vision compared to animals with eyes on the sides of their heads, like horses or rabbits. The tradeoff is significant: forward-facing eyes allow stereopsis, and research suggests they also grant a kind of “X-ray vision” in cluttered environments. When branches, leaves, or other obstacles partially block a view, having two eyes looking from slightly different angles lets the brain piece together a more complete picture of what lies behind the clutter. For primates navigating dense forests, this ability to see through visual noise was a major survival advantage.
Your Brain Fills In What’s Missing
Perhaps the most surprising part of vision is how much of it is constructed rather than recorded. Each of your eyes has a blind spot where the optic nerve exits the retina, leaving a gap with no photoreceptors at all. You never notice it because your brain fills in the missing information, using patterns from the surrounding visual field to generate a seamless image.
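A loose computational analogy for filling-in is interpolation: reconstruct the missing stretch from the intact signal on either side. The one-dimensional sketch below captures only the crudest version of the idea; real filling-in draws on far richer pattern statistics than a straight line.

```python
# Loose analogy for blind-spot filling-in: repair a gap in a signal by
# interpolating from the intact values on either side. This is only the
# simplest possible version of the idea.

def fill_gap(values):
    """Replace None entries with values interpolated from the gap's edges."""
    out = list(values)
    i = 0
    while i < len(out):
        if out[i] is None:
            j = i
            while j < len(out) and out[j] is None:
                j += 1
            left, right = out[i - 1], out[j]  # assumes an interior gap
            for k in range(i, j):
                t = (k - i + 1) / (j - i + 1)
                out[k] = left + t * (right - left)
            i = j
        i += 1
    return out

# Brightness samples across the retina with a "blind spot" in the middle:
print(fill_gap([0.9, 0.8, None, None, None, 0.2, 0.1]))
# -> [0.9, 0.8, 0.65, 0.5, 0.35, 0.2, 0.1]: a seamless gradient, no gap.
```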
This filling-in is one example of a broader strategy called predictive processing. Your visual system doesn’t passively wait for data to arrive. Instead, higher brain areas constantly generate predictions about what you should be seeing, based on learned statistical patterns from a lifetime of visual experience. These predictions flow downward to lower visual areas. Lower areas then send back only the “error signal,” the difference between what was predicted and what actually arrived. This back-and-forth cycle of prediction and correction repeats across multiple levels of the visual hierarchy, allowing your brain to process scenes rapidly and efficiently.
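The loop can be caricatured in a few lines of code. This is a single-level sketch with an arbitrary learning rate, nothing like real cortical circuitry, but it shows the core behavior: prediction errors shrink as the model locks onto a stable scene and spike when the scene changes.

```python
# Bare-bones predictive processing: a higher level keeps a prediction,
# the lower level reports only the error, and the prediction is nudged
# toward whatever actually arrived. One level, arbitrary learning rate.

def perceive(inputs, learning_rate=0.5):
    prediction = 0.0
    for actual in inputs:
        error = actual - prediction          # lower area: report the surprise
        print(f"saw {actual:3.1f}  predicted {prediction:4.2f}  error {error:5.2f}")
        prediction += learning_rate * error  # higher area: correct the model
    return prediction

# A steady scene with one sudden change: errors shrink as the model
# locks on, spike at the change, then shrink again.
perceive([1.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0])
```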
This is why you can recognize a partially hidden object, read words with missing letters, or navigate a familiar room in dim light. Your brain is not just processing incoming light. It is actively building a model of the world and updating it in real time, all within about 150 milliseconds of a new image hitting your retina. What you experience as “seeing” is the final product of that construction: not a photograph of reality, but your brain’s best guess at what’s out there.