How Voice Is Produced: From Lungs to Spoken Word

Your voice is produced by air from your lungs vibrating two small folds of tissue in your throat, with the resulting sound shaped into speech by your tongue, lips, and jaw. The whole process coordinates over 100 muscles in real time, making speech one of the most complex motor behaviors the human body performs.

The Three Systems Behind Your Voice

Voice production relies on three systems working together: a power source (your lungs and breathing muscles), a vibrator (your vocal folds), and a set of resonators and shapers (your throat, mouth, and nasal passages). Air pressure from below sets the vocal folds vibrating, those vibrations create a raw buzzing sound, and then everything above the vocal folds molds that buzz into recognizable vowels, consonants, and the unique tone that makes your voice yours.

How Breathing Powers the Voice

Before you make any sound, your diaphragm contracts to draw air into your lungs. As you exhale, the natural recoil of the lungs and chest wall, helped by your abdominal and rib muscles, pushes that air upward. This creates pressure beneath your vocal folds, called subglottal pressure, and it is the single biggest factor controlling how loud your voice is. The harder your breathing muscles push, the more air pressure builds below the vocal folds, and the louder the resulting sound. That same increase in pressure also raises your pitch slightly and introduces more breathiness or noise into the tone.

This is why singing teachers and speech therapists focus so heavily on breath support. Without steady, controlled air pressure, the vocal folds can’t vibrate consistently, and your voice sounds weak, shaky, or strained.

What the Vocal Folds Actually Are

Your vocal folds (sometimes called vocal cords) sit inside the larynx, the structure you can feel at the front of your throat. The larynx is built from several pieces of cartilage. The largest is the thyroid cartilage, a shield-shaped structure that forms the bump known as the Adam’s apple. Below it sits the cricoid cartilage, a ring that forms the base of the larynx. At the back, two small arytenoid cartilages anchor the rear ends of the vocal folds and pivot to open or close them.

The vocal folds themselves are made of muscle (the thyroarytenoid muscle) covered by a thin, flexible lining called mucosa. This layered structure is critical. The soft, pliable mucosa can ripple and wave independently of the stiffer muscle beneath it, and that rippling motion is what actually produces sound. If the folds were rigid, they would open and slam shut like a door. Instead, they undulate in a wave-like pattern that efficiently converts airflow into acoustic energy.

How Vibration Creates Sound

When you decide to speak, your brain sends signals through the vagus nerve (the tenth cranial nerve) to the muscles of the larynx. One branch of this nerve, the recurrent laryngeal nerve, controls nearly all the intrinsic muscles that move the vocal folds. These muscles pull the arytenoid cartilages together, bringing the vocal folds to the midline so they touch.

Once the folds are closed, air pressure from below builds until it is strong enough to push them apart. Air rushes through the gap, and as it does, a drop in pressure between the folds (related to the Bernoulli effect) combined with the natural elasticity of the tissue pulls them back together. Pressure builds again, pushes them apart again, and the cycle repeats.

This open-close-open-close cycle happens extraordinarily fast. In a typical conversation, a man’s vocal folds vibrate around 115 times per second, while a woman’s average about 200 times per second. Men’s folds can range from about 90 to 500 cycles per second, and women’s from about 150 to 1,000. Children and the highest sopranos can push past 1,000 cycles per second, with extreme vibrations approaching 2,000 per second.
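
To get a feel for how fast this is, it helps to turn the rates quoted above into the time a single open-close cycle takes. The short Python sketch below does only that arithmetic; the function name `period_ms` is an illustrative choice, not a standard term.

```python
def period_ms(frequency_hz: float) -> float:
    """Return the duration of one open-close cycle in milliseconds."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return 1000.0 / frequency_hz


# Vibration rates quoted in the text, in cycles per second.
rates = [
    ("typical male speech", 115),
    ("typical female speech", 200),
    ("upper extreme", 2000),
]

for label, hz in rates:
    print(f"{label}: {hz} cycles/s -> {period_ms(hz):.2f} ms per cycle")
```

At 200 cycles per second, each complete cycle lasts 5 milliseconds; at the 2,000-per-second extreme, it lasts half a millisecond.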

Each cycle of opening and closing releases a tiny puff of air. String hundreds of these puffs together every second and you get a buzzing sound wave. The speed of vibration determines the pitch: faster vibration means a higher pitch, slower vibration means a lower one. You control this primarily by tensing or relaxing the muscles of the larynx, which stretch or slacken the vocal folds and change their stiffness and effective length, much like tightening a guitar string raises its pitch.
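
The guitar-string comparison can be made concrete with the textbook formula for an ideal stretched string: frequency equals the square root of tension divided by mass per unit length, all divided by twice the string's length. Vocal folds are layered living tissue, not ideal strings, so the Python sketch below illustrates only the analogy. The numbers are made-up, guitar-like values chosen for illustration, not measurements of a larynx.

```python
import math


def string_frequency(length_m: float, tension_n: float,
                     mass_per_length_kg_m: float) -> float:
    """Fundamental frequency (Hz) of an ideal stretched string."""
    return math.sqrt(tension_n / mass_per_length_kg_m) / (2.0 * length_m)


# Hypothetical guitar-like values (assumptions for illustration only).
LENGTH_M = 0.65
MASS_PER_LENGTH = 0.001  # kg per meter
TENSION_N = 70.0

relaxed = string_frequency(LENGTH_M, TENSION_N, MASS_PER_LENGTH)
tightened = string_frequency(LENGTH_M, 4 * TENSION_N, MASS_PER_LENGTH)

print(f"baseline tension:  {relaxed:.1f} Hz")
print(f"four times tension: {tightened:.1f} Hz")
print(f"ratio: {tightened / relaxed:.2f}")
```

Because frequency depends on the square root of tension, quadrupling the tension doubles the pitch of the ideal string. The same qualitative idea applies to the voice: stiffer, more stretched folds vibrate faster.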

How Your Throat and Mouth Shape the Sound

The raw buzz produced by the vocal folds doesn’t sound like speech yet. It contains a wide spread of frequencies: a fundamental pitch plus a long series of overtones at multiples of that pitch, which makes it sound more like a harsh buzzer than a voice. Turning it into your recognizable voice is the job of the vocal tract: the open airway stretching from just above the vocal folds, through the throat (pharynx), into the mouth, and optionally through the nasal passages.

The vocal tract works as a filter. It has natural resonant frequencies determined by its shape and size, and sound energy near those frequencies gets amplified while energy at other frequencies gets dampened. The result is a set of broad peaks in the sound spectrum. These peaks are what distinguish one vowel from another. When you say “ee” versus “ah,” you are not changing what your vocal folds do. You are reshaping your vocal tract by moving your tongue, jaw, and lips, which shifts the resonant frequencies and changes which parts of the sound get boosted.
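
A rough way to see where such peaks come from is to treat the vocal tract as a uniform tube, closed at the vocal folds and open at the lips. That is a standard textbook simplification that roughly matches a neutral "uh" vowel, not any specific vowel discussed here. The Python sketch below assumes a 17.5 cm tract and a sound speed of 350 m/s; both figures are illustrative assumptions, and the function names are hypothetical. It also finds which overtone of a 115-cycle-per-second buzz falls closest to each resonance, since those are the parts of the sound that get boosted.

```python
SPEED_OF_SOUND_M_S = 350.0  # assumed value for warm, humid air


def tube_resonances(length_m: float, count: int = 3,
                    speed: float = SPEED_OF_SOUND_M_S) -> list[float]:
    """Resonances (Hz) of a uniform tube closed at one end, open at the other."""
    return [(2 * n - 1) * speed / (4.0 * length_m) for n in range(1, count + 1)]


def nearest_harmonic(f0_hz: float, target_hz: float) -> float:
    """Return the whole-number multiple of f0 closest to the target frequency."""
    k = max(1, round(target_hz / f0_hz))
    return k * f0_hz


F0_HZ = 115  # typical male speaking rate quoted earlier in the text
TRACT_LENGTH_M = 0.175  # assumed adult vocal tract length

for resonance in tube_resonances(TRACT_LENGTH_M):
    boosted = nearest_harmonic(F0_HZ, resonance)
    print(f"resonance near {resonance:.0f} Hz -> closest overtone {boosted} Hz")
```

Shortening the tube in this model raises every resonance, which is one reason smaller vocal tracts tend to produce brighter-sounding voices. Real vowels come from non-uniform shapes, so their peaks move away from these evenly spaced values.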

Children learn to control these resonances as they learn to speak, adjusting tongue height, jaw opening, lip rounding, and the position of the soft palate to hit specific frequency targets for each vowel sound. Nasal sounds like “m” and “n” are produced by lowering the soft palate to couple the nasal cavity to the oral cavity, adding extra resonances that give those sounds their distinctive quality.

Some parts of the vocal tract are harder to change. The hypopharyngeal cavity, the space just above the larynx, has a relatively fixed shape that varies from person to person. This fixed geometry is part of why every voice sounds different even when two people say the same word at the same pitch.

How Articulators Turn Sound Into Speech

Resonance gives you vowels and voice quality, but clear speech also requires consonants, and those come from precise movements of structures called articulators. Articulators fall into two categories: active ones that move (your tongue, lips, lower jaw, and soft palate) and passive ones that stay in place (your teeth, the hard palate, and the ridge behind your upper teeth).

Different consonants are made by different types of closure between these articulators:

  • Stops like “p,” “t,” and “k” involve a complete closure that briefly blocks all airflow, then releases it in a small burst.
  • Fricatives like “f,” “s,” and “sh” are made by narrowing the gap between two articulators just enough to make the airflow turbulent, creating a hissing or buzzing noise.
  • Approximants like “w” and “r” narrow the gap as well, but not enough to cause turbulence. The airflow passes through smoothly, producing a softer, vowel-like sound.
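
The three categories above boil down to two questions: is the airflow fully blocked, and if not, is it turbulent? The Python sketch below encodes that decision. The names `Manner` and `classify` are hypothetical, and the scheme covers only the three consonant types listed here, not every kind of consonant.

```python
from enum import Enum


class Manner(Enum):
    STOP = "stop"
    FRICATIVE = "fricative"
    APPROXIMANT = "approximant"


def classify(fully_closed: bool, turbulent: bool) -> Manner:
    """Pick a consonant type from the degree of closure described in the text."""
    if fully_closed:
        return Manner.STOP          # airflow blocked, then released in a burst
    if turbulent:
        return Manner.FRICATIVE     # narrow gap, noisy airflow
    return Manner.APPROXIMANT       # narrowed gap, smooth airflow


# Examples from the list above: (fully_closed, turbulent)
EXAMPLES = {
    "p": (True, False),
    "s": (False, True),
    "w": (False, False),
}

for sound, (closed, turbulent) in EXAMPLES.items():
    print(f'"{sound}" -> {classify(closed, turbulent).value}')
```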

Your tongue does most of the heavy lifting. It can touch the teeth, the ridge behind them, the hard palate, or curl back toward the soft palate, all within fractions of a second. The speed and precision required are remarkable: during normal conversation, your articulators can shift positions to produce 10 to 15 distinct sounds per second, each demanding a unique configuration.

What Controls It All

Coordinating over 100 muscles across the lungs, larynx, throat, and face requires significant brain power. The motor cortex sends commands through several cranial nerves. The vagus nerve handles the larynx, while other cranial nerves control the tongue, lips, jaw, and soft palate. A specialized region called the laryngeal motor cortex is essential for voluntary voice production, linking the intent to speak with the muscle coordination needed to make it happen.

This neural control is what separates human speech from simpler vocalizations. Many animals have vocal folds and can produce sound, but the fine motor control humans have over pitch, volume, and articulation is what makes language possible. It is also why damage to specific nerves, particularly the recurrent laryngeal nerve, can cause immediate and noticeable voice problems like hoarseness or a breathy tone, even when the vocal folds themselves are physically intact.