How to Find Sample Size With Standard Deviation

To find sample size when you know (or can estimate) the standard deviation, you use this formula: n = (Z × σ / E)². Here, n is the sample size you need, Z is a value tied to your chosen confidence level, σ is the population standard deviation, and E is your acceptable margin of error. The formula tells you how many observations you need to estimate a population mean with a specific level of precision.

Each piece of that formula plays a distinct role, and changing any one of them shifts the result significantly. Let’s break down exactly how to use it, step by step.

What Each Variable Means

Z (the critical value) reflects how confident you want to be that your result captures the true population value. Higher confidence demands a larger sample. The three most common confidence levels and their Z values are:

  • 90% confidence: Z = 1.645
  • 95% confidence: Z = 1.96
  • 99% confidence: Z = 2.575

Most research uses 95% confidence, so Z = 1.96 is the default unless you have a reason to choose otherwise.

σ (standard deviation) is your estimate of how spread out the data is in the population you’re studying. A larger spread means more variability, which means you need more observations to pin down the true average. This value must be in the same units as your margin of error.

E (margin of error) is how close you want your sample estimate to be to the real population value. If you’re measuring blood glucose in mg/dL, and you can tolerate being off by up to 5 mg/dL, then E = 5. A smaller margin of error produces a more precise estimate but requires a larger sample.

A Full Worked Example

Say you want to estimate the average random blood glucose of patients before surgery. From preliminary data, you know the standard deviation is about 32 mg/dL. You want your estimate to be within 5 mg/dL of the true mean, and you want 95% confidence.

Plug into the formula:

n = (Z × σ / E)²
n = (1.96 × 32 / 5)²
n = (62.72 / 5)²
n = (12.544)²
n ≈ 157.35

Always round up, because you can’t collect a fraction of a data point. You need at least 158 subjects.
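The steps above can be sketched as a short Python function (the function name is my own, not from any standard library):

```python
import math

def sample_size_mean(z, sigma, e):
    """Sample size to estimate a mean: n = (z * sigma / e)^2, rounded up."""
    return math.ceil((z * sigma / e) ** 2)

# Blood glucose example: 95% confidence, sigma = 32 mg/dL, margin of error = 5 mg/dL
print(sample_size_mean(1.96, 32, 5))  # prints 158
```

Using math.ceil bakes the round-up rule into the function, so a result like 157.35 can never be silently truncated to 157.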

Notice what happens if you tighten the margin of error to 3 mg/dL instead of 5: n = (1.96 × 32 / 3)² ≈ 437.1, which rounds up to 438. Cutting your acceptable error nearly in half almost tripled the required sample size. Precision is expensive.

How to Estimate Standard Deviation

The formula requires you to input a standard deviation before you collect your data, which creates an obvious chicken-and-egg problem. There are a few practical ways to get a reasonable estimate.

Use a pilot study. Collect a small preliminary sample (even 20 to 30 observations) and calculate the standard deviation from that data. This is the most reliable approach when no prior information exists.

Pull from previous research. If someone has studied a similar population measuring the same variable, use their reported standard deviation. Published studies almost always report means and standard deviations for their outcomes.

Use the range rule of thumb. If you can reasonably guess the highest and lowest values you’d expect in your population, divide that range by 4. For example, if you expect adult resting heart rates to fall between 50 and 100 beats per minute, a rough standard deviation estimate would be (100 – 50) / 4 = 12.5. This is a crude estimate, but it gives you a starting point when nothing else is available.
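The range rule of thumb is one line of Python (the function name is mine):

```python
def sd_from_range(low, high):
    """Range rule of thumb: rough standard deviation estimate = (max - min) / 4."""
    return (high - low) / 4

# Adult resting heart rate expected between 50 and 100 beats per minute
print(sd_from_range(50, 100))  # prints 12.5
```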

When in doubt, use a slightly larger standard deviation estimate. Overestimating variability gives you a larger (more conservative) sample size, which is better than underestimating and ending up with too little data.

How Confidence Level Changes the Result

Using the same blood glucose example (σ = 32, E = 5), here’s how the required sample size shifts across confidence levels:

  • 90% confidence (Z = 1.645): n = (1.645 × 32 / 5)² ≈ 110.8, round up to 111
  • 95% confidence (Z = 1.96): n = (1.96 × 32 / 5)² ≈ 157.4, round up to 158
  • 99% confidence (Z = 2.575): n = (2.575 × 32 / 5)² ≈ 271.6, round up to 272

Going from 95% to 99% confidence adds over 100 participants. That tradeoff matters when data collection is time-consuming or costly.
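A quick Python loop over the Z values listed earlier reproduces the comparison:

```python
import math

sigma, e = 32, 5  # both in mg/dL
z_values = {"90%": 1.645, "95%": 1.96, "99%": 2.575}

for level, z in z_values.items():
    n = math.ceil((z * sigma / e) ** 2)
    print(f"{level} confidence: n = {n}")
# prints n = 111, 158, and 272
```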

Sample Size for Comparing Two Groups

The formula above is for estimating a single mean. If you’re comparing the means of two groups (a treatment group versus a control group, for instance), the calculation changes. The standard formula for a two-group comparison is:

n = (Zα/2 + Z1−β)² × 2σ² / d²

Here, d is the difference in means you expect (or hope) to detect between the two groups, σ is the pooled standard deviation, Zα/2 comes from your confidence level (1.96 for 95%), and Z1−β comes from your desired statistical power.

Power is the probability that your study will detect a real difference if one exists. The standard target is 80% power, which corresponds to Z1−β = 0.84. If you want 90% power, that value increases to 1.28, and you’ll need more participants.

This formula gives you the number of subjects per group, so multiply by two for the total sample size in a two-group study.
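Here is a minimal sketch of the two-group formula in Python, with 95% confidence and 80% power as defaults (the function name and example numbers are my own):

```python
import math

def n_per_group(sigma, d, z_alpha=1.96, z_beta=0.84):
    """Subjects per group to detect a mean difference d:
    n = (z_alpha + z_beta)^2 * 2 * sigma^2 / d^2, rounded up."""
    return math.ceil((z_alpha + z_beta) ** 2 * 2 * sigma ** 2 / d ** 2)

# Hypothetical example: pooled SD of 10, hoping to detect a difference of 5
per_group = n_per_group(sigma=10, d=5)
print(per_group, 2 * per_group)  # prints 63 126 (per group, then total)
```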

Why Effect Size Matters

In the two-group formula, the detectable difference (d) and the standard deviation combine to form what’s called the effect size. A commonly used version, Cohen’s d, is simply the difference between the two group means divided by the pooled standard deviation.

A Cohen’s d of 0.2 is considered a small effect, 0.5 is medium, and 0.8 is large. The practical consequence: small effects need far more participants to detect. If your expected effect size drops from large to small while power stays at 80%, the required sample size can increase by a factor of 15 or more. This is why studies looking for subtle differences (like a new drug that’s only slightly better than an existing one) need hundreds or thousands of participants.

For example, in a study comparing total cholesterol between two groups with means of 6.5 and 5.2 mmol/L and a pooled standard deviation of 0.67, the effect size is (6.5 − 5.2) / 0.67 = 1.94, a very large effect. Detecting that difference requires relatively few participants. But if the expected difference between groups were only 0.13 mmol/L instead of 1.3, the effect size would shrink to 0.19, and the sample size requirement would explode.
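Substituting d = effect size × σ into the two-group formula gives a standardized version that depends only on the effect size. A sketch using the cholesterol numbers above (function names are my own; 95% confidence and 80% power assumed):

```python
import math

def cohens_d(mean1, mean2, pooled_sd):
    """Effect size: absolute difference in means divided by the pooled SD."""
    return abs(mean1 - mean2) / pooled_sd

def n_per_group_standardized(effect, z_alpha=1.96, z_beta=0.84):
    """Subjects per group: n = 2 * (z_alpha + z_beta)^2 / effect^2, rounded up."""
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect ** 2)

large = cohens_d(6.5, 5.2, 0.67)    # ~1.94, a very large effect
small = cohens_d(6.5, 6.37, 0.67)   # ~0.19, a small effect
print(n_per_group_standardized(large), n_per_group_standardized(small))  # prints 5 417
```

Shrinking the detectable difference by a factor of 10 inflates the per-group requirement from a handful of subjects to hundreds, which is the "explosion" described above.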

Common Mistakes to Avoid

The most frequent error is using mismatched units. Your standard deviation and margin of error must be in the same units. If the standard deviation is in milligrams per deciliter, the margin of error must be too. Mixing units produces nonsensical results.

Another common mistake is confusing the formula for estimating a mean with the formula for estimating a proportion. Proportion-based sample size calculations (for surveys about percentages, like “what fraction of people prefer Brand A”) don’t use standard deviation at all. They use an estimated proportion instead. If your outcome is a continuous measurement like weight, blood pressure, or test scores, you need the standard deviation formula. If your outcome is a yes/no or percentage, you need the proportion formula.

Finally, people sometimes forget to round up. A calculated sample size of 157.35 means you need 158 subjects, not 157. Rounding down gives you slightly less precision than you specified.