Sample size depends on four core parameters: your acceptable error rate, the statistical power you need, the size of the effect you’re trying to detect, and how much variability exists in your population. Getting these right before collecting data is what separates a study that can actually answer its research question from one that wastes time and resources. The process differs for quantitative studies, qualitative studies, and surveys, but the underlying logic is the same: recruit enough participants to confidently detect a real result, and not so many that you burn through your budget proving something trivial.
The Four Parameters Behind Every Calculation
Every quantitative sample size calculation revolves around the same set of inputs, whether you’re running a clinical trial or a classroom experiment.
Significance level (alpha) is the probability of concluding there’s a difference when none actually exists. Most research sets this at 0.05, meaning you accept a 5% chance of a false positive. This is your Type I error rate. Stricter fields or studies with serious consequences sometimes use 0.01.
Statistical power is the probability of detecting a real effect when one exists. The standard minimum is 80%, which means you accept a 20% chance of missing a true difference (a Type II error, also called beta). Some well-funded trials aim for 90% power, which requires larger samples.
Effect size is the smallest difference between groups that you consider meaningful. Jacob Cohen’s widely used benchmarks classify effect sizes as small (0.2), medium (0.5), and large (0.8 or above). As Cohen put it, a medium effect “is visible to the naked eye of a careful observer,” while a small effect is noticeably smaller but not trivial. The smaller the effect you want to detect, the more participants you need. This is often the hardest parameter to set because it requires you to decide in advance what difference would actually matter in your field.
Population variability is how spread out the measurements in your population are. Higher variability means you need more participants to distinguish a real signal from random noise. If you’re studying something with a lot of natural variation, like blood pressure across age groups, your sample needs to be larger than if you’re measuring something more uniform.
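Effect size and variability come together in Cohen's d, the most common standardized effect size for comparing two group means: the raw mean difference divided by the pooled standard deviation. A minimal pure-Python sketch (the group data here is invented for illustration):

```python
import math

def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a = sum(group_a) / n_a
    mean_b = sum(group_b) / n_b
    # Sample variances (n - 1 denominator)
    var_a = sum((x - mean_a) ** 2 for x in group_a) / (n_a - 1)
    var_b = sum((x - mean_b) ** 2 for x in group_b) / (n_b - 1)
    # Pooled SD weights each group's variance by its degrees of freedom
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (mean_a - mean_b) / pooled_sd

# Hypothetical pilot data: the same mean difference yields a larger d
# when within-group variability is small, and a smaller d when it is large.
treatment = [52, 48, 55, 60, 45, 50, 58, 47]
control = [45, 43, 50, 52, 40, 44, 49, 41]
d = cohens_d(treatment, control)
print(round(d, 2))
```

Note how variability enters through the denominator: noisier measurements shrink d, which in turn drives up the sample size any power calculation will demand.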
Sample Size for Surveys
Survey research uses a slightly different framework centered on confidence level, margin of error, and an estimate of the population proportion. The Cochran formula is the standard starting point for large populations. It calculates the minimum sample size using a Z-score (tied to your confidence level), your estimated population proportion, and your desired margin of error. If you don’t know the population proportion, a common default is 0.5, which produces the most conservative (largest) sample size estimate.
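The Cochran formula described above can be sketched in a few lines of Python. The Z-scores are the standard two-sided critical values for each confidence level; the defaults mirror the conservative choices in the text:

```python
import math

# Two-sided Z-scores for common confidence levels
Z_SCORES = {0.90: 1.645, 0.95: 1.96, 0.99: 2.576}

def cochran_sample_size(confidence=0.95, proportion=0.5, margin_of_error=0.05):
    """Cochran's minimum sample size for a large population:
    n0 = z^2 * p * (1 - p) / e^2, rounded up."""
    z = Z_SCORES[confidence]
    n0 = (z ** 2) * proportion * (1 - proportion) / margin_of_error ** 2
    return math.ceil(n0)

print(cochran_sample_size())                      # 95% confidence, 5% margin -> 385
print(cochran_sample_size(margin_of_error=0.03))  # tightening to 3% -> 1068
```

Running it with the defaults shows why p = 0.5 is the conservative choice: the product p(1 - p) peaks at 0.25, so any other proportion estimate would only shrink the required sample.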
For smaller, known populations, the Yamane formula offers a simpler alternative: divide the population size (N) by 1 plus N multiplied by the square of your desired precision level. This assumes a 95% confidence level and a population proportion of 0.5. So for a population of 1,000 with a 5% margin of error, you’d need roughly 286 respondents.
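The Yamane formula translates directly into code; the worked example above (N = 1,000 at a 5% precision level) serves as a check:

```python
import math

def yamane_sample_size(population_size, precision=0.05):
    """Yamane's simplified formula: n = N / (1 + N * e^2).

    Assumes a 95% confidence level and a population proportion of 0.5.
    """
    n = population_size / (1 + population_size * precision ** 2)
    return math.ceil(n)

print(yamane_sample_size(1000))  # -> 286, matching the worked example
```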
The relationship between these inputs is intuitive once you see it in action. Tightening your margin of error from 5% to 3% dramatically increases the required sample. Moving from 95% to 99% confidence does the same. This is why large national polls typically survey 1,000 to 1,500 people for a 3% margin of error at 95% confidence, while a quick internal survey at a company might get by with far fewer.
Sample Size for Experimental and Clinical Studies
In experimental research, the calculation starts with the four core parameters described above. You choose your alpha (usually 0.05), your power (usually 80%), the minimum effect size you care about, and an estimate of variability from prior research or a pilot study. These inputs feed into formulas specific to your planned statistical test, whether that’s a t-test comparing two groups, an ANOVA comparing several groups, or a correlation analysis.
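As one concrete sketch of how these four inputs feed a test-specific formula, the statsmodels library in Python can solve for the per-group sample size of an independent-samples t-test (the parameter values here are the conventional defaults discussed above, not recommendations for any particular study):

```python
# A priori sample size for an independent-samples t-test,
# using Python's statsmodels library.
import math

from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
# Solve for the per-group sample size, given the other three parameters
n_per_group = analysis.solve_power(
    effect_size=0.5,  # minimum difference worth detecting (Cohen's d)
    alpha=0.05,       # Type I error rate
    power=0.80,       # 1 - Type II error rate (beta)
    alternative="two-sided",
)
print(math.ceil(n_per_group))  # -> 64 participants per group
```

Halving the effect size to 0.25 roughly quadruples this number, which is the practical cost of chasing small effects.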
Clinical trials follow the same logic but with some field-specific norms. Phase II trials, which test whether a treatment shows enough promise to justify larger testing, typically enroll 25 to 40 patients regardless of the specific design. Phase III trials, which aim to confirm effectiveness, vary enormously, from under a hundred to thousands of participants, depending on the endpoint being measured and the expected effect size. Sequential trial designs can reduce the required sample by allowing researchers to stop early if results become clear before all participants have been enrolled.
Sample Size in Qualitative Research
Qualitative research doesn’t use formulas. Instead, sample size is guided by the concept of data saturation: the point at which new interviews, observations, or focus groups stop producing new information. Saturation has become the gold standard for determining purposive sample sizes in health science research and related fields.
There are several ways to think about saturation. Theoretical saturation, rooted in grounded theory, means you stop sampling when your theoretical categories are fully developed. Inductive thematic saturation means no new codes or themes are emerging from analysis. Data saturation, the simplest version, means new participants are repeating what earlier ones already said. In practice, researchers often recognize saturation when they can talk about the data in generalized terms and readily supply examples for each theme without hesitation.
Because you can’t know in advance exactly when saturation will occur, qualitative researchers typically plan a range. For most interview-based studies, somewhere between 15 and 30 participants is common, though highly focused studies on a homogeneous group may reach saturation sooner, and complex multi-site studies may require more.
Pilot Studies and Rules of Thumb
Pilot studies serve a different purpose than full-scale research. They test whether your study design, recruitment strategy, and measurement tools actually work. The sample size for a pilot doesn’t need to provide statistical power for hypothesis testing. It needs to be large enough to reveal practical problems. Recommendations vary: some guidelines suggest at least 30 participants per group, while others consider 12 per group sufficient. The right number depends on the complexity of your protocol and how much uncertainty you have about feasibility.
Adjusting for Dropout and Non-Response
Your calculated sample size assumes everyone you recruit completes the study. In reality, people drop out, skip survey questions, or become unreachable. If you don’t account for this, you’ll end up underpowered.
The standard fix is straightforward: inflate your calculated sample by dividing it by (1 minus the expected attrition rate). If your formula says you need 200 participants and you expect 20% dropout, recruit 200 divided by 0.80, which is 250. The higher the expected attrition, the more you need to over-recruit. Longitudinal studies, which follow people over months or years, typically face higher attrition than a single-session experiment and need proportionally larger starting samples.
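This adjustment is a one-liner; rounding up guards against fractional participants:

```python
import math

def inflate_for_attrition(required_n, expected_attrition):
    """Over-recruit so the completing sample still meets the power target."""
    if not 0 <= expected_attrition < 1:
        raise ValueError("expected_attrition must be in [0, 1)")
    return math.ceil(required_n / (1 - expected_attrition))

print(inflate_for_attrition(200, 0.20))  # -> 250, matching the worked example
```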
Software Tools for Calculation
You don’t need to run these formulas by hand. G*Power is the most widely used free tool for sample size and power calculations in academic research. It supports t-tests, F-tests, chi-square tests, Z-tests, and exact tests, covering the vast majority of common research designs. You can use it to calculate the sample size needed for a planned study, determine the power of a study you’ve already run, or figure out the minimum effect size your sample could detect.
G*Power works well for standard designs. For more complex scenarios like multilevel models, cluster randomized trials, or survival analysis, researchers typically turn to specialized software or statistical programming in R or Stata. Many universities also offer online sample size calculators tailored to specific study types, which can be useful for quick estimates before committing to a full power analysis.
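For readers working in Python rather than G*Power, statsmodels also covers the other two modes mentioned above: the achieved power of a completed study and the minimum detectable effect for a fixed sample. This sketch leaves one parameter unset so `solve_power` solves for it (the n = 30 per group here is an arbitrary illustration):

```python
# Post-hoc power and minimum detectable effect for an
# independent-samples t-test, using statsmodels.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Achieved power: 30 participants per group, expecting a medium effect (d = 0.5)
achieved = analysis.solve_power(effect_size=0.5, nobs1=30, alpha=0.05)
print(f"power with n=30 per group: {achieved:.2f}")

# Minimum detectable effect: what d could 30 per group detect at 80% power?
mde = analysis.solve_power(nobs1=30, alpha=0.05, power=0.80)
print(f"minimum detectable d: {mde:.2f}")
```

The first result falls well short of 80%, illustrating the section's broader point: a sample that feels respectable can still be badly underpowered for a medium effect.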
Common Mistakes That Lead to Wrong Estimates
The most frequent error is choosing an effect size based on what would give a convenient sample size rather than what represents a genuinely meaningful difference. If you set your expected effect size too large because you want a smaller sample, you risk running an underpowered study that can’t detect the real (likely smaller) effect.
Another common problem is ignoring the variability in your population. Borrowing a standard deviation from a published study that used a different demographic or measurement tool can throw off your calculation significantly. Using data from your own pilot study or from a closely matched prior study gives a much more reliable estimate.
Finally, researchers sometimes calculate sample size for a simple comparison but then plan a more complex analysis, like regression with multiple predictors. More complex analyses generally require larger samples. Your power calculation should match the actual statistical test you intend to run, not a simpler version of it.