Fisher’s exact test tells you whether two categorical variables are associated by calculating the exact probability of seeing your data (or something more extreme) if there were truly no relationship between them. That probability is the p-value. If it’s small, typically below 0.05, the association in your data is unlikely to be a coincidence. If it’s large, you don’t have enough evidence to claim the two variables are related.
The test is built for 2×2 contingency tables, the kind where you’ve sorted subjects into four cells based on two yes-or-no factors. Unlike the chi-square test, which estimates probabilities using an approximation, Fisher’s test computes them directly. That makes it the go-to choice when your sample is small or your cell counts are low.
When to Use Fisher’s Test Instead of Chi-Square
The chi-square test works well with larger samples, but its approximation breaks down when cell counts get too small. The standard rule: if more than 20% of your expected cell frequencies are below 5, or any single expected frequency is below 1, switch to Fisher’s exact test. “Expected” here doesn’t mean the numbers you observed. It means the counts you’d predict in each cell if the two variables were completely independent of each other.
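The expected-count rule is easy to check by hand or in code: under independence, each expected cell count is (row total × column total) / grand total. Here is a minimal sketch in Python; the counts are invented for illustration.

```python
# Hypothetical 2x2 table of observed counts (illustrative numbers only)
table = [[8, 2],
         [3, 7]]

row_totals = [sum(row) for row in table]          # totals across each row
col_totals = [sum(col) for col in zip(*table)]    # totals down each column
n = sum(row_totals)                               # grand total

# Expected count in each cell if the two variables were independent
expected = [[r * c / n for c in col_totals] for r in row_totals]
print(expected)

# Rule of thumb: if expected counts dip below 5, prefer Fisher's exact test
small = sum(1 for row in expected for e in row if e < 5)
print(f"{small} of 4 expected counts are below 5")
```

Note that the rule is applied to these expected counts, not to the observed counts in the table.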
In practice, Fisher’s test comes up often in medical research involving rare outcomes, pilot studies with small groups, or any situation where you simply don’t have enough data for the chi-square approximation to be reliable. With modern software, many researchers run Fisher’s test even with larger samples since there’s no penalty for using the exact calculation when computing power isn’t a constraint.
Setting Up Your 2×2 Table
Before you can interpret results, your data needs to be structured correctly. Each subject goes into exactly one of four cells based on two binary factors. For example, if you’re testing whether a treatment is associated with recovery, your rows might be “treatment” and “control,” and your columns might be “recovered” and “not recovered.” Every cell contains a raw count of subjects: whole numbers only, not percentages, proportions, or averages.
The test treats the row totals and column totals (called marginal totals) as fixed. In other words, it asks: given these margins, how likely is this particular arrangement of counts across the four cells? That assumption is what makes the exact probability calculation possible.
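The fixed-margins assumption is what makes the direct calculation tractable: once the margins are fixed, the count in a single cell determines the entire 2×2 table, and under independence that count follows a hypergeometric distribution. A small sketch, using a hypothetical treatment/recovery table (counts invented for illustration):

```python
from math import comb

# Hypothetical table with margins fixed:
#            recovered  not recovered | row total
# treatment     a=9          1        |    10
# control        2           8        |    10
# col total     11           9        |    20
N, K, n, a = 20, 11, 10, 9   # grand total, column total, row total, one cell

# Hypergeometric probability of this exact arrangement given the margins
p_table = comb(K, a) * comb(N - K, n - a) / comb(N, n)
print(f"P(this exact table | fixed margins) = {p_table:.5f}")
```

Fisher's p-value is built from probabilities like this one: it sums them over the observed table and every arrangement at least as extreme.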
Reading the P-Value
The p-value from Fisher’s exact test is the probability of observing an association as strong as (or stronger than) the one in your data, assuming the null hypothesis is true. The null hypothesis here is simple: the two variables are independent, meaning there’s no real association between them.
A p-value close to 0 means your observed pattern would be very unusual if the variables were truly unrelated. A p-value close to 1 means the data is entirely unsurprising under that assumption. The conventional cutoff is 0.05, but that threshold is a convention, not a law of nature. A p-value of 0.04 is not fundamentally different from 0.06. What matters more is that smaller p-values represent stronger evidence against the null hypothesis: p = 0.02 is stronger evidence than p = 0.04, and p = 0.001 is stronger still.
One critical point: a non-significant p-value doesn’t prove the two variables are independent. It only means your data didn’t provide enough evidence to rule independence out. With a very small sample, even a real association might not produce a significant result because the test simply lacks the statistical power to detect it.
Two-Tailed vs. One-Tailed Results
Most software will give you both a one-tailed and a two-tailed p-value. You should almost always use the two-tailed version. A two-tailed test checks for an association in either direction. For instance, it asks whether a treatment could be better or worse than a control, not just one of those.
A one-tailed test is only appropriate when you decided before collecting data that you’d only care about an effect in one specific direction, and you’d genuinely treat an effect in the opposite direction the same as no effect at all. This is rare. Choosing a one-tailed test simply because your two-tailed result wasn’t significant is not valid, no matter how close the p-value was to your threshold. The one-tailed version produces a smaller p-value (roughly half the two-tailed value), so switching to it after seeing your results inflates your false positive rate.
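In scipy, the choice between the two is the `alternative` parameter of `scipy.stats.fisher_exact`. The sketch below uses a hypothetical table just to show the mechanics; note how much smaller the one-tailed p-value is.

```python
from scipy.stats import fisher_exact

# Hypothetical 2x2 table: rows = treatment/control, cols = improved/not
table = [[12, 4],
         [5, 11]]

# Two-sided: tests for an association in either direction (the usual choice)
_, p_two = fisher_exact(table, alternative="two-sided")

# One-sided: only defensible if the direction was fixed before data collection
_, p_greater = fisher_exact(table, alternative="greater")

print(f"two-sided p = {p_two:.4f}, one-sided p = {p_greater:.4f}")
```

Because the one-sided value is always at most the two-sided one, picking it after seeing the data amounts to lowering the bar after the fact.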
What the Odds Ratio Tells You
The p-value tells you whether an association exists, but it doesn’t tell you how strong or practically meaningful that association is. That’s where the odds ratio comes in. Most software reports it alongside the p-value when running Fisher’s test.
An odds ratio of 1 means the odds of the outcome are the same in both groups, meaning no association. An odds ratio above 1 means the odds of the outcome are higher in the first group than in the second; below 1 means they are lower. The further the number is from 1, the stronger the association. For example, in a study comparing a treatment to a control, an odds ratio of 19 would mean the odds of the outcome in the treatment group are roughly 19 times the odds in the control group.
The 95% confidence interval around the odds ratio is just as important as the number itself. If that interval includes 1, you can’t confidently say the groups differ, which aligns with a non-significant p-value. If the interval spans from, say, 3.8 to 136, you can be fairly sure there’s a real effect, but the wide range tells you your estimate is imprecise, usually because of a small sample. The confidence interval is typically asymmetric around the odds ratio, which is normal for this kind of calculation.
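One common way software approximates that interval is the Woolf (log-odds) method: take the log of the sample odds ratio, add and subtract 1.96 standard errors, and exponentiate back. The sketch below uses invented counts; be aware that exact or conditional methods used by some packages will give somewhat different (often wider) bounds for small samples.

```python
from math import exp, log, sqrt

# Hypothetical 2x2 counts, laid out as [[a, b], [c, d]] (illustrative only)
a, b, c, d = 15, 5, 7, 13

# Sample odds ratio: (a*d) / (b*c)
odds_ratio = (a * d) / (b * c)

# Woolf approximation: standard error of log(OR) from the four cell counts
se_log_or = sqrt(1/a + 1/b + 1/c + 1/d)
lo = exp(log(odds_ratio) - 1.96 * se_log_or)
hi = exp(log(odds_ratio) + 1.96 * se_log_or)

print(f"OR = {odds_ratio:.2f}, approximate 95% CI ({lo:.2f}, {hi:.2f})")
```

Because the interval is built on the log scale and exponentiated, it comes out asymmetric around the odds ratio, which is the behavior described above.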
A Worked Example
Suppose you’re testing whether a new supplement is associated with symptom improvement. You give it to 20 people and a placebo to 28 others. In the supplement group, 17 improve and 3 don’t. In the placebo group, 6 improve and 22 don’t. Your 2×2 table looks like this:
- Supplement, improved: 17
- Supplement, not improved: 3
- Placebo, improved: 6
- Placebo, not improved: 22
Running Fisher’s exact test on this table yields a two-tailed p-value of about 0.00002. That’s far below 0.05, so you’d conclude there is a statistically significant association between taking the supplement and improving. The odds ratio comes out to roughly 19, meaning the odds of improvement in the supplement group were about 19 times those in the placebo group. The 95% confidence interval runs from about 3.8 to 136: strong evidence of a real association, but the wide range shows the exact magnitude is uncertain given the sample size.
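The worked example takes one call in scipy. One caveat: `scipy.stats.fisher_exact` reports the sample odds ratio (a·d)/(b·c), which for this table is about 20.8; some software (R’s `fisher.test`, for instance) instead reports the conditional maximum-likelihood estimate, which is closer to the 19 quoted above.

```python
from scipy.stats import fisher_exact

# The worked example's table: rows = supplement/placebo, cols = improved/not
table = [[17, 3],
         [6, 22]]

odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
print(f"two-tailed p = {p_value:.6f}")
print(f"sample odds ratio = {odds_ratio:.1f}")   # (17*22)/(3*6) ~= 20.8
```

Either odds-ratio estimate supports the same reading: improvement was far more likely in the supplement group, with the usual small-sample imprecision.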
Common Mistakes in Interpretation
The most frequent error is treating p = 0.05 as a bright line where everything below it is “true” and everything above it is “false.” Statistical significance is a continuum. Report the actual p-value and let readers assess the strength of evidence for themselves.
Another common mistake is confusing statistical significance with practical importance. A tiny p-value means the association is unlikely to be due to chance. It doesn’t mean the effect is large enough to matter in the real world. Always look at the odds ratio and its confidence interval to gauge the size and precision of the effect.
Finally, Fisher’s exact test only detects association, not causation. Finding that two categorical variables are related in a contingency table doesn’t mean one causes the other. That conclusion requires a controlled experimental design, not just a statistical test.