What is PICRUSt2 and How Does It Predict Microbial Function?
Learn how PICRUSt2 uses evolutionary relationships to infer a microbial community's functional potential from common marker gene sequencing data.
PICRUSt2, or Phylogenetic Investigation of Communities by Reconstruction of Unobserved States 2, is a computational bioinformatics tool. Its primary function is to predict the functional capabilities of a microbial community using marker gene data, most commonly from 16S rRNA gene sequencing. This software provides insight into the genes and metabolic pathways that are likely present in a given environment, helping form a picture of what microorganisms in a sample might be doing.
Microbial analysis through 16S rRNA gene sequencing can identify which organisms are present in a sample, a process known as taxonomic profiling. While this tells us “who is there,” it does not explain what these microbes are capable of doing. Functional profiling aims to uncover the collective genetic capabilities of the community, revealing the metabolic pathways and other functions it possesses.
Understanding these potential functions is valuable for forming hypotheses about the roles microbes play across diverse environments. For instance, it can suggest how microbes in the human gut contribute to health or disease, or how soil microbes participate in nutrient cycling.
Directly assessing function through methods like shotgun metagenomics, which sequences all DNA in a sample, can be expensive and computationally demanding. Amplicon sequencing of the 16S rRNA gene is a more cost-effective alternative. PICRUSt2 serves as a bridge, enabling researchers to derive functional predictions from this more accessible 16S data.
The core process of PICRUSt2 involves connecting a sample’s 16S rRNA gene sequences to a database of reference genomes with known functions. The primary inputs are the Amplicon Sequence Variants (ASVs), which are the unique marker gene sequences identified in the samples, and a table of their abundances in each sample. These ASVs represent the different types of microbes present.
The prediction process begins with phylogenetic placement, where each ASV from the sample is inserted into a large reference phylogenetic tree. This tree illustrates the evolutionary relationships between the ASVs and organisms with fully sequenced genomes. Once placed, the software uses ancestral state reconstruction to infer the likely gene content of the sample’s ASVs based on the known genes of their closest relatives.
From this, PICRUSt2 predicts the abundance of various gene families and metabolic pathways for the entire community. A key metric generated is the Nearest Sequenced Taxon Index (NSTI). The NSTI score for each ASV indicates how closely it is related to a genome in the reference database, serving as a confidence measure for the functional predictions. A lower NSTI score suggests a closer match and a more reliable prediction.
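The community-level step described above can be sketched in a few lines of Python. This is a simplified illustration, not PICRUSt2's own code: the ASV names, gene family labels, copy numbers, and NSTI values are all hypothetical, and the cutoff of 2.0 is an assumption made for the example. The sketch divides each ASV's read count by its predicted 16S copy number to approximate organism abundance, multiplies by the predicted gene copies per genome, and skips ASVs whose NSTI exceeds the cutoff.

```python
from collections import defaultdict

# Hypothetical toy inputs; none of these values come from a real PICRUSt2 run.
asv_counts = {  # ASV read counts per sample
    "sample_A": {"ASV1": 120, "ASV2": 30, "ASV3": 50},
    "sample_B": {"ASV1": 10, "ASV2": 200, "ASV3": 0},
}
predicted_16s_copies = {"ASV1": 4, "ASV2": 1, "ASV3": 2}  # 16S copies per genome
predicted_gene_copies = {  # gene family copies per genome (placeholder labels)
    "ASV1": {"GF_A": 1, "GF_B": 0},
    "ASV2": {"GF_A": 2, "GF_B": 1},
    "ASV3": {"GF_A": 0, "GF_B": 3},
}
nsti_by_asv = {"ASV1": 0.02, "ASV2": 0.15, "ASV3": 2.4}

MAX_NSTI = 2.0  # assumed cutoff for this example


def predict_gene_family_abundance(counts, copies_16s, gene_copies, nsti, max_nsti):
    """Sum predicted gene family abundances over the ASVs in each sample."""
    result = {}
    for sample, asvs in counts.items():
        totals = defaultdict(float)
        for asv, reads in asvs.items():
            if nsti[asv] > max_nsti:
                continue  # too distant from any reference genome to trust
            organisms = reads / copies_16s[asv]  # correct for 16S copy number
            for family, copies in gene_copies[asv].items():
                totals[family] += organisms * copies
        result[sample] = dict(totals)
    return result


abundance = predict_gene_family_abundance(
    asv_counts, predicted_16s_copies, predicted_gene_copies, nsti_by_asv, MAX_NSTI
)
for sample, families in abundance.items():
    print(sample, families)
```

In this toy data ASV3 is dropped in both samples because its NSTI is above the cutoff, so its predicted genes contribute nothing to the community totals.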
As an updated version of the original PICRUSt, the tool has several improvements. It utilizes a reference database that is more than ten times larger, incorporating a greater diversity of microbial genomes. This expansion helps improve the accuracy of predictions across a wider range of environments.
In human microbiome research, PICRUSt2 is used to explore the functional shifts in gut microbial communities associated with health and disease. For example, studies on inflammatory bowel disease (IBD) or obesity may use it to predict changes in metabolic pathways. These predictions help generate hypotheses about how microbial functions contribute to disease or respond to interventions.
Environmental microbiology also benefits from this predictive power. Researchers studying soil ecosystems can use the tool to infer the roles of microbial communities in nutrient cycling, such as nitrogen fixation or carbon metabolism. In aquatic environments, it can help identify the potential for microbes to degrade pollutants.
The tool is also applied in animal microbiome studies to understand host-microbe interactions. In agriculture, it can help investigate how the gut microbes of livestock influence their growth and health. In wildlife studies, it can offer clues into how animal microbiomes adapt to different diets and environments.
By generating functional hypotheses, PICRUSt2 guides further research. The predictions can point scientists toward specific pathways or genes to investigate with more targeted experimental methods, helping to prioritize research efforts when sequencing the entire metagenome is not feasible.
PICRUSt2 predicts functional potential, not expressed function. It infers which genes are likely present within a community, suggesting what microbes are capable of, but not what they are actively doing. Gene expression is highly variable and influenced by environmental factors not captured by DNA sequencing alone.
The accuracy of the predictions depends heavily on the reference genome database. If microbes in a sample are poorly represented in the database, the predictions may be less accurate. This can introduce bias when studying novel environments. The NSTI scores help assess this, as higher scores indicate greater phylogenetic distance from reference genomes and thus less certain predictions.
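One common way to summarize this per sample is an abundance-weighted NSTI, where each ASV's NSTI is weighted by its share of the reads. The sketch below is a minimal illustration with made-up counts and NSTI values; the function name and sample names are hypothetical.

```python
def weighted_nsti(sample_counts, nsti_by_asv):
    """Return the read-weighted mean NSTI for one sample."""
    total_reads = sum(sample_counts.values())
    if total_reads == 0:
        raise ValueError("sample has no reads")
    weighted_sum = sum(
        reads * nsti_by_asv[asv] for asv, reads in sample_counts.items()
    )
    return weighted_sum / total_reads


# Hypothetical values chosen only to contrast two kinds of samples.
nsti_values = {"ASV1": 0.02, "ASV2": 0.10, "ASV3": 0.50}
gut_sample = {"ASV1": 80, "ASV2": 20}    # dominated by well-represented ASVs
soil_sample = {"ASV1": 10, "ASV3": 90}   # dominated by a poorly represented ASV

gut_score = weighted_nsti(gut_sample, nsti_values)
soil_score = weighted_nsti(soil_sample, nsti_values)
print(f"gut: {gut_score:.3f}, soil: {soil_score:.3f}")
```

A sample whose reads come mostly from ASVs far from any reference genome gets a higher weighted score, which flags its functional predictions as less certain.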
Analysis involves comparing the predicted abundance of gene families or metabolic pathways between different sample groups. For instance, a study might compare the functional profiles of a healthy group to a disease group to identify pathways that differ significantly. These differences then become the basis for new hypotheses.
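As a rough illustration of such a comparison, the sketch below runs an exact permutation test on the predicted relative abundance of a single pathway in two groups. The values and group names are hypothetical, and this is only one simple choice of test. A real analysis would test many pathways, correct for multiple comparisons, and often use methods designed for compositional data.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_p(group_a, group_b):
    """Two-sided exact permutation test on the difference in group means."""
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    indices = range(len(pooled))
    extreme = 0
    total = 0
    # Enumerate every way of relabelling the pooled samples into two groups.
    for chosen in combinations(indices, len(group_a)):
        chosen_set = set(chosen)
        relabelled_a = [pooled[i] for i in chosen]
        relabelled_b = [pooled[i] for i in indices if i not in chosen_set]
        total += 1
        # Small tolerance guards against floating point ties.
        if abs(mean(relabelled_a) - mean(relabelled_b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical predicted relative abundances of one pathway per sample.
healthy = [0.10, 0.12, 0.11, 0.13]
disease = [0.20, 0.22, 0.19, 0.21]

p_value = exact_permutation_p(healthy, disease)
print(f"mean difference: {mean(disease) - mean(healthy):.3f}, p = {p_value:.4f}")
```

Exhaustive enumeration is only practical for small groups. With larger sample sizes a random subset of relabellings is typically drawn instead.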
The findings should be interpreted with caution, as the results are exploratory. Conclusions drawn from these predictions should be presented as potential, rather than confirmed, functions. Where possible, validating these predictions with other methods, such as metatranscriptomics or metabolomics, can provide stronger evidence for a community’s functional roles.