BLAST, the Basic Local Alignment Search Tool, is a foundational tool in bioinformatics, and Protein BLAST (blastp) is the version that functions like a search engine for protein sequences. A researcher can use Protein BLAST to compare an amino acid sequence against a vast library of known sequences to identify regions of similarity. These similarities can provide clues about a protein’s function or evolutionary history.
The Purpose of a Protein BLAST Search
The primary goal of a Protein BLAST search is to identify similarities between sequences to understand a protein’s function and evolutionary background. When a scientist has a protein with an unknown function, they can use its amino acid sequence as a “query sequence” to search a “database” of cataloged proteins. This comparison allows researchers to find known proteins that share a similar sequence.
This concept of sequence similarity is connected to the biological principle of homology, meaning descent from a common ancestor. If two protein sequences from different organisms are highly similar, they likely descended from a common ancestral gene. This evolutionary link often, though not always, implies that the proteins perform similar jobs within their respective organisms. For instance, finding a close match between a newly discovered human protein and a well-studied yeast protein can provide strong hints about the human protein’s function.
How the BLAST Algorithm Works
The BLAST algorithm efficiently finds regions of local similarity between protein sequences through a multi-step process designed to be much faster than more exhaustive methods. The procedure begins with “seeding,” where the algorithm breaks the query protein sequence into short overlapping pieces called “words,” which are typically three amino acids long. It then searches the database for exact matches to these words and for similar “neighborhood” words that score above a set threshold when compared with them.
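The seeding step can be sketched in a few lines of Python. This is a simplified illustration rather than BLAST's actual implementation: it finds exact word matches only and leaves out the scoring of similar words, and the function names and the two short sequences are made up for the example.

```python
from collections import defaultdict

def words(seq, k=3):
    """Return (position, word) pairs for every overlapping k-letter word."""
    return [(i, seq[i:i + k]) for i in range(len(seq) - k + 1)]

def find_seeds(query, subject, k=3):
    """Return (query_pos, subject_pos) pairs where a k-letter word matches exactly."""
    # Index the subject once so each query word is a dictionary lookup.
    index = defaultdict(list)
    for j, word in words(subject, k):
        index[word].append(j)
    return [(i, j) for i, word in words(query, k) for j in index.get(word, [])]

# Hypothetical toy sequences, not real proteins.
seeds = find_seeds("MKVLAAGIV", "TTKVLAAGG")
print(seeds)
```

Indexing one sequence and looking up the words of the other is what makes this step fast: no position-by-position comparison of the full sequences is needed to find candidate matches.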
Once these initial “seed” matches are found, the algorithm moves to the “extending” phase. BLAST expands the alignment outwards from the seed match in both directions along the protein sequence. This extension continues until the alignment score falls a set amount below the best score reached so far, at which point the alignment is trimmed back to that best-scoring point. The process creates high-scoring segment pairs (HSPs), which are local alignments between the query sequence and a database sequence.
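The extension phase might look roughly like the sketch below. It assumes a toy scoring scheme (+2 for a match, -1 for a mismatch) in place of a real substitution matrix, allows no gaps, and stops extending once the running score drops more than `x_drop` below the best score seen. The function names, parameter values, and sequences are illustrative assumptions.

```python
def extend_one_way(query, subject, qi, sj, step, x_drop, match=2, mismatch=-1):
    """Extend from (qi, sj) in one direction; return (best_gain, best_length)."""
    score = best = 0
    length = best_len = 0
    while 0 <= qi < len(query) and 0 <= sj < len(subject):
        score += match if query[qi] == subject[sj] else mismatch
        length += 1
        if score > best:
            best, best_len = score, length   # remember the best-scoring endpoint
        elif best - score > x_drop:
            break                            # score fell too far; stop extending
        qi += step
        sj += step
    return best, best_len

def extend_seed(query, subject, qi, sj, k=3, x_drop=3, match=2, mismatch=-1):
    """Grow a k-letter seed at (qi, sj) into an ungapped segment pair."""
    seed = sum(match if query[qi + n] == subject[sj + n] else mismatch
               for n in range(k))
    right, rlen = extend_one_way(query, subject, qi + k, sj + k, 1,
                                 x_drop, match, mismatch)
    left, llen = extend_one_way(query, subject, qi - 1, sj - 1, -1,
                                x_drop, match, mismatch)
    return (seed + left + right,
            query[qi - llen:qi + k + rlen],
            subject[sj - llen:sj + k + rlen])

# Seed "VLA" sits at query position 2 and subject position 3 (toy sequences).
print(extend_seed("MKVLAAGIV", "TTKVLAAGG", 2, 3))
```

Only the best-scoring stretch is kept, so trailing mismatches explored before the cutoff are trimmed from the reported segment.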
The final step is “evaluating” the quality of these alignments. Alignments are scored with a substitution matrix, such as BLOSUM62, which is also used during the seeding and extending steps. This matrix assigns a score for every possible amino acid pair. Identical amino acids and chemically similar pairs generally receive positive scores, while pairs of dissimilar amino acids receive negative scores.
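To make the scoring concrete, the sketch below scores an ungapped alignment using a small excerpt of BLOSUM62 covering only six amino acids (L, I, V, K, D, E). The full matrix covers all twenty; the helper names are hypothetical and the two four-letter sequences are toy inputs.

```python
# Excerpt of BLOSUM62 for six amino acids only; the real matrix is 20 x 20.
BLOSUM62_EXCERPT = {
    "LL": 4, "II": 4, "VV": 4, "KK": 5, "DD": 6, "EE": 5,
    "IL": 2, "IV": 3, "LV": 1, "DE": 2, "EK": 1,
    "KL": -2, "IK": -3, "KV": -2, "DK": -1,
    "DL": -4, "DI": -3, "DV": -3, "EL": -3, "EI": -3, "EV": -2,
}

def pair_score(a, b):
    """Look up a substitution score; the matrix is symmetric."""
    return BLOSUM62_EXCERPT["".join(sorted(a + b))]

def alignment_score(seq1, seq2):
    """Sum the pair scores of two equal-length, ungapped sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    return sum(pair_score(a, b) for a, b in zip(seq1, seq2))

# L/I = 2, K/K = 5, D/E = 2, E/L = -3
print(alignment_score("LKDE", "IKEL"))
```

The conservative swaps (leucine for isoleucine, aspartate for glutamate) still add to the score, while the glutamate-leucine pair subtracts from it.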
Decoding the Results Page
After a BLAST search is complete, the results page presents several metrics to help interpret the findings. A graphical overview displays the alignments as colored bars, giving a quick visual summary of the matches. Below this, a table lists the matched sequences, or “hits,” ordered by their statistical significance.
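The same hit table can be handled programmatically. The sketch below assumes the results were saved in BLAST's 12-column tabular layout (query ID, subject ID, percent identity, alignment length, mismatches, gap openings, query start and end, subject start and end, E-value, bit score) and sorts the hits by E-value. The three rows are invented purely for illustration and do not come from a real search.

```python
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

def parse_hits(text):
    """Parse tab-separated hit lines and return them sorted by E-value."""
    hits = []
    for line in text.strip().splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        row = dict(zip(FIELDS, line.split("\t")))
        for key in ("pident", "evalue", "bitscore"):
            row[key] = float(row[key])
        hits.append(row)
    return sorted(hits, key=lambda hit: hit["evalue"])

# Made-up example rows (hypothetical IDs and numbers).
SAMPLE = "\n".join("\t".join(row) for row in [
    ("query1", "subjA", "45.2", "210", "110", "3", "5", "212", "8", "215", "2e-08", "58.9"),
    ("query1", "subjB", "98.1", "320", "6", "0", "1", "320", "1", "320", "1e-50", "640"),
    ("query1", "subjC", "30.0", "90", "60", "2", "40", "128", "12", "100", "0.5", "31.2"),
])

for hit in parse_hits(SAMPLE):
    print(hit["sseqid"], hit["pident"], hit["evalue"])
```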
One of the first metrics is the Percent Identity, which shows the percentage of aligned positions at which your query sequence and the matched sequence have the identical amino acid. Another number is the bit score, a normalized form of the raw alignment score that represents the overall quality of the alignment; higher scores indicate a better match. The raw score is calculated from the substitution scores of the aligned amino acid pairs, minus penalties for any gaps introduced into the alignment.
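Both numbers can be computed by hand from an alignment. In the sketch below, percent identity is the share of identical columns over the full alignment length, gaps included, and the bit score uses the standard conversion S' = (lambda * S - ln K) / ln 2. The default `lam` and `k` values are figures commonly quoted for BLOSUM62 with gap costs 11 and 1; treat them as assumptions, since the values that apply depend on the scoring setup.

```python
import math

def percent_identity(aligned_query, aligned_subject):
    """Percent of alignment columns with identical residues ('-' marks a gap)."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    identical = sum(1 for a, b in zip(aligned_query, aligned_subject)
                    if a == b and a != "-")
    return 100.0 * identical / len(aligned_query)

def bit_score(raw_score, lam=0.267, k=0.041):
    """Convert a raw alignment score to bits; lam and k are assumed defaults."""
    return (lam * raw_score - math.log(k)) / math.log(2)

# Toy alignment: 4 identical columns out of 6 (one mismatch, one gap).
print(round(percent_identity("MKV-LA", "MKIALA"), 1))
print(round(bit_score(100), 1))
```

Because the bit score folds in the statistical parameters of the scoring system, bit scores from searches run with different matrices can be compared, whereas raw scores cannot.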
Often the most informative metric is the Expect Value, or E-value. The E-value represents the number of alignments with a score at least as high as the one observed that you would expect to find by chance when searching a database of that size. A very low E-value, such as 1e-50, suggests the match is highly significant and not a random occurrence, while an E-value near or above 1 indicates the match could easily be due to chance.
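The dependence on database size can be shown with the usual relationship between bit score and E-value, E = m * n * 2^(-S'), where m is the query length and n is the total length of the database. This is a simplified sketch: BLAST itself applies corrections such as effective sequence lengths, which are omitted here, and the function name and input numbers are illustrative.

```python
def expect_value(bits, query_len, db_len):
    """Approximate E-value from a bit score and the size of the search space."""
    return query_len * db_len * 2.0 ** (-bits)

# Hypothetical search: a 300-residue query against 100 million database residues.
small_db = expect_value(60.0, 300, 100_000_000)
large_db = expect_value(60.0, 300, 200_000_000)
print(small_db, large_db)
```

With the same bit score, doubling the database doubles the E-value, and each additional bit of score halves it. This is why an identical alignment can look less significant when searched against a larger database.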
Scientific Uses of Protein BLAST
A significant application of BLAST is in genome annotation. As scientists sequence the entire genome of an organism, they are left with vast amounts of DNA data. Translated variants of BLAST, such as blastx, can scan this data for sequences that are likely to code for proteins by translating the DNA in all six reading frames and comparing the results against protein databases, which helps create a map of the genes within that genome.
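The translation step behind such searches can be sketched as follows, assuming the standard genetic code and an uppercase DNA string containing only A, C, G, and T. A DNA sequence is read in three frames on each strand, giving six candidate protein sequences that could then be compared against a protein database. The function names and the short input are illustrative.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(dna):
    """Translate complete codons from the start of dna; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))

def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def six_frame_translation(dna):
    """Translate all three frames on both strands."""
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames

for name, protein in six_frame_translation("ATGAAAGTTTAA").items():
    print(name, protein)
```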
BLAST is also a tool for studying evolutionary relationships. By using it to collect the sequences of a single protein across numerous species, from bacteria to mammals, and then aligning those sequences, scientists can construct a phylogenetic tree. This diagram illustrates how different species are related based on the similarity of their proteins, offering insights into the evolutionary pathways that have led to the diversity of life seen today.