Examining Your BLAST Results (2024)

The easiest way to make a quick species-level determination is through a BLAST search. This search compares the sequence you have received for your specimen to the sequences in GenBank, NCBI's public sequence database, and returns a list of the sequences that are most similar to your specimen.

Review the BLAST page for each of your records by clicking the BLAST "B" on each record within your MycoMap project.

In the center of your dashboard, there will be a large blue "B" next to each record that contains a sequence. Clicking this "B" opens a page showing the closest sequence matches for that record in a new tab.

On the BLAST page for an individual record, you will see results from two separate databases. The top set of results shows the closest matches from NCBI's GenBank. The bottom set shows the closest matches from NAMP projects (others' and yours) that are not yet in GenBank.

How to Interpret BLAST Results

There is no universally applicable quantitative method for making this initial assessment across all fungal species, but a good guide uses four components of the reference sequences in the BLAST results:

a high Identity value, in the ‘Identity’ column;
a high "Query Cover" value, in the ‘Query’ column;
the number of reference sequences that match this closely; and
whether the species name comes from a trusted source.

As a general rule of thumb for species-level determination in many groups of fungi, two specimens with a sequence similarity of 97% or above are considered to be a single species, but this is not a hard and fast rule. Keep in mind that this threshold is highly variable between different groups: in some groups, two specimens may share 99% sequence similarity yet belong to two different, related species.

For the purposes of the initial triage, a high Identity value generally falls in the range of 98-100% sequence similarity. The identity reports the percentage of base pairs that are the same between the sequence of your specimen and that of the reference specimen: if 99 out of 100 base pairs match, the identity value is 99%. If a large number of reference sequences fall into the 98-100% range, all with the species name you believed your specimen to be, you would likely identify your specimen as that species and not need to review that sequence much more in the future. If more than one reference species has 98-100% sequence similarity with your specimen, you can identify your specimen conclusively at the genus level, but not at the species level.
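
The decision in the paragraph above can be sketched in Python. This is a minimal illustration, not a MycoMap or NCBI tool: it assumes you have copied each hit's species name and percent identity from the BLAST table by hand, and the `Hit` class and `triage_by_identity` function are hypothetical names. It also assumes that all high-identity names belong to the same genus, as in the scenario described.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One row copied from a BLAST results table (hypothetical structure)."""
    species: str
    identity: float  # percent identity, 0-100


def triage_by_identity(hits, believed_name, threshold=98.0):
    """Rough first-pass call using only hits at or above `threshold` % identity.

    Returns "species" when every high-identity hit carries the believed name,
    "genus" when several species names share the high-identity range, and
    "review" otherwise (no high-identity hits, or a single unexpected name).
    """
    high_names = {h.species for h in hits if h.identity >= threshold}
    if high_names == {believed_name}:
        return "species"
    if len(high_names) > 1:
        return "genus"
    return "review"


# Made-up values for illustration only.
hits = [Hit("Species A", 99.6), Hit("Species A", 99.1), Hit("Species B", 93.0)]
print(triage_by_identity(hits, "Species A"))  # species
```

The `threshold` parameter is the natural place to adjust for groups where the 98% guide does not hold.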

Query cover is the percentage of the query sequence (your specimen) that overlaps the reference sequence. BLAST does not typically attempt to match the full length of a sequence; often, the results report only the single segment of a sequence that aligns most closely. A high Query Cover value for the initial triage is in the 70%+ range. If the top results fall below this range, it is generally a good idea to review the sequence more in the future rather than verify it as part of your initial triage. It is possible to have a 99% identity match across only 35% of your sequence (query cover). In that case you have no information on how closely the other 65% of your sequence matches. This is why reviewing the query cover percentage of a BLAST search is just as important as the identity percentage, especially when performing an initial triage.
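
To make the 35% example concrete, here is a minimal sketch of how a query cover percentage can be computed from the query coordinates of the aligned segments. It assumes 1-based, inclusive coordinates as BLAST prints them; `query_cover` is a hypothetical helper, and NCBI's own calculation may differ in details. Overlapping segments are merged so that no base is counted twice.

```python
def query_cover(query_length, segments):
    """Percent of the query covered by at least one aligned segment.

    `segments` holds (start, end) query coordinates, 1-based and inclusive.
    """
    if query_length <= 0:
        raise ValueError("query_length must be positive")
    covered = 0
    current_start = current_end = None
    for start, end in sorted((min(s, e), max(s, e)) for s, e in segments):
        if current_end is None or start > current_end + 1:
            # Disjoint from the running block: count the previous block.
            if current_end is not None:
                covered += current_end - current_start + 1
            current_start, current_end = start, end
        else:
            # Overlaps or touches the running block: extend it.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start + 1
    return 100.0 * covered / query_length


# A 600-bp query whose only aligned segment spans bases 1-210.
print(query_cover(600, [(1, 210)]))  # 35.0
```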

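You can also see this visually from your MycoMap BLAST results. NCBI stores the data for 3 days from the time the BLAST is initiated. Click the button that opens the record's search on NCBI.

This takes you to the NCBI BLAST results page. The horizontal lines in the graphic summary (red for the highest-scoring matches) show which portion of your query each reference sequence covers, giving a visual representation of the query cover your results are based on.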

The final thing to keep in mind is that the species names associated with reference sequences in your search may not be accurate. It all comes down to how much you trust the identifier who originally proposed the name for the reference sequence. If the sequence is from a type specimen or from a paper that reviews the group, that certainly carries a lot of weight. Most reference sequences, however, will have little supporting provenance on how the species determination was made. We will discuss this more in the future, but always keep this in mind, especially when you are new to analyzing sequence data. You can view the information on the identifier of the reference sequence by clicking on the corresponding link to the reference sequence in the ‘Accession’ column.

You are inspecting the results to see whether the genus and species at the top of each BLAST result list match the name you applied to the specimen. For some of your specimens, it will be very easy to make an accurate species determination; others will be much more difficult. This page discusses how to approach a "first look" at the BLAST results for your specimen.

Reviewing your Results

While looking through your BLAST results for the first time, note whether the specimen has any reference sequences, from either the GenBank or Mycoflora BLAST, with a high identity value (98-100%) to your new sequence. If these reference sequences use the species name you believed your specimen to be, it's possible that no additional analysis will be necessary to identify your specimen, and you can use this name for your specimen. Always be sure to look a bit further down the BLAST results for the species name as well. If there are sequences going under the same name with lower levels of sequence similarity (>97%), more analysis will be necessary in the future.
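
The first-look review above can be summarized as a small check that combines the 98-100% identity range with the 70%+ query cover guide from earlier. This is a sketch under stated assumptions: `BlastHit` and `first_look` are hypothetical names, the hits are assumed to have been copied by hand from either results table, and the "revisit" flag simply marks the believed name appearing further down the list below the high-identity range.

```python
from dataclasses import dataclass


@dataclass
class BlastHit:
    """Hypothetical record of one BLAST hit, copied from either result table."""
    species: str
    identity: float     # percent identity
    query_cover: float  # percent query cover


def first_look(hits, believed_name, min_identity=98.0, min_cover=70.0):
    """Return (name_supported, revisit_later) for one specimen record.

    name_supported: at least one hit clears both thresholds, and every hit
    that does carries the believed name.
    revisit_later: the believed name also appears further down the list
    below `min_identity`, which suggests more analysis will be needed.
    """
    strong = [h for h in hits
              if h.identity >= min_identity and h.query_cover >= min_cover]
    name_supported = bool(strong) and all(h.species == believed_name for h in strong)
    revisit_later = any(h.species == believed_name and h.identity < min_identity
                        for h in hits)
    return name_supported, revisit_later


# Made-up values for illustration only.
hits = [BlastHit("Species A", 99.5, 95.0), BlastHit("Species A", 99.0, 90.0),
        BlastHit("Species B", 96.0, 92.0)]
print(first_look(hits, "Species A"))  # (True, False)
```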

I have a great match with just one species. What now?

Each species record on your dashboard has a checkmark icon. Clicking this checkmark turns it green. A green checkmark represents a record that has been confidently identified at the species level and "verified," and means that the project owner believes the species name in use for the record accurately reflects the best name available for the record. Turning the checkmark green flags that record for inclusion in the resulting flora for the project, as well as in the resulting comprehensive flora for North America.

Records in a project can be filtered to include or exclude these assessed (verified) records.

The ultimate goal of a project owner is to verify all of the records in their projects with the "correct" species name. If you are not absolutely sure of the species name for a record, consider changing the "name-in-use" of the source record (MO or iNat) to the genus rather than a species name you are unsure of.

Missing Sequences and/or BLAST Results

When you review your specimen records, you may find that BLAST results and sequences are not available for some specimens. Sequences may not be available if we were either:

unable to isolate DNA from a specimen, or
the specimen was contaminated with another fungus and we were unable to obtain a sequence.

Ensure your results do not represent a contaminant.

In a small number of cases, the sequence results may represent some kind of contaminant, such as another type of fungus like a yeast or a mold. This is an unavoidable aspect of the process and is much more common in certain groups, such as jelly fungi. There may be a couple of reasons for this:

The specimen may have been contaminated with another fungus in situ, i.e. where it was collected. An example may be when more than one fungus is growing on a log. Mycelium of one species may have grown into the specimen you collected.
Contamination is also much more common in specimens that were not immediately placed in the dryer or were otherwise dried improperly.

If you receive a sequence of a contaminant back, there is little we can do except attempt another DNA extraction from the original material. This would incur an additional fee, and there is no guarantee the next attempt would be successful. Typically, most researchers do not attempt this unless the specimen is of some unusual importance.

If you do believe the sequence to be from a contaminant, please complete the following steps:

Click the sequence "S" box on the line with the record you believe represents a contaminated sequence. Underneath the sequence record, you will see a gray checkbox that says "This sequence is a contaminant." Clicking the gray checkbox turns it green. Selecting this box removes the NCBI GenBank verification icon from the record, so the contaminated sequence is not accidentally uploaded to GenBank.

We do save these sequences in the database, as they may provide some unusual ecological information in the future.

Sequence doesn’t match the genus of your specimen

Occasionally, specimens become improperly numbered in the field or mixed up in the lab. The final sanity check for sequences is that the genus that most commonly appears in the BLAST results is the same genus that you were expecting the sequence to be. If your BLAST results show the sequence represents a Tricholoma, but the specimen is attached to an observation of Polyporus, there is obviously a problem.
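
The genus sanity check can be sketched as counting the first word of each hit name and comparing the most common one with the genus you expected. This is a minimal illustration with hypothetical helper names; it assumes hit names are plain "Genus species" strings, so entries such as "Uncultured fungus" would be counted under their first word and are best filtered out first.

```python
from collections import Counter


def most_common_genus(hit_names):
    """Return the genus seen most often, taking the first word of each name."""
    genera = Counter(name.split()[0] for name in hit_names if name.strip())
    if not genera:
        return None
    return genera.most_common(1)[0][0]


def genus_matches(hit_names, expected_genus):
    """True when the dominant genus in the hits is the genus you expected."""
    top = most_common_genus(hit_names)
    return top is not None and top.lower() == expected_genus.lower()


# Hypothetical hit names from one record's BLAST page.
names = ["Tricholoma saponaceum", "Tricholoma terreum", "Tricholoma sp.",
         "Polyporus arcularius"]
print(genus_matches(names, "Polyporus"))  # False: the sequence looks like a Tricholoma
```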

The most typical issue is that two tubes were switched at some point in the process. Once you are done going through your specimens, it is usually possible to find the tube that was switched out. If you are able to find the cause of the error, the simplest fix is to edit the sequence records to be associated with the correct observational report.

If you are unable to find the cause of the error, please select the "This sequence is the wrong specimen" checkbox and alert us to the issue by emailing steve@mycomap.com. We will attempt to track down the source of the error.

FAQs

How do you interpret BLAST results?

The higher the score, the better the alignment. When the max and total scores are the same, there is a single aligned region between the query and its match in the database. This usually means that the sequences can be aligned without long insertions or deletions.
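
The relationship between the two scores can be shown with a few lines of Python. This sketch assumes you already have the bit scores of each aligned region (HSP) between the query and one database sequence; the values and the `max_and_total` name are hypothetical.

```python
def max_and_total(hsp_bit_scores):
    """Return (max score, total score) for one database sequence.

    Each entry is the bit score of one aligned region (HSP) between the
    query and that database sequence.
    """
    if not hsp_bit_scores:
        raise ValueError("need at least one aligned region")
    return max(hsp_bit_scores), sum(hsp_bit_scores)


# Hypothetical bit scores for two database sequences.
print(max_and_total([850.0]))         # (850.0, 850.0): one aligned region
print(max_and_total([600.0, 150.0]))  # (600.0, 750.0): two regions, total > max
```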

What is a good identity score in BLAST?

For the purposes of the initial triage, a high Identity value generally falls in the range of 98-100% sequence similarity.

How do you get more than 100 results in BLAST?

Go to the NCBI BLAST page.
At the bottom of the page, open "Algorithm parameters."
Change "Max target sequences" from the default (100) to the value you need, such as 1000 or 5000.
Run the search; the results will include up to the maximum number of hits you selected.

What is a good E-value in BLAST?

BLAST results are sorted by E-value by default, with the best hit on the first line. The smaller the E-value, the better the match. BLAST hits with an E-value smaller than 1e-50 are database matches of very high quality, and hits with an E-value smaller than 0.01 can still be considered good hits for homology matches.
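
The rule-of-thumb cutoffs above translate directly into a small labeling function. This is a sketch only: the labels and function names are hypothetical, the hits are assumed to be (name, E-value) pairs, and the cutoffs are the guide values quoted above rather than universal thresholds.

```python
def describe_evalue(evalue):
    """Label an E-value with the rule-of-thumb cutoffs quoted above."""
    if evalue < 0:
        raise ValueError("E-values cannot be negative")
    if evalue < 1e-50:
        return "very high quality"
    if evalue < 0.01:
        return "good homology hit"
    return "weak; may be due to chance"


def sort_by_evalue(hits):
    """Order (name, evalue) pairs best-first, smallest E-value on top."""
    return sorted(hits, key=lambda hit: hit[1])


print(describe_evalue(1e-20))  # good homology hit
```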

What does the total score in BLAST mean?

Max score = the highest alignment score (bit score) between the query sequence and a segment of the database sequence. Total score = the sum of the alignment scores of all segments from the same database sequence that match the query sequence.

What is a good expect value?

If the expect value is much less than one, the alignment score is unlikely to be due to chance. If the expect value is near one or greater, the score may be due to chance, although that does not mean it is.

How do you calculate an identity score?

To calculate an identity score, we need to know the length of the alignment and the number of matches—identical nucleotides—between two sequences. Each mutation type affects these two numbers in a unique way.
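
A short sketch shows how the two numbers interact, assuming you have the two aligned sequences with gaps written as "-". Identity is the number of identical columns divided by the alignment length, with gap columns counted in the length. `percent_identity` is a hypothetical helper.

```python
def percent_identity(aligned_query, aligned_subject):
    """Identical columns divided by alignment length, gap columns included."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_query:
        raise ValueError("alignment is empty")
    matches = sum(1 for q, s in zip(aligned_query.upper(), aligned_subject.upper())
                  if q == s and q != "-")
    return 100.0 * matches / len(aligned_query)


# A substitution lowers the match count but leaves the length unchanged.
print(percent_identity("A" * 100, "A" * 99 + "C"))  # 99.0
# An insertion in the query adds a gap column: the length grows, matches do not.
print(percent_identity("ACGTA", "AC-TA"))            # 80.0
```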

What is the scoring matrix in BLAST?

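For protein searches, the default scoring matrix is BLOSUM62. The BLOSUM series is derived from observed substitution frequencies in ungapped alignment blocks of related proteins; BLOSUM62 is built from blocks in which sequences sharing 62% or more identity are clustered together. Experiments have shown that it performs well as a general-purpose scoring system. Nucleotide searches (blastn), such as those used for fungal sequences, score alignments with match rewards and mismatch penalties rather than a substitution matrix.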

How is the max score calculated in BLAST?

Max[imum] score: the highest alignment score, calculated as the sum of the rewards for matched nucleotides or amino acids and the penalties for mismatches and gaps.

How is the BLAST score calculated?

Per NCBI's definition page, the raw score of BLAST is the score of an alignment, calculated as the sum of substitution and gap scores.
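
The raw score of a nucleotide alignment can be sketched as a sum over columns: a reward for each match, a penalty for each mismatch, and a cost for each gap. This sketch assumes the "existence plus extension" gap convention, where a gap of length k costs `gap_open + k * gap_extend`; the parameter values are illustrative only, and `raw_score` is a hypothetical helper. BLAST then converts raw scores to bit scores using statistical parameters that this sketch omits.

```python
def raw_score(aligned_query, aligned_subject, reward=2, penalty=-3,
              gap_open=5, gap_extend=2):
    """Sum of substitution scores and gap costs for one gapped alignment.

    A gap of length k costs gap_open + k * gap_extend.
    Parameter values are illustrative, not a claim about BLAST defaults.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    in_gap = None  # which sequence ("q" or "s") the current gap run is in
    for q, s in zip(aligned_query.upper(), aligned_subject.upper()):
        if q == "-" or s == "-":
            gap_in = "q" if q == "-" else "s"
            if gap_in != in_gap:
                score -= gap_open  # a new gap run starts
                in_gap = gap_in
            score -= gap_extend    # every gap column pays the extension cost
        else:
            in_gap = None
            score += reward if q == s else penalty
    return score


print(raw_score("ACGTACGT", "ACGAACGT"))    # 11: seven matches, one mismatch
print(raw_score("ACGTTACGT", "ACG--ACGT"))  # 5: seven matches, one gap of length 2
```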

What is the difference between max score and total score in BLAST?

In some cases, the alignment may not extend along the entire length of the protein or there may be gaps between aligned regions of the sequences. “Max score” is the bit score for the aligned region with the highest score. “Total score” adds the bit scores for all aligned regions.

What does an E-value of 0 mean in BLAST?

An E-value reported as 0.0 means the expected number of chance matches scoring this well is so small that it rounds to zero. The closer the E-value is to zero, the more significant the match is considered to be, and the less likely it is to be a false positive.

What does a high E-value mean in BLAST?

Lower (i.e., stronger) E-values indicate more significant alignments, suggesting a higher probability that the sequences share a common evolutionary origin. A higher (i.e., weaker) E-value indicates that the alignment might be a random event.

What are the E-value and p-value in BLAST?

E-value (expectation value): a correction of the p-value for multiple testing. In the context of database searches, the E-value associated with a score S is the number of distinct alignments, with a score equal to or better than S, that are expected to occur in a database search by chance.
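
The link between bit score, E-value, and p-value can be sketched with the standard formulas E = m · n · 2^(−S′), where S′ is the bit score, m the query length and n the database length, and P = 1 − e^(−E). This is a simplified illustration: real BLAST uses effective lengths adjusted for edge effects, while this sketch takes m and n as given, and the function names are hypothetical.

```python
import math


def evalue_from_bits(bit_score, query_length, database_length):
    """E = m * n * 2**(-S'): expected chance hits scoring at least S' bits."""
    return query_length * database_length * 2.0 ** (-bit_score)


def pvalue_from_evalue(evalue):
    """P = 1 - exp(-E): chance of at least one such hit; expm1 keeps small E accurate."""
    return -math.expm1(-evalue)


# Toy numbers: a 32 x 32 search space and a 10-bit score give E = 1.
e = evalue_from_bits(10, 32, 32)
print(e)                      # 1.0
print(pvalue_from_evalue(e))  # about 0.632
```

For small E-values the two are nearly equal, which is why E-values are usually reported instead of p-values.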