
BLAST Algorithm Parameters

The Freesia Sequence Search tool implements a client-side version of the canonical BLAST (Basic Local Alignment Search Tool) algorithm. Understanding and tuning these parameters allows you to optimize the search for different biological use cases, balancing sensitivity against computational speed.

Word Size

Default: 7

Word size is the core of the BLAST heuristic. The algorithm first breaks your query sequence into overlapping chunks called "words" (or k-mers) of exactly this length. It then scans the database for exact matches to these words, which serve as "seeds" for the full alignment.

Crucial rule: Your query sequence must be at least as long as the word size to produce any results, and there must be at least one perfect match of this length between the query and the subject.
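
The seeding step can be sketched in a few lines of Python. This is only an illustration of the idea described above, not Freesia's actual code: the function names (build_word_index, find_seeds) and the use of a dictionary index are assumptions made for the example.

```python
from collections import defaultdict

def build_word_index(subject: str, word_size: int) -> dict[str, list[int]]:
    """Map every word of length word_size in the subject to its start positions."""
    index = defaultdict(list)
    for i in range(len(subject) - word_size + 1):
        index[subject[i:i + word_size]].append(i)
    return index

def find_seeds(query: str, subject: str, word_size: int = 7) -> list[tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs where a query word matches exactly."""
    if len(query) < word_size:
        return []  # query shorter than the word size: no words, hence no seeds
    index = build_word_index(subject, word_size)
    seeds = []
    for q in range(len(query) - word_size + 1):
        word = query[q:q + word_size]
        for s in index.get(word, []):
            seeds.append((q, s))
    return seeds

# A 10-base query yields 10 - 7 + 1 = 4 words; two of them occur in the subject.
print(find_seeds("ACGTACGTAA", "TTACGTACGTCC"))  # [(0, 2), (1, 3)]
```

A query of length n produces n - word_size + 1 words, which is why short queries such as primers leave very few chances to seed an alignment at larger word sizes.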

Tuning Guide

  • Decrease (4–6): Increases sensitivity. Necessary for finding highly divergent sequences or when searching with very short queries (like 15–20 bp primers). Trade-off: Significantly slower search time and more noise.
  • Increase (11–16): Increases speed and specificity. Ideal when comparing large genomes or looking for nearly identical matches. For reference, NCBI MegaBLAST defaults to a word size of 28.

Match / Mismatch Scores

Default: +2 / -3

These values define the reward for aligning identical bases and the penalty for aligning differing bases. The ratio between the two determines how tolerant the resulting alignment is to mutations: the larger the penalty relative to the reward, the higher the identity an alignment needs to keep a positive score.

Common Scoring Schemes

  • +2 / -3 (Default): Standard for nucleotide BLAST. Optimized for finding alignments with ~85% to 95% identity.
  • +1 / -1: Very tolerant. Use this for cross-species comparisons or deep evolutionary searches where identity might drop to ~70%.
  • +1 / -3 or +1 / -4: Very strict. Penalizes mismatches heavily, resulting in shorter alignments containing almost exclusively identical matches. Useful for finding exact probe binding sites.
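
To make the effect of the ratio concrete, here is a small Python sketch. The function names are hypothetical. The "break-even identity" is simply the fraction of matching positions at which an ungapped alignment scores exactly zero; it is a lower bound implied by the arithmetic, not the identity range a scheme is tuned for.

```python
def ungapped_score(a: str, b: str, match: int = 2, mismatch: int = -3) -> int:
    """Raw score of two equal-length sequences aligned position by position."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(match if x == y else mismatch for x, y in zip(a, b))

def break_even_identity(match: int, mismatch: int) -> float:
    """Fraction of identical positions at which the raw score is exactly zero."""
    return abs(mismatch) / (match + abs(mismatch))

# Nine matches and one mismatch under the default scheme: 9 * 2 - 3 = 15.
print(ungapped_score("ACGTACGTAC", "ACGTACGTAA"))

# Stricter schemes need a higher identity just to stay above zero.
for match, mismatch in [(1, -1), (2, -3), (1, -3), (1, -4)]:
    print(match, mismatch, break_even_identity(match, mismatch))
```

Under these assumptions the break-even points are 50% for +1 / -1, 60% for +2 / -3, 75% for +1 / -3 and 80% for +1 / -4, which matches the ordering from tolerant to strict in the list above.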

Gap Penalties (Open / Extend)

Default: -5 / -2

Freesia uses an affine gap penalty model. This biologically realistic model assumes that a single mutational event often inserts or deletes multiple bases at once. Therefore, starting a gap is penalized heavily, but making that gap longer is penalized less.

A gap of length k reduces the alignment score by: |Gap Open| + (k × |Gap Extend|)
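
As a quick check of the formula, the following Python sketch (gap_cost is a hypothetical helper name) compares one long gap with several short ones under the default penalties.

```python
def gap_cost(length: int, gap_open: int = -5, gap_extend: int = -2) -> int:
    """Score reduction for a single gap: |gap_open| + length * |gap_extend|."""
    if length < 1:
        raise ValueError("gap length must be at least 1")
    return abs(gap_open) + length * abs(gap_extend)

one_long_gap = gap_cost(3)          # 5 + 3 * 2 = 11
three_short_gaps = 3 * gap_cost(1)  # 3 * (5 + 1 * 2) = 21
print(one_long_gap, three_short_gaps)
```

A single 3-base gap costs 11, while three separate 1-base gaps cost 21, which is how the affine model favors one multi-base indel over several scattered ones.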

Tuning Guide

  • Increase Open Penalty (-7 to -10): Pushes the algorithm toward ungapped alignments. Use when searching within coding exons, where frameshift indels are strongly selected against.
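  • Decrease Open Penalty (-2 to -3): Makes the alignment more willing to open gaps across non-matching regions. Use for cross-species searches or when aligning a mature mRNA back to a genome containing introns. Because every gapped base still incurs the Gap Extend penalty, long gaps remain costly unless that value is lowered as well.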

Score Thresholds (Ungapped / Gapped)

Defaults: 10 / 14

These thresholds dictate which candidate alignments survive to the final results list. They are raw alignment scores computed from your Match/Mismatch settings (and, for the gapped threshold, your gap penalties).

Min Ungapped Score (Phase 2): When a word match (seed) is found, the algorithm tries to extend it without allowing gaps. If the score of this ungapped block reaches this threshold, it triggers the heavy, gapped Smith-Waterman alignment.
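
A minimal Python sketch of ungapped extension follows. It rests on several assumptions: the seed itself counts toward the score, extension runs in both directions, and extension stops once the running score falls more than x_drop below the best score seen so far. The x_drop cutoff is borrowed from classic BLAST and is not a documented Freesia parameter; the function name is hypothetical.

```python
def extend_ungapped(query: str, subject: str, q_pos: int, s_pos: int,
                    word_size: int = 7, match: int = 2, mismatch: int = -3,
                    x_drop: int = 10) -> int:
    """Extend an exact seed in both directions without gaps; return the best raw score."""

    def extend(q: int, s: int, step: int) -> int:
        best = running = 0
        while 0 <= q < len(query) and 0 <= s < len(subject):
            running += match if query[q] == subject[s] else mismatch
            if running > best:
                best = running
            elif best - running > x_drop:
                break  # score dropped too far below the best: give up
            q += step
            s += step
        return best

    seed_score = word_size * match
    right = extend(q_pos + word_size, s_pos + word_size, +1)
    left = extend(q_pos - 1, s_pos - 1, -1)
    return seed_score + left + right

MIN_UNGAPPED_SCORE = 10
score = extend_ungapped("ACGTACGTAA", "TTACGTACGTCC", q_pos=0, s_pos=2)
print(score, score >= MIN_UNGAPPED_SCORE)  # 16 True
```

If the seed does count toward the ungapped score, as assumed here, a 7-base seed already scores 14 with the default match score, so every seed clears the default threshold of 10. In that case the threshold only starts filtering once it is raised above word size × match score.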

Min Gapped Score (Phase 3): The absolute minimum score the final gapped alignment must achieve to be shown in the UI.
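
For illustration, the gapped score can be computed with a Smith-Waterman recurrence extended to affine gaps. The Python sketch below returns only the best score, with no traceback, and charges |Gap Open| + k × |Gap Extend| for a gap of length k. How Freesia actually implements this phase (banding, traceback, restricting the search to the seed region) is not specified here, so treat this as a generic textbook version with a hypothetical function name.

```python
def gapped_local_score(a: str, b: str, match: int = 2, mismatch: int = -3,
                       gap_open: int = -5, gap_extend: int = -2) -> int:
    """Best local alignment score with affine gap penalties (score only)."""
    neg_inf = float("-inf")
    first = gap_open + gap_extend   # score change for the first base of a gap
    cols = len(b) + 1
    h_prev = [0] * cols             # best score of any alignment ending at (i-1, j)
    f_prev = [neg_inf] * cols       # same, but ending in a gap that skips bases of a
    best = 0
    for i in range(1, len(a) + 1):
        h_cur = [0] * cols
        f_cur = [neg_inf] * cols
        e = neg_inf                 # ending in a gap that skips bases of b
        for j in range(1, cols):
            e = max(h_cur[j - 1] + first, e + gap_extend)
            f_cur[j] = max(h_prev[j] + first, f_prev[j] + gap_extend)
            diag = h_prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            h_cur[j] = max(0, diag, e, f_cur[j])   # 0 restarts the local alignment
            best = max(best, h_cur[j])
        h_prev, f_prev = h_cur, f_cur
    return best

MIN_GAPPED_SCORE = 14
# 20 matching bases (40) minus one 2-base gap (5 + 2 * 2 = 9) gives 31.
score = gapped_local_score("A" * 10 + "CC" + "G" * 10, "A" * 10 + "G" * 10)
print(score, score >= MIN_GAPPED_SCORE)  # 31 True
```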

Understanding Raw Scores

With the default Match Score of 2:

  • A perfect match of 5 bases = Score of 10.
  • A perfect match of 7 bases = Score of 14.

If you set your Min Gapped Score to 20, the algorithm will hide any alignment with fewer than 10 perfectly matching bases, as well as longer alignments whose mismatches or gaps pull the score below 20.
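
The arithmetic above can be reproduced with a short Python sketch. Both helper names are hypothetical, and the second function considers ungapped alignments only.

```python
def min_perfect_length(min_score: int, match: int = 2) -> int:
    """Shortest run of perfect matches whose raw score reaches min_score."""
    return (min_score + match - 1) // match  # integer ceiling of min_score / match

def passes_threshold(length: int, mismatches: int, min_score: int,
                     match: int = 2, mismatch: int = -3) -> bool:
    """Whether an ungapped alignment of the given length reaches min_score."""
    score = (length - mismatches) * match + mismatches * mismatch
    return score >= min_score

print(min_perfect_length(10), min_perfect_length(14), min_perfect_length(20))  # 5 7 10

# With a threshold of 20, one mismatch costs the 2 points a match would have
# earned plus the 3-point penalty, so 12 aligned bases are no longer enough.
print(passes_threshold(12, 1, 20))  # False: 11 * 2 - 3 = 19
print(passes_threshold(13, 1, 20))  # True:  12 * 2 - 3 = 21
```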
