Pfam 22.0 :: Help Page : Advanced Search Options
Explanation of advanced Pfam search server options

Search Mode

There are two different ways to search Pfam using HMMER. HMMER can search only for complete domains (historically called "ls" mode, for "local search"), where it looks for one or more significant alignments that are global with respect to the profile HMM and local with respect to the query sequence. This is dubbed "glocal" alignment, for "global to the model, local to the sequence". HMMER can also search in a fully local, Smith-Waterman mode (historically called "fs" mode, for "fragment search"), where it looks for one or more significant alignments that are local with respect to both the model and the query sequence.

"ls" mode is much the more sensitive of the two, provided that your sequence actually does contain complete domains. "fs" mode is less sensitive in general, but capable of detecting fragments of domains, where perhaps part of the structure has been deleted or mutated beyond recognition.

The profile HMMs for ls and fs mode have to be built and calibrated separately, because the two modes show entirely different expected score distributions on random sequences. Pfam therefore has two different profile HMM databases: Pfam_ls and Pfam_fs. By default, the Pfam server searches both databases and integrates the results: any "fs" mode hit that is completely overlapped by a more significant "ls" mode hit to the same model is removed from the output. Optionally, you may search the two databases separately (if perhaps you don't trust the integration of results, which is a bit ad hoc), or either one by itself.
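
The integration rule for the default search can be illustrated with a short Python sketch. This is not the server's actual code: the Hit record, its field names, the model names and the choice of E-value as the measure of "more significant" are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    model: str     # model name (hypothetical values below)
    mode: str      # "ls" or "fs"
    start: int     # first aligned residue on the query, inclusive
    end: int       # last aligned residue on the query, inclusive
    evalue: float  # assumption: lower E-value means more significant


def integrate(hits):
    """Drop each fs hit that lies entirely inside a more significant
    ls hit to the same model; keep everything else in input order."""
    ls_hits = [h for h in hits if h.mode == "ls"]
    kept = []
    for h in hits:
        covered = h.mode == "fs" and any(
            ls.model == h.model
            and ls.start <= h.start
            and h.end <= ls.end
            and ls.evalue < h.evalue
            for ls in ls_hits
        )
        if not covered:
            kept.append(h)
    return kept


example = [
    Hit("ModelA", "ls", 10, 120, 1e-30),
    Hit("ModelA", "fs", 20, 100, 1e-12),   # inside the ls hit: removed
    Hit("ModelA", "fs", 110, 150, 1e-5),   # only partly overlapped: kept
    Hit("ModelB", "fs", 30, 90, 1e-8),     # different model: kept
]
merged = integrate(example)
```

Note that a partial overlap is not enough to remove an "fs" hit under this reading of the rule; only complete containment by a better "ls" hit to the same model does.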

Cutoff Strategy

There are two different ways to determine the significance level for reporting hits to Pfam models. By default, we cut off at a raw score threshold called the "gathering threshold" (GA cutoff). GA thresholds are set by Pfam curators for each model, and they are exactly the cutoffs used to collect the sequences included in Pfam Full alignments. They are considered trustworthy cutoffs at which effectively zero false positives get through, so the annotation coming from the Pfam server is extremely reliable and suitable for fully automated annotation (which is what Pfam is designed for).

You may want less reliable annotation, in return for increased sensitivity, if you're willing to sort through more output for clues. In this case, select the E-value cutoff instead of the GA cutoff. The default of 1.0 will report about 1 false positive hit per query sequence; you might turn this up to 10, in which case you'll get about 10 false positive hits per query sequence. Because of the false positives, you can expect the usually pretty domain picture to be messy, with several overlaps.
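
To make the difference between the two strategies concrete, here is a minimal Python sketch of how a list of hits might be filtered under each one. The function name, the dictionary keys, the scores and the model names are hypothetical, and treating a score equal to the GA threshold as passing is an assumption; the server's own implementation may differ.

```python
def report_hits(hits, strategy="ga", evalue_cutoff=1.0):
    """Return the hits that would be reported under the chosen strategy.

    Each hit is a dict with hypothetical keys:
      'model'  - model name
      'score'  - raw score of the hit
      'evalue' - E-value of the hit
      'ga'     - curated gathering threshold for that model
    """
    if strategy == "ga":
        # Per-model cutoff: compare each score with its own model's GA.
        return [h for h in hits if h["score"] >= h["ga"]]
    if strategy == "evalue":
        # One global cutoff: a larger value admits more, noisier hits.
        return [h for h in hits if h["evalue"] <= evalue_cutoff]
    raise ValueError("strategy must be 'ga' or 'evalue'")


# Made-up hits for one query sequence.
hits = [
    {"model": "ModelA", "score": 45.2, "evalue": 1e-10, "ga": 25.0},
    {"model": "ModelB", "score": 12.1, "evalue": 0.4, "ga": 20.0},
    {"model": "ModelC", "score": 3.5, "evalue": 7.0, "ga": 22.0},
]

strict = [h["model"] for h in report_hits(hits, "ga")]
default_e = [h["model"] for h in report_hits(hits, "evalue", 1.0)]
loose_e = [h["model"] for h in report_hits(hits, "evalue", 10.0)]
```

In this made-up case the GA cutoff reports only ModelA, while raising the E-value cutoff from 1.0 to 10 lets progressively weaker hits through, which is the sensitivity-for-reliability trade described above.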