Valid protein characters for PHI-BLAST patterns:
ABCDEFGHIKLMNPQRSTVWXYZU

Valid DNA characters for PHI-BLAST patterns:
ACGT

Other useful delimiters:
[ ]     means any one of the characters enclosed in the brackets,
        e.g., [LFYT] means one occurrence of L or F or Y or T
-       means nothing (this is a spacer character used by PROSITE)
x       with nothing following means any residue
x(5)    means 5 positions in which any residue is allowed (and similarly
        for any other single number in parentheses after x)
x(2,4)  means 2 to 4 positions where any residue is allowed, and similarly
        for any other two numbers separated by a comma; the first number
        should be < the second number
>       can occur only at the end of a pattern and means nothing; it may
        occur before a period (another spacer used by PROSITE)
.       may be used at the end of the pattern and means nothing

When using the stand-alone program, the pattern should be in a file, with the first line starting:
ID followed by 2 spaces and a text string giving the pattern a name.
There should also be a line starting
PA followed by 2 spaces followed by the pattern description.
All other PROSITE codes in the first two columns are allowed, but only the HI code, described below, is relevant to PHI-BLAST.
Here is an example from PROSITE.
ID  CNMP_BINDING_2; PATTERN.
AC  PS00889;
DT  OCT-1993 (CREATED); OCT-1993 (DATA UPDATE); NOV-1995 (INFO UPDATE).
DE  Cyclic nucleotide-binding domain signature 2.
PA  [LIVMF]-G-E-x-[GAS]-[LIVM]-x(5,11)-R-[STAQ]-A-x-[LIVMA]-x-[STACV].
NR  /RELEASE=32,49340;
NR  /TOTAL=57(36); /POSITIVE=57(36); /UNKNOWN=0(0); /FALSE_POS=0(0);
NR  /FALSE_NEG=1; /PARTIAL=1;
CC  /TAXO-RANGE=??EP?; /MAX-REPEAT=2;

The line starting ID gives the pattern a name.
The lines starting AC, DT, DE, NR, CC are relevant to PROSITE users, but irrelevant to PHI-BLAST. These lines are tolerated, but ignored by PHI-BLAST.
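As a rough illustration of which lines matter, the sketch below reads a pattern file and keeps only the ID and PA lines, skipping every other two-letter code. The function name `read_pattern_record` is hypothetical and this is not PHI-BLAST's own parser; it only assumes the layout described above (a two-letter code in the first two columns, then spaces, then the content).

```python
def read_pattern_record(text):
    """Keep the ID and PA lines of a pattern file; skip all other codes."""
    record = {"name": None, "pattern": None}
    for line in text.splitlines():
        code, content = line[:2], line[2:].strip()
        if code == "ID":
            record["name"] = content
        elif code == "PA":
            record["pattern"] = content
        # AC, DT, DE, NR, CC and any other codes are tolerated but ignored here.
    return record


example_file = """\
ID  CNMP_BINDING_2; PATTERN.
AC  PS00889;
DE  Cyclic nucleotide-binding domain signature 2.
PA  [LIVMF]-G-E-x-[GAS]-[LIVM]-x(5,11)-R-[STAQ]-A-x-[LIVMA]-x-[STACV].
CC  /TAXO-RANGE=??EP?; /MAX-REPEAT=2;
"""

record = read_pattern_record(example_file)
print(record["name"])
print(record["pattern"])
```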
The line starting PA describes the pattern as:
  one of LIVMF
  followed by G
  followed by E
  followed by any single character
  followed by one of GAS
  followed by one of LIVM
  followed by any 5 to 11 characters
  followed by R
  followed by one of STAQ
  followed by A
  followed by any single character
  followed by one of LIVMA
  followed by any single character
  followed by one of STACV

In this case the pattern ends with a period. It can end with nothing after the last specifying symbol, or with any number of > signs or periods or any combination thereof.
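The reading of the PA line above can be mimicked by translating the pattern into an ordinary regular expression. The following is a minimal sketch, not PHI-BLAST's actual matcher: the function name `pattern_to_regex` is hypothetical, it assumes the lowercase x is the wildcard and uppercase letters are literal residues, and regular-expression matching may differ from PHI-BLAST in how it handles overlapping or variable-length hits.

```python
import re

# One pattern element: [set], x(n) or x(n,m), a bare x, or a single letter.
TOKEN = re.compile(r"\[([A-Z]+)\]|x\((\d+)(?:,(\d+))?\)|(x)|([A-Z])")


def pattern_to_regex(pattern):
    """Translate a PHI-BLAST style pattern into a Python regex string."""
    # Trailing '>' and '.' mean nothing, and '-' is only a spacer.
    body = pattern.rstrip(">.").replace("-", "")
    pos, parts = 0, []
    while pos < len(body):
        m = TOKEN.match(body, pos)
        if m is None:
            raise ValueError(f"cannot parse pattern at {body[pos:]!r}")
        chars, low, high, any_one, letter = m.groups()
        if chars:
            parts.append(f"[{chars}]")         # one of the listed characters
        elif low and high:
            parts.append(f".{{{low},{high}}}")  # x(2,4): 2 to 4 of anything
        elif low:
            parts.append(f".{{{low}}}")         # x(5): exactly 5 of anything
        elif any_one:
            parts.append(".")                   # bare x: any single residue
        else:
            parts.append(letter)                # a literal residue
        pos = m.end()
    return "".join(parts)


print(pattern_to_regex("[LIVMF]-G-E-x-[GAS]-[LIVM]-x(5,11)-R-[STAQ]-A-x-[LIVMA]-x-[STACV]."))
```

The resulting string can be passed to `re.search` to check quickly whether a sequence contains the pattern before running PHI-BLAST itself.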
Here is another example, illustrating the use of an HI line.
ID  ER_TARGET; PATTERN.
PA  [KRHQSA]-[DENQ]-E-L>.
HI  (19 22)
HI  (201 204)

In this example, the HI lines specify that the pattern occurs twice, once from positions 19 through 22 in the sequence and once from positions 201 through 204 in the sequence. These specifications are relevant when stand-alone PHI-BLAST is used with the "seedp" option, in which the interesting occurrences of the pattern in the sequence are specified. In this case the HI lines specify which occurrence(s) of the pattern should be used to find good alignments.
In general, the seedp option is more useful than the standard patternp option ONLY when the pattern occurs K > 1 times in the sequence AND the user is interested in matching to J < K of those occurrences. Then using the HI lines enables the user to specify which occurrences are of interest.
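To see the K versus J distinction concretely, the sketch below lists every occurrence of a pattern in a sequence and formats HI lines for the occurrences a user chooses. It is only an illustration under stated assumptions: the helper names and the short sequence are made up, the pattern is given as a hand-written regular expression equivalent to [KRHQSA]-[DENQ]-E-L, and positions are assumed to be 1-based and inclusive, as the "19 through 22" wording for a four-residue pattern suggests.

```python
import re


def find_occurrences(regex, sequence):
    """Return 1-based inclusive (start, end) pairs for every match.

    A lookahead is used so that overlapping occurrences are also reported.
    """
    lookahead = re.compile(f"(?=({regex}))")
    return [(m.start(1) + 1, m.end(1)) for m in lookahead.finditer(sequence)]


def hi_lines(occurrences):
    """Format the chosen occurrences as HI lines (code, 2 spaces, range)."""
    return [f"HI  ({start} {end})" for start, end in occurrences]


sequence = "MSDELAAAKDELGG"      # made-up sequence for illustration
regex = "[KRHQSA][DENQ]EL"       # regex form of [KRHQSA]-[DENQ]-E-L

found = find_occurrences(regex, sequence)
print(found)                     # K = 2 occurrences: [(2, 5), (9, 12)]

# Keep only the second occurrence (J = 1 < K = 2) for a seedp-style search.
for line in hi_lines(found[1:]):
    print(line)                  # HI  (9 12)
```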