'STRAP:multiple sequence alignments '

charite.christo.protein
Interface ProteinParser

All Known Implementing Classes:
DSSP_Parser, NumberedSequence_Parser, PDB_Parser, SBD_Parser, SingleFastaParser, StupidParser, XML_SequenceParser

public interface ProteinParser

HELP Proteins are stored in plain text files. A common error is to save protein files as MS Word documents: the proprietary Word format is not recognized by STRAP. STRAP recognizes file formats by trial and error, regardless of the file suffix.

First, the DSSP format is assumed, which contains the C-alpha coordinates and the secondary structure definition in addition to the amino acid sequence. INCLUDE_DOC:DSSP_Parser

If the protein file does not comply with the DSSP format, the PDB format (WIKI:Protein_Data_Bank_(file_format)) is tried. INCLUDE_DOC:PDB_Parser

If the file is not in PDB format, the FASTA format is tested. INCLUDE_DOC:SingleFastaParser The first series of non-blank characters should consist exclusively of digits, followed by white space of any length and an amino acid sequence. EMBL, WIKI:Genbank and WIKI:Swissprot files follow this scheme and should be parsed correctly. The header is largely ignored: only the name of the compound and the organism are read, to create some informational texts.

WIKI:Genbank files usually contain nucleotide sequences rather than amino acids, in which case nucleotides will appear in the protein alignment. Genbank files can be interpreted with the dialog ITEM:charite.christo.strap.DialogGenbank. For other nucleotide sequences, the reading frame and the translated regions can be set manually (ITEM:charite.christo.strap.EditDna). Three nucleotide bases yield one amino acid.
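The relation between reading frame and translation can be illustrated with a minimal sketch. It assumes the standard genetic code and is not taken from STRAP's EditDna implementation; the class name `CodonTranslationSketch` and the method `translate` are hypothetical.

```java
import java.util.Locale;

// Hypothetical illustration, not part of STRAP.
final class CodonTranslationSketch {

    // Base order used to index the codon table.
    private static final String BASES = "TCAG";

    // Standard genetic code, indexed by 16*first + 4*second + third base; '*' marks a stop codon.
    private static final String AMINO_ACIDS =
        "FFLLSSSSYY**CC*W" + "LLLLPPPPHHQQRRRR" + "IIIMTTTTNNKKSSRR" + "VVVVAAAADDEEGGGG";

    /** Translates dna starting at offset frame (0, 1 or 2); a trailing partial codon is ignored. */
    static String translate(String dna, int frame) {
        if (frame < 0 || frame > 2) {
            throw new IllegalArgumentException("frame must be 0, 1 or 2");
        }
        // Accept lower case and RNA input by normalizing to upper-case DNA letters.
        String s = dna.toUpperCase(Locale.ROOT).replace('U', 'T');
        StringBuilder protein = new StringBuilder();
        for (int i = frame; i + 3 <= s.length(); i += 3) {
            int b1 = BASES.indexOf(s.charAt(i));
            int b2 = BASES.indexOf(s.charAt(i + 1));
            int b3 = BASES.indexOf(s.charAt(i + 2));
            // Codons containing anything other than T, C, A, G become 'X' (unknown residue).
            boolean unknown = b1 < 0 || b2 < 0 || b3 < 0;
            protein.append(unknown ? 'X' : AMINO_ACIDS.charAt(16 * b1 + 4 * b2 + b3));
        }
        return protein.toString();
    }
}
```

For example, `CodonTranslationSketch.translate("ATGGCCAAA", 0)` reads the codons ATG, GCC and AAA; shifting the frame by one base changes every codon, which is why the reading frame has to be set correctly.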

Finally, when no specific format is recognized, all letters in the file are used as one-letter codes of amino acids.
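The trial-and-error recognition described above can be sketched as a list of parsers that are tried in order until one reports success. This is only an illustration under simplifying assumptions: `SketchParser`, `FastaStage`, `AllLettersStage` and `detect` are hypothetical names, plain `byte[]` and `StringBuilder` stand in for STRAP's `BA` and `Protein`, and the DSSP and PDB stages, which would come first in the list, are omitted.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not part of STRAP.
final class FormatCascadeSketch {

    /** Stand-in for ProteinParser: true means success, false means inappropriate format. */
    interface SketchParser {
        String name();
        boolean parse(StringBuilder sequenceOut, byte[] text);
    }

    /** Simplified FASTA stage: a '>' header line followed by sequence lines. */
    static final class FastaStage implements SketchParser {
        public String name() { return "fasta"; }
        public boolean parse(StringBuilder out, byte[] text) {
            String s = new String(text, StandardCharsets.US_ASCII);
            if (!s.startsWith(">")) return false;
            String[] lines = s.split("\\R");
            for (int i = 1; i < lines.length; i++) {
                if (lines[i].startsWith(">")) break; // single-sequence sketch: stop at the next record
                out.append(lines[i].trim());
            }
            return out.length() > 0;
        }
    }

    /** Last resort: every letter is taken as a one-letter amino acid code. */
    static final class AllLettersStage implements SketchParser {
        public String name() { return "letters"; }
        public boolean parse(StringBuilder out, byte[] text) {
            for (byte b : text) {
                char c = (char) (b & 0xFF);
                if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')) out.append(c);
            }
            return out.length() > 0;
        }
    }

    /** Stages in the order they are tried; the file suffix plays no role. */
    static List<SketchParser> defaultStages() {
        List<SketchParser> stages = new ArrayList<>();
        stages.add(new FastaStage());
        stages.add(new AllLettersStage());
        return stages;
    }

    /** Returns the name of the first stage that succeeds, or null if none does. */
    static String detect(List<SketchParser> stages, byte[] text, StringBuilder out) {
        for (SketchParser p : stages) {
            out.setLength(0); // discard partial output of a failed stage
            if (p.parse(out, text)) return p.name();
        }
        return null;
    }
}
```

Because each stage only reports success or failure, new formats can be supported by adding a stage to the list without touching the others.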

File compression: Files ending with .gz, .bz2, .Z or .zip will be decompressed automatically.
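Suffix-based decompression could look roughly like the following sketch, which relies only on `java.util.zip` from the Java standard library. It is not STRAP's code: the class `SuffixDecompressSketch` and its methods are hypothetical, only .gz and .zip are handled, and .bz2 and .Z are passed through unchanged because they would need a third-party decoder.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Locale;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.ZipInputStream;

// Hypothetical illustration, not part of STRAP.
final class SuffixDecompressSketch {

    /** Wraps raw in a decompressing stream chosen by the file name suffix. */
    static InputStream open(String fileName, InputStream raw) throws IOException {
        String lower = fileName.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".gz")) {
            return new GZIPInputStream(raw);
        }
        if (lower.endsWith(".zip")) {
            ZipInputStream zip = new ZipInputStream(raw);
            if (zip.getNextEntry() == null) {
                throw new IOException("empty zip archive: " + fileName);
            }
            return zip; // positioned at the first entry; reads end when that entry ends
        }
        // .bz2 and .Z are not covered by the standard library and are not handled here.
        return raw;
    }

    /** Convenience wrapper: returns the decompressed bytes of an in-memory file. */
    static byte[] readAll(String fileName, byte[] rawBytes) {
        try (InputStream in = open(fileName, new ByteArrayInputStream(rawBytes))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            for (int n; (n = in.read(buf)) != -1; ) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Helper for trying the sketch: gzip-compresses plain in memory. */
    static byte[] gzip(byte[] plain) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(plain);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

The decompressed bytes can then be handed to the format recognition unchanged, so the parsers never need to know whether the file was compressed.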

Problems:

SEE_CLASS:PDB_Parser SEE_CLASS:DSSP_Parser SEE_CLASS:NumberedSequence_Parser SEE_CLASS:ProteinParser SEE_CLASS:SingleFastaParser SEE_CLASS:StupidParser SEE_CLASS:XML_SequenceParser SEE_CLASS:SwissHeaderParser


Field Summary
static long IGNORE_SEQRES
           
static long SEQUENCE_FEATURES
           
static long SIDE_CHAIN_ATOMS
           
 
Method Summary
 boolean parse(Protein p, long options, BA text)
           
 

Field Detail

IGNORE_SEQRES

static final long IGNORE_SEQRES
See Also:
Constant Field Values

SIDE_CHAIN_ATOMS

static final long SIDE_CHAIN_ATOMS
See Also:
Constant Field Values

SEQUENCE_FEATURES

static final long SEQUENCE_FEATURES
See Also:
Constant Field Values
Method Detail

parse

boolean parse(Protein p,
              long options,
              BA text)
Parameters:
text - the entire file contents. It is a byte array rather than a String object for performance reasons.
Returns:
true: success, false: inappropriate file format
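The `options` parameter is a `long` and the three constants above are `long` fields, which suggests, although this page does not say so, that they are bit flags combined with bitwise OR. The sketch below shows that pattern under this assumption. The numeric values are invented for illustration; the real values are listed only under "Constant Field Values", and `ParseOptionsSketch` is a hypothetical class.

```java
// Hypothetical illustration, not part of STRAP.
final class ParseOptionsSketch {

    // Assumed values: one distinct bit per option. The real constants may differ.
    static final long IGNORE_SEQRES = 1L << 0;
    static final long SIDE_CHAIN_ATOMS = 1L << 1;
    static final long SEQUENCE_FEATURES = 1L << 2;

    /** True if flag is contained in the combined options value. */
    static boolean isSet(long options, long flag) {
        return (options & flag) != 0;
    }

    /** Lists the names of all options that are set, separated by commas. */
    static String describe(long options) {
        StringBuilder sb = new StringBuilder();
        if (isSet(options, IGNORE_SEQRES)) sb.append("IGNORE_SEQRES,");
        if (isSet(options, SIDE_CHAIN_ATOMS)) sb.append("SIDE_CHAIN_ATOMS,");
        if (isSet(options, SEQUENCE_FEATURES)) sb.append("SEQUENCE_FEATURES,");
        if (sb.length() > 0) sb.setLength(sb.length() - 1); // drop the trailing comma
        return sb.toString();
    }
}
```

Under this assumption, a caller would pass something like `IGNORE_SEQRES | SEQUENCE_FEATURES` as `options`, and an implementation of `parse` would test individual bits as `isSet` does.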

'The most important classes are StrapAlign, Protein and StrapEvent.'