GESAMT (CCP4: Supported Program)

NAME

gesamt - General Efficient Structural Alignment of Macromolecular Targets

SYNOPSIS

1. Printout of usage instructions:

gesamt --help

2. Alignment and superposition of two structures:

gesamt foo_1st.pdb [{-s|-d} SEL1] foo_2nd.pdb [{-s|-d} SEL2] [-high|-normal] [foo_out.pdb]

3. Screening a pdb archive:

gesamt foo.pdb [{-s|-d} SEL] -pdb pdb-dir [-high|-normal] [hits.txt]

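where SEL, SEL1 and SEL2 are optional selection strings, foo_out.pdb is an optional output file for the superposed first structure, and hits.txt is an optional output file listing the screening results.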

Keys "-s" and "-d" select a substructure. By default, the whole structure in the corresponding file is used; if the file contains more than one chain, all chains are treated as a single structure. The selection format depends on the key used. Key "-s" corresponds to the MMDB selection format, identical to that used by Superpose; this format is described in the pdbcur documentation. The CCP4i interface works only with this type of selection.

Selection key "-d" provides a more flexible selection scheme used by SCOP:

"*", "(all)" - take the whole file

"-" - take the chain that has no chain ID

"a:Ni-Mj,b:Kp-Lq,..." - take chain a from residue number N (insertion code i) to residue number M (insertion code j), plus chain b from residue number K (insertion code p) to residue number L (insertion code q), and so on

"a:,b:..." - take whole chains a and b, and so on

"a:,b:Kp-Lq,..." - any combination of the above.
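The "-d" grammar above is regular enough to parse mechanically. The Python sketch below is a hypothetical helper (`parse_d_selection` is not part of Gesamt or MMDB) that turns a selection string into chain/residue ranges. It assumes single-character insertion codes, non-negative residue numbers and chain IDs containing neither ":" nor ",", which may be stricter or looser than what Gesamt itself accepts.

```python
import re

# Residue specifier: a sequence number followed by an optional one-letter insertion code.
_RES = r"(\d+)([A-Za-z]?)"
# One comma-separated item: "chain:" (whole chain) or "chain:Ni-Mj" (residue range).
_ITEM = re.compile(rf"^([^:,]+):(?:{_RES}-{_RES})?$")


def parse_d_selection(sel: str):
    """Parse a "-d" style selection string (hypothetical helper, not part of gesamt).

    Returns "ALL" for the whole file, "NO_CHAIN_ID" for the chain without an ID,
    or a list of (chain, start, end) tuples; start and end are
    (residue_number, insertion_code) pairs, or None when the whole chain is taken.
    """
    sel = sel.strip()
    if sel in ("*", "(all)"):
        return "ALL"
    if sel == "-":
        return "NO_CHAIN_ID"
    ranges = []
    for item in sel.split(","):
        m = _ITEM.match(item.strip())
        if m is None:
            raise ValueError(f"cannot parse selection item: {item!r}")
        chain, n, i, last, j = m.groups()
        if n is None:
            ranges.append((chain, None, None))  # "a:" means the whole chain
        else:
            ranges.append((chain, (int(n), i), (int(last), j)))
    return ranges
```

For example, `parse_d_selection("A:,B:5A-20")` selects all of chain A plus chain B from residue 5A to residue 20.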

Unlike Superpose, Gesamt allows arbitrary selections of residues and disregards the secondary structure of the structures being compared. Gesamt may be applied to non-contiguous sets of residues, and to partially complete or short chains.

DESCRIPTION

gesamt aligns two structures by an efficient clustering of short fragments, made from adjacent protein backbone C-alpha atoms, followed by an iterative three-dimensional refinement based on a dynamic programming procedure.
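The exact scoring and gap treatment used in Gesamt's refinement are not given here. As an illustration only, the sketch below performs one order-preserving dynamic programming pass over two C-alpha traces that have already been superposed. It uses a swapped-in distance score s = 1 / (1 + (d/d0)^2), hypothetical parameters `d0` and `cutoff`, and free gaps. In an iterative scheme, the matched pairs would be used to recompute the superposition and the pass repeated until the alignment stops changing.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def dp_align(ca1: Sequence[Coord], ca2: Sequence[Coord],
             d0: float = 3.0, cutoff: float = 5.0) -> List[Tuple[int, int]]:
    """One DP pass over two already-superposed C-alpha traces (illustrative sketch).

    Hypothetical scoring: s(i, j) = 1 / (1 + (d_ij / d0)**2) if d_ij <= cutoff;
    pairs farther apart than cutoff cannot be matched. Gaps cost nothing.
    Returns the matched index pairs (i, j) in sequence order.
    """
    n, m = len(ca1), len(ca2)
    # S[i][j] = best total score for aligning ca1[:i] with ca2[:j]
    S = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = max(S[i - 1][j], S[i][j - 1])  # leave residue i or j unmatched
            d = math.dist(ca1[i - 1], ca2[j - 1])
            if d <= cutoff:
                best = max(best, S[i - 1][j - 1] + 1.0 / (1.0 + (d / d0) ** 2))
            S[i][j] = best
    # Trace back from the bottom-right corner, preferring gaps on ties.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if S[i][j] == S[i - 1][j]:
            i -= 1
        elif S[i][j] == S[i][j - 1]:
            j -= 1
        else:
            pairs.append((i - 1, j - 1))  # this cell was reached by a match
            i -= 1
            j -= 1
    return pairs[::-1]
```

Keeping the residue order in the DP is what makes the result a sequential alignment; the distance cutoff prevents residues that are far apart in space from being paired just because gaps are free.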

INPUT AND OUTPUT FILES

foo_1st.pdb

First input coordinate file (Query). Although typically a PDB file, it can also be in mmCIF or MMDB binary format, and it can be gzipped ("*.gz") or compressed ("*.Z"). The input format is detected automatically. The resulting transformation matrix is applied to this structure.

foo_2nd.pdb

Second input coordinate file (Target). Although typically a PDB file, it can also be in mmCIF or MMDB binary formats, and it can be gzipped ("*.gz") or compressed ("*.Z"). The input format is detected automatically.
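Automatic format detection can be approximated by inspecting the first bytes of a file. The heuristic below is an assumption, not Gesamt's actual logic: it recognises the gzip and Unix compress magic numbers, treats a leading `data_` block as mmCIF and a leading PDB record name as PDB, and does not attempt to identify MMDB binary files. Python's standard library has no decoder for ".Z" files, so those are only reported. `sniff_coordinate_file` is a hypothetical helper.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"
COMPRESS_MAGIC = b"\x1f\x9d"  # Unix compress (.Z)
# A few common PDB record names; a heuristic subset, not the full specification.
PDB_RECORDS = {"HEADER", "TITLE", "COMPND", "REMARK", "CRYST1",
               "ATOM", "HETATM", "MODEL", "SEQRES", "ORIGX1", "SCALE1"}


def sniff_coordinate_file(data: bytes) -> str:
    """Guess the container and coordinate format of raw file bytes (heuristic sketch)."""
    if data[:2] == COMPRESS_MAGIC:
        return "compress"  # .Z: not decoded here
    prefix = ""
    if data[:2] == GZIP_MAGIC:
        data = gzip.decompress(data)
        prefix = "gzip/"
    # latin-1 maps every byte to a character, so binary input cannot raise here.
    for line in data.decode("latin-1").splitlines():
        s = line.strip()
        if not s or s.startswith("#"):
            continue  # skip blank lines and CIF comments
        if s.startswith("data_"):
            return prefix + "mmcif"
        if s[:6].rstrip() in PDB_RECORDS:
            return prefix + "pdb"
        break  # first meaningful line matched neither format
    return prefix + "unknown"
```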

foo_out.pdb

If specified, the result of applying the transformation matrix to foo_1st.pdb is written to foo_out.pdb.
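Assuming the usual convention x' = R·x + t, where R is the reported 3×3 rotation matrix and t the translation vector, the content of foo_out.pdb can be reproduced by rewriting the coordinate fields of ATOM and HETATM records, which occupy columns 31-38, 39-46 and 47-54 of a PDB line. `transform_pdb_lines` is a hypothetical helper; it handles PDB format only, not mmCIF.

```python
from typing import List, Sequence


def transform_pdb_lines(lines: Sequence[str], R: Sequence[Sequence[float]],
                        t: Sequence[float]) -> List[str]:
    """Apply x' = R x + t to ATOM/HETATM records (sketch; R is 3x3 row-major).

    All other records, and the non-coordinate parts of atom records, are kept as-is.
    """
    out = []
    for line in lines:
        if line.startswith(("ATOM  ", "HETATM")):
            # PDB columns 31-38, 39-46, 47-54 (1-based) hold x, y, z.
            x, y, z = float(line[30:38]), float(line[38:46]), float(line[46:54])
            xn = [R[k][0] * x + R[k][1] * y + R[k][2] * z + t[k] for k in range(3)]
            line = f"{line[:30]}{xn[0]:8.3f}{xn[1]:8.3f}{xn[2]:8.3f}{line[54:]}"
        out.append(line)
    return out
```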

pdb-dir

A directory of PDB files to align foo.pdb with. Gesamt screens whatever set of PDB files the directory contains; only files with extensions ".pdb", ".ent", ".pdb.gz" and ".ent.gz" are considered.
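The extension filter described above can be mimicked as follows. `list_screening_files` is a hypothetical helper; the sketch assumes case-sensitive matching and does not descend into subdirectories, and this document does not say whether Gesamt does either.

```python
import os
from typing import List

# Extensions that the screening mode is documented to consider.
PDB_EXTENSIONS = (".pdb", ".ent", ".pdb.gz", ".ent.gz")


def list_screening_files(pdb_dir: str) -> List[str]:
    """Sorted paths of regular files in pdb_dir (non-recursive) with a screened extension."""
    paths = []
    for name in sorted(os.listdir(pdb_dir)):
        path = os.path.join(pdb_dir, name)
        if name.endswith(PDB_EXTENSIONS) and os.path.isfile(path):
            paths.append(path)
    return paths
```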

hits.txt

Optional output file with the list of calculated alignments, ordered by decreasing Q-score.
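For ranking, the sketch below uses the Q-score definition from SSM (Superpose), Q = N_align² / ((1 + (RMSD/R0)²) · N1 · N2) with R0 = 3 Å, where N_align is the number of aligned residues and N1, N2 are the sizes of the two structures. It assumes, without confirmation from this document, that Gesamt uses the same definition; the `Hit` record is hypothetical and does not describe the layout of hits.txt.

```python
from dataclasses import dataclass
from typing import List

R0 = 3.0  # angstroms; empirical constant in the SSM Q-score definition


@dataclass
class Hit:
    """Hypothetical hit record; not the layout of gesamt's hits.txt."""
    name: str
    n_align: int  # number of aligned residues
    rmsd: float   # RMSD over aligned residues, in angstroms
    n1: int       # residues in the query selection
    n2: int       # residues in the target structure


def q_score(n_align: int, rmsd: float, n1: int, n2: int, r0: float = R0) -> float:
    """Q = N_align^2 / ((1 + (RMSD/R0)^2) * N1 * N2); equals 1.0 for identical structures."""
    return n_align ** 2 / ((1.0 + (rmsd / r0) ** 2) * n1 * n2)


def rank_hits(hits: List[Hit]) -> List[Hit]:
    """Order hits by decreasing Q-score."""
    return sorted(hits, key=lambda h: q_score(h.n_align, h.rmsd, h.n1, h.n2), reverse=True)
```

Q-score rewards both a long alignment and a low RMSD, so it avoids favouring short, very tight matches the way ranking by RMSD alone would.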

Command line options

The optional selection strings [{-s|-d} SEL1/2] are in the format described above.

Keys "-high" and "-normal" select "High" and "Normal" mode, respectively. In "Normal" mode (the default), Gesamt balances alignment quality against computation speed; this mode is recommended for most applications. In "High" mode, Gesamt aims for maximal alignment quality regardless of speed. Compared with "Normal" mode, "High" mode is about 10 times slower and improves the alignment quality in only a few percent of cases.

PROGRAM OUTPUT

The program reports the transformation matrix for the best superposition of foo_1st.pdb onto foo_2nd.pdb and the resulting RMSD, as well as the polar and Euler rotation angles and the orthogonal translation vector.

The program then gives a residue-by-residue listing of the alignment. Strands and helices in the two structures are identified and marked in the output. The output also lists the distances between all matched residues at the best superposition.
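The per-residue distances and the overall RMSD are related in the usual way: the RMSD is the square root of the mean squared distance over the matched pairs. A minimal sketch, assuming the coordinates (for example C-alpha atoms) have already been superposed and the matched index pairs are known; `pair_distances` and `rmsd` are hypothetical helpers.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def pair_distances(ca1: Sequence[Coord], ca2: Sequence[Coord],
                   pairs: Sequence[Tuple[int, int]]) -> List[float]:
    """Distances between matched residues; coordinates are assumed already superposed."""
    return [math.dist(ca1[i], ca2[j]) for i, j in pairs]


def rmsd(distances: Sequence[float]) -> float:
    """Root-mean-square of the matched-pair distances."""
    if not distances:
        raise ValueError("no matched pairs")
    return math.sqrt(sum(d * d for d in distances) / len(distances))
```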

AUTHOR

Eugene Krissinel, CCP4, Research Complex at Harwell, Rutherford Appleton Laboratory, Didcot, OX11 0FA, UK.

REFERENCE

E. Krissinel (2012). Enhanced Fold Recognition using Efficient Short Fragment Clustering. In preparation.
