CCP4i: Graphical User Interface
Molecular Replacement Module
The layout of each task window, i.e. the number of folders present, and whether these folders are open or closed by default, depends on the choices made in the Protocol folder of the task (see Introduction). Although certain folders are closed by default, there are specific reasons why you should or may want to look at them. These reasons are described in the Task Window Layout sections below.
This module provides interfaces to a number of CCP4 Molecular Replacement programs: Phaser, Molrep and AMoRe. There are also interfaces to two automated packages: MrBUMP and Balbes.
The list of tasks begins with a set of analysis tasks (grouped into the "Analysis" folder) which it may be useful to run. Cell Content Analysis gives an estimate of the number of molecules in the asymmetric unit, and thus the number of molecules that the MR procedure should search for. Analyse Data for MR calculates a native Patterson to check for pseudo-translation, and in addition compares the Wilson B factor with the average B factor of the search model. Self RF in polars and Self RF using Molrep enable calculations of a self-rotation function (where the native Patterson is compared with itself) giving the rotational component of any non-crystallographic symmetry.
These are followed by a set of tasks providing tools for generating molecular replacement trial models: Chainsaw (part of the CCP4 suite) and Modeller (an external package).
There are then tasks for running Phaser, Molrep, MrBUMP and Balbes, and a set of tasks for running AMoRe. These programs and packages can be seen as alternatives to one another, but you may need to try more than one to obtain a clear solution. The MrBUMP and Balbes packages are able to run Phaser and Molrep as part of an automated pipeline so might be the best place to start when beginning a new molecular replacement problem.
The tasks for AMoRe are based around a Model Database containing one or more search models together with files and solutions generated by AMoRe for each model. The Model Database is accessed via the AMoRe Model Database task, launched from the task list or from other AMoRe tasks.
Finally there is a set of utility tasks for manipulating and analysing coordinates and sequences, which may be useful during the molecular replacement process.
The program MATTHEWS_COEF, through the Cell Content Analysis task, can provide an estimate of the number of molecules in the asymmetric unit, and thus the number of molecules to search for in MR. It requires an MTZ file, and a fairly accurate estimate of the molecular weight of the protein (which can be obtained from the program RWCONTENTS, for example). The Matthews number is usually between 1.66 and 4.0 Å³/Da, corresponding to protein contents of 75% to 30%, but proteins with higher solvent contents will give higher values.
This analysis may not be conclusive in determining the number of molecules in the asymmetric unit, so the probabilities from the Matthews coefficient paper of Kantardjieff and Rupp are also printed if a high resolution limit is input: P(reso) is the probability using the input high resolution limit and P(tot) is the probability across all resolution ranges. This gives the probability of a particular Matthews coefficient based upon the high resolution limit.
The solvent content information will appear in the 'solvent analysis' field upon clicking Run Now. The Interface displays a table with values of the percentage of solvent in the unit cell, as well as the corresponding Matthews coefficient and the probabilities based on the input resolution limit, for a range of numbers of molecules in the asymmetric unit. The highest probability P(tot) gives a strong indication of the preferred solution.
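The table described above can be approximated by hand. The following is a minimal sketch, not the MATTHEWS_COEF implementation: it assumes the standard Matthews relation (cell volume divided by the total protein mass in the cell) and the usual approximation that the protein fraction is 1.23 / V_M. The function name and arguments are hypothetical, and the Kantardjieff and Rupp probabilities are not reproduced.

```python
def matthews_table(cell_volume, n_symops, mol_weight, max_copies=6):
    """Return rows of (copies per ASU, Matthews coefficient, solvent fraction).

    cell_volume : unit cell volume in cubic Angstroms
    n_symops    : number of symmetry operators (asymmetric units per cell)
    mol_weight  : molecular weight of one molecule in Daltons
    """
    rows = []
    for copies in range(1, max_copies + 1):
        # Matthews coefficient: cell volume per Dalton of protein.
        vm = cell_volume / (n_symops * copies * mol_weight)
        # Assumed approximation: protein fraction = 1.23 / V_M.
        solvent = 1.0 - 1.23 / vm
        if solvent <= 0.0:
            # More protein than the cell can hold; stop here.
            break
        rows.append((copies, vm, solvent))
    return rows


if __name__ == "__main__":
    for copies, vm, solvent in matthews_table(400000.0, 4, 25000.0):
        print(f"{copies:2d}  V_M = {vm:5.2f}  solvent = {100 * solvent:5.1f}%")
```

With this relation a Matthews number of 1.66 corresponds to roughly 74% protein and 4.0 to roughly 31%, in line with the range quoted above.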
See Stage 1 of the MR tutorial.
See program documentation: MATTHEWS_COEF.
Before running a molecular replacement program it is advisable to look a little at the data. This task will:
Check native Patterson map for large peaks which indicate pseudo-translation.
The task will run the FFT program to generate a native Patterson map and then run PEAKMAX to list the largest peaks to the .log file. You should be concerned if any non-origin peak is more than, say, 0.15 fraction of the origin peak, as this might suggest there is pseudo-translation present in the unit cell. The Molrep program has an option to handle pseudo-translation.
See also the Self RF in polars task for finding rotational non-crystallographic symmetry.
Compare the Wilson B factor with the average model B factor.
The task will run the Wilson program to determine the Wilson B factor from the data, and the program BAVERAGE to determine the average model B factor. The difference between these values is BADD which can be used in the AMoRe interface.
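The two checks made by this task can be sketched as follows. This is an illustration only, not the code used by the task: the function names are hypothetical, the peak list is assumed to have been read already from the PEAKMAX output, and the sign convention for BADD (Wilson B minus average model B) is an assumption, since the text only says that BADD is the difference between the two values.

```python
def flag_pseudo_translation(origin_height, peaks, threshold=0.15):
    """Return (label, fraction of origin) for suspiciously large peaks.

    peaks is a list of (label, height) pairs for non-origin Patterson peaks.
    The default threshold of 0.15 is the rule of thumb quoted in the text.
    """
    flagged = []
    for label, height in peaks:
        fraction = height / origin_height
        if fraction > threshold:
            flagged.append((label, fraction))
    return flagged


def average_b(pdb_lines):
    """Average the B factors (PDB columns 61-66) over ATOM/HETATM records."""
    values = [
        float(line[60:66])
        for line in pdb_lines
        if line.startswith(("ATOM", "HETATM")) and line[60:66].strip()
    ]
    if not values:
        raise ValueError("no atom records with B factors found")
    return sum(values) / len(values)


def badd(wilson_b, pdb_lines):
    """Difference between Wilson B and average model B (assumed sign)."""
    return wilson_b - average_b(pdb_lines)
```

For example, `flag_pseudo_translation(100.0, [("peak 2", 20.0), ("peak 3", 10.0)])` reports only the first peak, at 0.2 of the origin height.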
See Stage 2 of the MR tutorial.
See program documentation: FFT, WILSON, BAVERAGE
This task provides an interface to the self-rotation functionality of the program POLARRFN (the command-line program will also calculate a cross-rotation function). You should supply an MTZ file holding the observed data. By default, a plotfile is produced containing stereographic projections of each kappa section. This is the easiest way to view the results, for example the kappa = 180 section can be used to identify 2-fold axes. Optionally, the rotation function can be output as a CCP4-format map file.
The interface allows one to set a number of other parameters (see the polarrfn documentation), although in general the defaults can be accepted. Note that because of the limit of 100 in the order of the spherical harmonics in the program, the high resolution limit cannot be numerically less than the integration radius (arad) / 17.4: if it is, the program resets <resmax> to <arad> / 17.4.
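The resolution reset described above amounts to a simple clamp. A minimal sketch, with a hypothetical function name, of what the program is described as doing:

```python
def effective_resmax(resmax, arad):
    """Return the high resolution limit that will actually be used.

    resmax : requested high resolution limit in Angstroms
    arad   : integration radius in Angstroms
    The limit cannot be numerically less than arad / 17.4.
    """
    limit = arad / 17.4
    return max(resmax, limit)


if __name__ == "__main__":
    # With a 30 A integration radius, a 1.5 A request is reset to about 1.72 A.
    print(round(effective_resmax(1.5, 30.0), 2))
```

A requested limit that is already numerically larger than arad / 17.4 is returned unchanged.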
See also the Analyse Data for MR task for finding pure translational non-crystallographic symmetry.
This task provides access to the Self Rotation Function generation of MOLREP. It can be viewed as an alternative to the Self Rotation in polar coordinates task.
This task provides an interface to the program Chainsaw. Input a template PDB file, e.g. from a homology search, and an alignment file of the template sequence against the target sequence in one of a variety of formats.
See program documentation: CHAINSAW
This task uses the non-CCP4 program MODELLER, which is available from Andrej Sali (see Andrej Sali Lab); see also the installation notes on MODELLER. Note that the program is Unix/Linux specific.
The CCP4i task has been tested against version 6 of Modeller. The latest release of Modeller (at the time of writing) is 7v7. Feedback on compatibility problems should be sent to us.
The input to this program is the structure of one or more homologs and the sequence of the protein for which you require a structure. MODELLER can produce a model which is, as closely as possible, identical to the input structure with changed residues generated with geometrically reasonable coordinates but this structure is liable to be energetically unreasonable due to close contacts. MODELLER can also refine this structure with restraints which aim to keep the structure close to the input homolog structures. Where homolog and model sequence are similar the structures are liable to remain closely similar but regions of low homology, in particular loops, can change significantly.
If you are doing molecular replacement you could use homology modelling in two different ways:
The quality of any output structure is hugely dependent on the quality of the alignment. MODELLER can do the necessary sequence alignment of the sequence and homolog structures but you are strongly recommended to review and possibly amend the alignment produced. You should also be sceptical of the exact sequence alignment output by any sequence database search. The database search uses protocols designed for speed rather than accuracy in low homology regions. An experienced crystallographer looking at the homolog structure with a graphics program will probably make a better assessment of sequence alignment.
The alignment file format used by MODELLER is not particularly simple (see MODELLER documentation on Alignment File Format) and it is probably easiest to run the CCP4i task with the sequence file and homolog structures as input and let this generate the alignment file (extension .ali) which you can then edit and use as input if necessary.
CCP4i expects the sequence to be input in a simple file using the one-letter amino acid code. It does not expect any extra titles or comments; beware that if you have these in the file they may be interpreted as sequence code. The line length is not fixed and any spaces or characters outside of the range A-Z will be ignored, so it should be possible to cut and paste a sequence into a file without necessarily removing all gaps or extra characters.
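The filtering rule can be illustrated with a short sketch. This is not the CCP4i code; it simply applies the rule as stated (keep characters A-Z, drop everything else), and the function name is hypothetical. It also shows why a title line is dangerous: its upper-case letters survive the filter.

```python
import re


def clean_sequence(text):
    """Keep only the characters A-Z, as the sequence reader is described to do."""
    return re.sub(r"[^A-Z]", "", text)


if __name__ == "__main__":
    # Spaces, digits, line breaks and gap characters are dropped.
    print(clean_sequence("MKT AYI\n123 AKQ-R"))
    # The "T" of the title line is kept and becomes part of the sequence.
    print(clean_sequence(">sp Title\nMK"))
```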
The CCP4i interface has the option to produce a model which has no refinement, fast refinement or full refinement. Only in the latter case is there an option to generate more than one model. Beware that after full refinement the position of the model may have drifted from the position of the input homolog structure.
After refinement a graph file (extension .graph) is produced which contains a plot of restraint violations versus residue number; this is MODELLER's assessment of the quality of the model. Large values of restraint violation are bad; they usually correspond to regions of insertions or deletions in the alignment or significant differences in the sequences.
After refinement MODELLER puts the restraint violation in the Bvalues column of the output PDB file. The CCP4i script will, by default, replace these with Bvalues that the user can set in the task interface.
The CCP4i script can do some post processing of the output MODELLER model to edit either
These regions can be either deleted (the mutated residues being converted to glycine or alanine) or the occupancies can be set to zero.
This task provides an interface to the maximum likelihood MR program Phaser. Note that the task button may be greyed-out if the program has not been installed.
First choose the mode in the Protocol folder. Typically, one would start with the default mode "automated search", which combines most of the remaining modes. Only if there are problems, or a more detailed search is required, would the other modes be run separately.
See program documentation: PHASER
This is a fully automated molecular replacement program which will attempt to find the number of molecules expected in the asymmetric unit as entered by the user. A PDB file for the best solution is output. It is also possible to run the program for just the rotation or translation function; the rotation solutions are output to a file (given the extension .mr) and this can be used as input to a subsequent run of the translation function. When the .mr file is used as input, any lines beginning with a "#" character are ignored. When the .mr file is viewed within CCP4i, clicking on any line in the file will add or remove a "#" from the beginning of the line (see also Edit MR Solution File). Note that the format of this file is different from the format of .mr files output by AMoRe.
Molrep also has functions to calculate a self rotation function and to search for a model in a phased map, as well as an approach to fitting two molecules.
For the background theory of MOLREP, see MOLREP theory.
See program documentation: MOLREP
See Stage 3 of the MR tutorial.
MrBUMP performs automated search model generation and automated molecular replacement, and has three main parts:
Note that MrBUMP makes a number of calls to web-based applications. If your sequence information is in any way sensitive, it is recommended that you use the option to run the fasta search locally rather than via the OCA web application.
See program documentation: MrBUMP
BALBES is a system for solving protein structures using X-ray crystallographic data. Molecular Replacement (MR) is its core scientific method. BALBES aims to integrate all the components necessary for finding a solution structure by MR into one system. It consists of a database, scientific programs and a Python pipeline. The system is automated so that it needs no user intervention when running complicated combinations of jobs such as model searching, molecular replacement and refinement.
See the BALBES website for documentation including running the program through the CCP4i interface.
The Molecular Replacement module uses a database to store information on the trial models used in a project. If you use only one trial model, this may seem unnecessarily complicated, but if you need to use multiple trial models you will appreciate the database.
To run a simple AMoRe job, click on the AMoRe task in the module menu. The task window for AMoRe and a task window which interfaces to the AMoRe model database will appear. You will need to enter the following information on your trial model in the database:
All other filenames for intermediate files will be generated automatically from the model name.
It is possible to use a map as the trial structure in AMoRe - see below.
In the AMoRe task window, the protocol section has two menus for you to select a trial model from the database (if there is only one model in the database this will be set automatically) and the mode of running AMoRe. Usually you should keep the default auto-AMoRe for a start-to-finish run of the program. You will need to select the MTZ file containing the experimental data.
The first step in an AMoRe run is to move the trial coordinates to an optimal position centered on the origin; these coordinates are saved to a file. AMoRe then reports its best solutions in terms of transformations (Euler angles and translations) to be applied to the optimised, origin shifted, coordinates. These solutions are listed in the log file but also extracted into solution files (with file extension .mr). Solution files will be created for each of the rotation, translation and fitting stages of the AMoRe run, and the final file will have a name projectname_jobid_fit_model.mr where projectname is the name of the project, jobid is the job number and model is the name of the trial model.
The Molecular Replacement task Build AMoRe Output Model will apply the transformations stored in a solution file to the optimised coordinates and will also do some simple checks on the quality of a model - checking whether there is overlap between molecules in adjacent asymmetric units. You will need to select the solution file output by your AMoRe run.
In AMoRe, the words 'model' and 'molecule' are used with very specific meanings.
In the context of AMoRe a 'molecule' is the structural element which can be treated as a rigid body in molecular replacement. It may be anything from a structural domain which is not even a whole chain, to a multi-chain protein.
A trial 'model' is the initial set of test coordinates which are taken from another solved crystal structure or NMR structure. This coordinate set may have been processed in some way to make it more suitable for use in molecular replacement - for example loop regions could have been excised. It is possible (and may be advisable) to generate multiple models from one input coordinate set by different processing (for example different degrees of severity in excising loop regions or applying some homology modelling to try to make the model more like the expected structure in the experimental data).
In the simplest case Molecular Replacement can be used to find one rotation/translation solution to map a model onto the experimental structure. If this is your case, you can skip some of the following discussion and you can ignore the part of the Interface referring to 'known' molecule(s). You should (at least for a first try) use the auto-AMoRe option which will run through all the AMoRe functions automatically.
The non-simple cases are:
Alternative to inputting coordinates for model structures to AMoRe, some crystallographers prefer to input a map calculated from the coordinates, usually a sharpened E-map (that is a map generated using the normalised structure amplitudes rather than the SFs). The MR 'Create Input SFs from Model' task will create an MTZ file containing the appropriate Es or SFs and phis for a map to input to AMoRe. The name and coordinate file for the model must have been entered in the 'AMoRe Model Database' before running this task.
The task needs to know the cell parameters and resolution range; these can be read from an MTZ file such as the file containing the experimental data. This MTZ file is not used in any other way by the task.
To simplify running AMoRe, the Interface keeps a database of the models used for the molecular replacement. These models may be either variants of the same initial coordinate set which have been processed differently (for example with different loop regions excised) or they may be from different coordinate sets in cases where the experimental structure is made up of more than one 'molecule'. The contents of the database are displayed in a separate window which is opened when you select the AMoRe task. The key data you must input to the database is a name for each model and the name of either the coordinate file containing the model or an MTZ file containing SFs or Es for the model. The model name you enter will be used in menus and as part of filenames, so keep it short and distinct.
When AMoRe is run, some information will be automatically extracted from the log file and loaded into the database. This is visible in the 'AMoRe Details' folder. The information stored here currently is the name of transformed coordinate files, SF table files and details from the initial Tabling function (TABFUN) which are used by subsequent AMoRe functions.
Probably the most important parameter in an AMoRe run is the radius used by the rotation function. There is debate about the best value to use and for tricky problems it is always worth trying a range of values. The Interface script will automatically generate a reasonable value for the radius from the parameters output by the TABFUN stage. The Tabling function moves the trial model to an optimal position and orientation and reports to the log file the size of an enclosing box for the model. The Interface calculates the search radius as:
the minimum of
0.75 * (the minimum axis length of the model enclosing box)
and
0.5 * (the minimum crystal cell axis)
This search radius is saved in the MR Database for this model and will be used by default in future AMoRe runs. It is also used in the calculation of the model cell, in the case of an auto-AMoRe run, as follows:
a_model = a_tabfun-minimal-box + radius + 5.0
b_model = b_tabfun-minimal-box + radius + 5.0
c_model = c_tabfun-minimal-box + radius + 5.0
where radius is the search radius as determined above. tabfun-minimal-box is the Minimal Box output in the logfile of the TABFUN stage. 5.0 is chosen as a nominal value for the resolution.
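The two calculations above can be written out as a minimal sketch. The function and argument names are hypothetical; the arithmetic follows the formulas in the text, with the minimal box and the crystal cell axes given in Angstroms.

```python
def search_radius(minimal_box, crystal_cell):
    """Default rotation function radius as calculated by the Interface.

    minimal_box  : (a, b, c) of the TABFUN minimal box around the model
    crystal_cell : (a, b, c) axis lengths of the crystal cell
    """
    return min(0.75 * min(minimal_box), 0.5 * min(crystal_cell))


def model_cell(minimal_box, radius, resolution=5.0):
    """Model cell axes for auto-AMoRe: minimal box + radius + nominal resolution."""
    return tuple(axis + radius + resolution for axis in minimal_box)


if __name__ == "__main__":
    box = (40.0, 50.0, 60.0)
    cell = (70.0, 80.0, 90.0)
    radius = search_radius(box, cell)   # min(0.75 * 40, 0.5 * 70) = 30.0
    print(radius, model_cell(box, radius))
```

For a tricky problem the same functions make it easy to tabulate the model cell for a range of trial radii rather than the default one.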
When AMoRe performs the rotation function, translation function or rigid body refinement (fitting function), it outputs the final result to the log file in lines which begin with the keyword SOLUTION (or some recognisable variation on it). The key data on the line are three Euler angles which are the rotation part of the solution and three fractional shifts which are the translation part of the solution.
It is often necessary to recycle these solutions as input into the next stage in AMoRe or to use them to generate well-positioned models. To simplify the recycling, the Interface automatically extracts the SOLUTION lines from the log file and saves them to a 'Solution File' which is put in the user's project directory and has a name like projectname_jobid_mode_model.mr where projectname is the name of the project, jobid is the job number, mode is either rot, tran or fit, depending on which stage these are the solutions for, and model is the name of the model this solution applies to. These MR files are analogous to the HA files of the Experimental Phasing module.
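The naming convention can be captured in a small helper, which may be useful when looking for solution files from a script. This is a sketch with a hypothetical function name; only the pattern projectname_jobid_mode_model.mr and the three mode names come from the text.

```python
VALID_MODES = ("rot", "tran", "fit")


def solution_filename(projectname, jobid, mode, model):
    """Build a solution file name of the form projectname_jobid_mode_model.mr."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    return f"{projectname}_{jobid}_{mode}_{model}.mr"


if __name__ == "__main__":
    print(solution_filename("lysozyme", 12, "fit", "trimmed"))
```

Because the parts are joined with underscores, a short model name without underscores keeps the file names easy to read and to split apart again.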
The solution file from the translation function will also include the alternative, lower scoring, translation solutions which are usually given the label SOLUT_1, SOLUT_2 in the log file. These solutions will be 'commented out' in the solution file which means that the lines containing these solutions will begin with a '#' character and they will not be read or used by default.
For subsequent AMoRe runs you should select which solution files to use as input and these will be edited into the input command file (any specification of the model number or the FIX keyword will be handled automatically). You do NOT need to edit the solution file in any way.
If you do not want to use all of the solutions in a solution file, then some lines from the file can be 'commented out' - that is a '#' character is placed at the beginning of the line so the rest of the line is then ignored by any program reading the file. The easiest way to edit a solution file is using the 'Edit MR Solution File' task. This task displays the contents of a solution file and you just need to click on a line to either add or remove the # at the beginning of the line.
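The commenting convention is simple enough to apply from a script. The sketch below mimics what clicking on a line is described as doing; it is not the CCP4i code, the function names are hypothetical, and the example line contents are placeholders rather than real AMoRe output.

```python
def toggle_comment(lines, index):
    """Return a copy of lines with the '#' at the start of one line toggled."""
    line = lines[index]
    toggled = line[1:] if line.startswith("#") else "#" + line
    return lines[:index] + [toggled] + lines[index + 1:]


def active_lines(lines):
    """Return the lines a reading program would use (those not starting with '#')."""
    return [line for line in lines if not line.startswith("#")]


if __name__ == "__main__":
    # Placeholder contents; a real solution file holds angles and shifts.
    solutions = ["SOLUTION first", "#SOLUTION second"]
    solutions = toggle_comment(solutions, 1)   # re-enable the second solution
    print(active_lines(solutions))
```

The original list is left unchanged, so a script can keep the file as written by the Interface and work on an edited copy.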
You can access the 'Edit MR Solution File' task in the conventional way from the task menu on the main CCP4i window or you can click on the 'View' button on the file selection line for a solution file.
By creating and using solution files automatically, the Interface simplifies running AMoRe and reduces the risk of errors, but there may be one or two tricks that you can do running AMoRe conventionally with scripts which you cannot do easily with the Interface. There are a couple of ways to work round this:
Please let us know if you find any serious limitation which is liable to affect other users, and we will try to fix it.
The AMoRe process is split into functions which are described in the AMoRe program documentation but they are described here briefly from the point of view of someone using the Interface:
The Interface treats the AMoRe Sorting and Tabling functions together. The purpose of these functions is to process the input model and experimental data into a form which is most convenient for AMoRe, which is a packed hkl file of the experimental data and SF table file of the inverse Fourier transform of the model.
The Sorting function produces a packed hkl file from structure factors (read from an input MTZ) and can also produce a SF table file. The Tabling function will produce a SF table file from coordinates. So the input experimental data is processed by the Sorting function and the model data is processed by either the Sorting or Tabling function depending on whether it is in the form of a map or atom coordinates. The choice of processing step is handled automatically by the Interface.
When the Interface runs AMoRe, it will automatically run the Sorting and Tabling if the necessary SF table files do not exist but, provided you do not delete these files, the script will skip these functions for all subsequent AMoRe runs. This saves some time but beware the SF table files are large.
The Tabling function also moves the input coordinates to an optimal position centered on the origin and these optimised coordinates are saved in an output coordinate file. All the subsequent solutions output by AMoRe are transformations which should be applied to these optimised coordinates. The AMoRe interface has a 'get origin shifted model' option to recreate the optimised model coordinate file if you inadvertently delete it.
The rotation function is applied to one input model which is represented by an SF table file. The rotation function solutions are a list of rotations (no translation component) which are written to the log file and also to a solution file projectname_jobid_rot_model.mr. There will normally be multiple solutions and all of these solutions should be tested with the translation function.
The translation function is applied to one input model which is represented by an SF table file and rotation function solution(s) for the SAME model.
The output from the translation function is a list of transformations with both rotation and translation components - the rotation component is carried over, without change, from the rotation function solution. The solutions are extracted from the log file to a solution file called projectname_jobid_tran_model.mr which will list the one 'best' translation solution for each input rotation solution. It will also list, 'commented out', the alternative, poorer, solutions.
The FITFUN stage will refine the rotation and translation solution for one or more molecules simultaneously. The input is usually the solutions from the translation function.
If you have only one molecule in your asymmetric unit, the solution file from the translation stage should be input into the refinement. Each of the solutions from the translation function will be refined in turn and output to a final solution file projectname_jobid_fit_model.mr.
The usual procedure for solving an experimental structure containing more than one molecule, is to try to find a good solution for one molecule and then treat it as 'known' while you try to find the solution for the next molecule. Of course there may be more than one candidate for the solution of the 'known' molecule in which case the procedure will have to be repeated for all candidates.
If you have already determined the positions of some 'known' molecule(s) within your crystal then, in the translation function, they should be specified and they will be treated as fixed by the translation function. For each 'known' molecule you must specify the name of the model and a solution file containing both rotation and translation solutions (i.e. the solution file must be from the translation function). Only one solution will be taken from each solution file. The first solution not commented out in the file will be used. If you know the position of two, or more, molecules based on the same model, you should specify two or more solution files for the model.
Rigid body refinement (FITFUN) is applied to one or more input model(s) for which initial rotation and translation solutions must be specified. This function will refine those input solutions. To simplify the interface, the molecules are considered to be one 'test' molecule for which you can specify multiple alternative solutions and one or more 'known' molecules for which you can only specify one input solution. This is only a convention of the Interface; within AMoRe the refinement treats 'known' and 'test' molecules identically - they are refined simultaneously. If there is more than one solution for the 'test' molecule defined in the solution file, AMoRe will do multiple refinement runs. The starting position of the 'test' molecule will differ for each refinement run but the starting position of the 'known' molecules will be the same for all runs. The final, refined position of the 'known' molecules will almost certainly be different for each run.
The interface to specify the 'known' molecules is identical to that for the translation function. For each selected solution file, one solution will be read from the file. For the 'test' molecule all uncommented solutions will be read from the solution file. The output from the fitting function is just one solution file projectname_jobid_fit_model.mr where model is the name of the test model.
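The way solutions are picked up for the fitting function can be sketched as below. This only illustrates the selection rules described above (first uncommented solution for each 'known' file, every uncommented solution for the 'test' file); it is not the Interface script, and the function names and example line contents are hypothetical.

```python
def active_solutions(lines):
    """Return non-blank lines that are not commented out with '#'."""
    return [line for line in lines if line.strip() and not line.startswith("#")]


def plan_fit_runs(test_lines, known_files):
    """Pair each 'test' solution with the single solution of each 'known' file.

    test_lines  : lines of the solution file for the 'test' molecule
    known_files : list of line lists, one per 'known' solution file
    Returns one (test solution, [known solutions]) tuple per refinement run.
    """
    known = []
    for lines in known_files:
        usable = active_solutions(lines)
        if not usable:
            raise ValueError("a 'known' solution file has no active solution")
        known.append(usable[0])      # only the first uncommented solution
    return [(test, list(known)) for test in active_solutions(test_lines)]


if __name__ == "__main__":
    test = ["#header", "SOLUTION t1", "#SOLUTION t2", "SOLUTION t3"]
    known = [["SOLUTION k1", "SOLUTION k2"]]
    for run in plan_fit_runs(test, known):
        print(run)
```

In this example two refinement runs result, each starting from the same 'known' position but a different 'test' position.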
It is possible to use auto-AMoRe, which will run the rotation, translation and fitting functions automatically, for structures with multiple molecules. The auto-AMoRe will attempt to find a solution for one model. If you already have one or more 'known' molecules, you should enter them in the interface.
Beware: if you have a case of non-crystallographic symmetry (NCS) and 'know' one or more solutions, these solutions are liable to be found again.
AMoRe requires large amounts of memory to hold the maps in core. If your version of AMoRe is not built with large enough default arrays, the AMoRe log file reports that there is insufficient memory (though not in a very helpful fashion!). You should open the 'Memory Allocation' folder at the bottom of the AMoRe window and enter (some guess at) the appropriate memory allocation. Alternatively you can enter the parameters in the 'Memory Allocation' folder in the MR Database window and they will be used to update the parameters in the current AMoRe window and saved and used for all future AMoRe runs. See also Memory Allocation in the AMoRe program documentation.
AMoRe will only output the transformations which need to be applied to the initial coordinates, but will not generate a model with the transformations applied. The 'Build AMoRe Output Model' task will generate a coordinate file with the input model(s) transformed to best fit in the experimental model.
The input to this task is a solution file from the AMoRe fitting function. The solution file will contain a list of the models used in the fitting function. These are listed on the line beginning:
#CCP4I SCRIPT SOL fit
The Interface will look up the name of the coordinate file for the model in the database and these will be shown in the task window so you should not need to enter them. The task will put appropriate cell and symmetry information for the experimental structure into the coordinate file. The easiest way to provide this information is to give the name of the experimental data MTZ file from which the parameters can be extracted. The different 'molecules' in the structure are identified by different chain names: A, B, C, etc.
This model is useful for testing the quality of the packing for the solution. This task will run the DISTANG program to list bad contacts between 'molecules'. You should look at the output log file for a listing of contacts.
See program documentation: AMoRe, DISTANG
The task runs with a choice of PDBCUR or PDBSET as the underlying program, and can be used to perform various manipulations on coordinate files.
See program documentation: PDBSET, PDBCUR
This provides several approaches to superposing molecules:
See program documentation: SUPERPOSE, LSQKAB, TOPP
The Import/Edit Protein Sequence Task is used for importing and manipulating protein sequences obtained from the SwissProt database at the EBI (alternative database sources can be specified in Configure Interface). Enter the SwissProt code of the protein sequence to be viewed and a summary view of the protein sequence appears in the Current Sequence window, with a full detailed view in the View Full Sequence Entry window. The Editing window contains editing tools to manipulate the protein sequence in the Current Sequence window. Enter the search string, click the Search For... button and if the search string is present in the protein sequence, it is highlighted. This section of the protein sequence can then be deleted, mutated into another specified sequence or amino acids can be inserted after this section. The changed sequence can then be saved in a file of your specification for future viewing and editing.
This program is the first CCP4i program to connect to a database over the internet in order to download a requested file. If you use a proxy server, please remember to set this in Configure Interface before running CCP4i.
See full program documentation: GET_PROT
Interface to run the ClustalW program. (See the full Program Documentation for the ClustalW Interface for more information.) N.B. ClustalW is not distributed as part of CCP4 and needs to be obtained separately.
See also $CDOC/Mol_repl_itickle_tut.bath.ps, and Molecular Replacement (Birkbeck)