CCP4i: Graphical User Interface
Experimental Phasing Module
The layout of each task window, i.e. the number of folders present, and whether these folders are open or closed by default, depends on the choices made in the Protocol folder of the task (see Introduction). Although certain folders are closed by default, there are specific reasons why you should or may want to look at them. These reasons are described in the Task Window Layout sections below.
Heavy atom (HA) files are short files which keep a record of the proposed heavy atom sites in a structure. They are analogous to the MR files of the Molecular Replacement module. The format of the file is similar to the ATOM input line for the MLPHARE heavy atom refinement program. There is one line per atom site and the line is free format beginning with the word ATOM:
ATOM atom_name x y z occupancy anomalous_occupancy BFAC B-factor
The interface to MLPHARE can use an HA file as input and HA files are output by:
HA files are generated with a default file name of the form project_jobid_n.ha, where n = 1, 2, 3 and so on. If you select an HA file from the menu under the View Files from Job button, it is displayed in an HA file viewer, which is similar to the MR file viewer and has some simple functionality for editing the file. Picking a line in the file puts a # character at the beginning of that line, and the line is then ignored on input to MLPHARE. A second pick removes the # character. A Change All button at the bottom of the viewer adds or removes the # from all ATOM lines. There is also an Edit Columns button, which presents options to set the atom name, occupancy, anomalous occupancy and B-factor for all the atoms in the file.
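The HA line layout and the # toggle described above can be sketched in a few lines of Python. This is an illustration only: it assumes whitespace-separated fields in exactly the order shown in the ATOM template, and the names `HASite`, `parse_ha_line` and `toggle_ignored` are hypothetical helpers, not part of CCP4i or MLPHARE.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HASite:
    name: str
    x: float
    y: float
    z: float
    occupancy: float
    anomalous_occupancy: float
    b_factor: float
    ignored: bool  # True when the line is commented out with '#'


def parse_ha_line(line: str) -> Optional[HASite]:
    """Parse one ATOM line; return None for anything that does not match."""
    text = line.strip()
    ignored = text.startswith("#")
    if ignored:
        text = text[1:].strip()
    fields = text.split()
    # Assumed token order: ATOM name x y z occ anom_occ BFAC b
    if len(fields) != 9 or fields[0].upper() != "ATOM" or fields[7].upper() != "BFAC":
        return None
    try:
        x, y, z, occ, anom_occ = (float(f) for f in fields[2:7])
        b_factor = float(fields[8])
    except ValueError:
        return None
    return HASite(fields[1], x, y, z, occ, anom_occ, b_factor, ignored)


def toggle_ignored(line: str) -> str:
    """Mimic a pick in the viewer: add a leading '#', or remove it if present."""
    stripped = line.lstrip()
    if stripped.startswith("#"):
        return stripped[1:].lstrip()
    return "#" + line


# Illustrative values only; not taken from a real job.
example = "ATOM HG1 0.123 0.456 0.789 0.50 0.25 BFAC 20.0"
site = parse_ha_line(example)
```

Applying `toggle_ignored` twice returns the original line, which matches the "second pick removes the # character" behaviour of the viewer.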
This task interfaces to the CAD program which can be used to:
To input more than one MTZ file, click on the Add input MTZ file button. By default all the data in the input MTZ file is put into the output file, but you can change the Input option from 'all columns' to 'selected columns' and then select the columns using the Add column button. If you want the majority of the columns in the file, click on the List All Columns button and then delete the columns you do not require using the Delete selected item option under the Edit list menu button. You will first need to select the column by clicking on one of the fields for that column with the right mouse button. See also Extending Frames and Toggle Frames.
CAD cannot deal with more than 29 columns.
Do not include the columns H, K and L in the input selection. These are transferred to the output automatically, and listing them only upsets the program.
Two special data types are used to signal that you are preparing data for translation functions of various types. They are:
There must be only one FCpart PHICpart per input file, and they must be the last items specified for LABIN. CAD generates equivalent reflections using only the ROTATIONAL part of the primitive symmetry operator (i.e. if the spacegroup is P212121, these reflections are analysed as though the spacegroup was P222). This is allowed for in the TFFC and RSEARCH programs.
Features to look out for in the CAD Task are:
Folder title | Importance | Comment |
---|---|---|
Files Folder | Add input MTZ file | To include more than one MTZ file |
Define MTZ Output | override space group, cell dimensions, sort order, hkl limits etc. | can also be done with SFTOOLS |
See program documentation: CAD, SFTOOLS
For the scaling of derivative to native datasets, two CCP4 programs are available: SCALEIT and FHSCAL. The tutorial on isomorphous replacement by I. Tickle describes the strengths and weaknesses of those programs. Note that there is no unique solution to the problem of scaling together two different datasets. Various problems can arise from:
The Scale Datasets task will run SCALEIT to scale together all the DPHn (the anomalous difference for the nth wavelength).
It will optionally do a cross-comparison of the anomalous data sets - this involves rerunning SCALEIT with the input:
LABIN FP = FPHn SIGFP = SIGFPHn FPH1 = FPHm SIGFPH1=SIGFPHm DPH1 = DPHm SIGDPH1= SIGDPHm
for all possible pairwise combinations of wavelengths n and m. From these runs, the cross-comparison Rfactor and normal probability for the acentric data are extracted.
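As a sketch of how the set of cross-comparison runs could be enumerated, the following Python builds one LABIN line per wavelength pair. It assumes that "pairwise combinations" means unordered pairs (n < m) and that the columns are literally named FPHn, SIGFPHn, DPHn and SIGDPHn; real MTZ files will usually carry different labels, and the function names are hypothetical.

```python
from itertools import combinations


def cross_comparison_labin(n: int, m: int) -> str:
    """LABIN line comparing wavelength n (as FP) with wavelength m (as FPH1)."""
    return (
        f"LABIN FP=FPH{n} SIGFP=SIGFPH{n} "
        f"FPH1=FPH{m} SIGFPH1=SIGFPH{m} "
        f"DPH1=DPH{m} SIGDPH1=SIGDPH{m}"
    )


def all_cross_comparisons(n_wavelengths: int) -> list:
    """One LABIN line for every unordered pair of wavelengths (assumption)."""
    wavelengths = range(1, n_wavelengths + 1)
    return [cross_comparison_labin(n, m) for n, m in combinations(wavelengths, 2)]


labin_lines = all_cross_comparisons(3)
```

For a three-wavelength experiment this yields three SCALEIT runs: (1,2), (1,3) and (2,3).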
It is also optional to analyse the anomalous differences by rerunning SCALEIT with the input:
LABIN FP = FPH(+)n SIGFP = SIGFPH(+)n FPH1 = FPH(-)n SIGFPH1= SIGFPH(-)n
From this analysis, the normal probabilities for the acentric and centric data and the Rfactor are extracted. The input MTZ file must contain the FPH(+)n and FPH(-)n. If you do not have data in this form, you should run the mtzMADmod program which converts DPHn to the appropriate form. This program is not interfaced. A better solution is to use the latest version of the TRUNCATE program which retains the FPH(+)n and FPH(-)n on output.
The results of both these analyses are tabulated in a summary file called project_jobid_scaleit.summary.
In the Protocol folder of the Scale Datasets task, you can choose:
Features to look out for in the Scale Datasets Task are:
Protocol option | Folder title | Importance | Comment |
---|---|---|---|
 | Analysis | Graphs of differences between datasets | Analysis against resolution always performed. |
 | Refinement Parameters | Apply Wilson scaling | Final Wilson scaling (affects scale factor only) after least-squares scaling (scale and temperature factors). See also Wilson. |
 | Fhscal Scaling Parameters | Perform Kraut scaling with FHSCAL. In extreme cases, namely if the high resolution limit of the native dataset is lower than that of (one of) the derivatives, certain reflections may not get output. See also Caveat in FHSCAL program documentation. | |
 | Analysis | Analysis of FHSCAL results | SCALEIT ANALYSE is performed after scaling using FHSCAL (see protocol options 1 and 3). |
 | Input Scaling Factors | Externally determined scales applied and analysis performed. No refinement. See also SCALE. | |
See program documentation: SCALEIT, FHSCAL.
You will need to run this task for the following cases:
Input Data | Phasing Method |
---|---|
MAD | RANTAN or ACORN, SHELXS, RSPS, Anomalous Difference Patterson Maps |
SAD | RANTAN or ACORN, SHELXS |
SIR | RANTAN or ACORN, SHELXS |
In the Prepare Data for HA Search task window you should only need to identify the type of your data and which phasing program you intend to run, and the interface will make the necessary conversions described below.
MAD data is rescaled by the REVISE program to give an estimate of the normalised anomalous scattering magnitude (given the column label FM by RANTAN and ACORN but sometimes referred to as FA in the literature). The input data can be in the form of F(+) and F(-) for each wavelength or be anomalous differences Dano for each wavelength. The output FM can then be used in similar fashion to a single anomalous difference (Dano) or isomorphous difference (Diso). The theory behind this is described in the REVISE program documentation.
Direct methods programs such as SHELXS, RANTAN and ACORN usually work with data in the form of normalised structure amplitudes rather than the structure factors which are normally used in macromolecular crystallography, so structure factor data must be converted before use in these programs. The SHELXS program has an internal procedure to do this conversion, but data intended for RANTAN and ACORN must go through the ECALC program, which calculates normalised structure amplitudes (usually given the column label E).
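The idea behind normalisation can be illustrated with a deliberately simplified Python sketch: within each resolution shell, every amplitude is divided by the root mean square amplitude of that shell, so that the mean of E squared is 1 in each shell. This is not ECALC's algorithm; it ignores symmetry (epsilon) factors and any smoothing between shells, and the function name and input layout are assumptions made for the example.

```python
import math


def normalise_amplitudes(reflections, n_shells=10):
    """Return E values for a list of (d_spacing, F) pairs, in input order.

    Simplified: equal-count resolution shells, E = F / sqrt(<F^2>_shell).
    """
    if not reflections:
        return []
    # Indices sorted from low resolution (large d) to high resolution (small d).
    order = sorted(range(len(reflections)), key=lambda i: reflections[i][0], reverse=True)
    shell_size = max(1, math.ceil(len(order) / n_shells))
    e_values = [0.0] * len(reflections)
    for start in range(0, len(order), shell_size):
        shell = order[start:start + shell_size]
        mean_f2 = sum(reflections[i][1] ** 2 for i in shell) / len(shell)
        for i in shell:
            e_values[i] = reflections[i][1] / math.sqrt(mean_f2) if mean_f2 > 0 else 0.0
    return e_values


# Illustrative data: (d spacing in Angstrom, amplitude F).
data = [(4.0, 10.0), (3.5, 20.0), (2.0, 1.0), (1.8, 3.0)]
e_vals = normalise_amplitudes(data, n_shells=2)
```

Although the raw amplitudes fall off steeply with resolution, the resulting E values are on a common scale across shells, which is the property direct methods rely on.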
RANTAN, ACORN and all other CCP4 programs work with experimental data in MTZ file format but SHELXS requires the data in an ASCII format described in the SHELX documentation. The Prepare Data for HA Search task will use MTZ2VARIOUS to convert an MTZ file to SHELXS format.
See program documentation: REVISE, MTZ2VARIOUS, ECALC.
The SHELX C/D/E task uses George Sheldrick's SHELX suite of programs, specifically SHELXC, SHELXD and SHELXE, which can be obtained from THE SHELX HOMEPAGE. It also borrows heavily from ideas and design in the HKL2MAP interface developed by Thomas Pape and Thomas R. Schneider.
If you use the SHELX programs in your structure determination then please be sure to acknowledge their use:
SHELXD References:
SHELXE References:
The SHELX C/D/E task offers a way of running the SHELX suite of programs as an automated "pipeline". It also allows the individual programs to be run on their own. The pipeline uses SHELXC to analyse the data and prepare input for the later stages of the procedure, SHELXD to search for heavy atom sites, and SHELXE to perform phasing and density modification and, optionally, to distinguish between the two possible heavy atom enantiomorphs.
Note that if any of the required SHELX programs are not installed on your path then the interface cannot be launched from the main CCP4i window.
SHELXC prepares the data for the heavy atom search and refinement steps by calculating the FA values and phase shifts α from experimentally measured reflection data. The user must provide information on the type of experiment and the format in which the reflection data will be provided.
SHELXC can deal with the following types of experiments:
The SHELX C/D/E task allows the reflection data to be provided in a number of different formats:
The SHELXC step also requires the following information:
This will be extracted from the MTZ header if using MTZ input (and the Scalepack header if present), otherwise it must be entered manually. Note that, unlike the CCP4 programs, SHELX does not recognise spaces in spacegroup names; CCP4i will attempt to convert spacegroup names to the correct format.
Output from SHELXC step
The SHELXC step outputs the following files:
The .ins parameter file is automatically given a name which consists of the CCP4i project name and the job number (e.g. PROJECT_115_shelxc_fa.ins). These files are used as input to the SHELXD and SHELXE steps. If SHELXD is being run immediately after the SHELXC step then these files are passed on automatically to the heavy atom location.
The SHELX C/D/E task generates a number of graphs from the output of SHELXC, which can be viewed using loggraph. Certain graphs are only generated for specific experiments:
Table | Graph | Experiment(s) | Comments |
---|---|---|---|
Analysis of <I/sig>, completeness ... | <I/sig> vs Resolution | All | |
 | % Completeness vs Resolution | All | |
 | <d'/sig> vs Resolution | SIRAS, SIR | |
 | <d''/sig> vs Resolution | MAD, SAD, SIRAS | For SAD the high resolution cutoff can be estimated from where ΔF/σF falls below about 1.2. |
Anomalous CC analysis | Anomalous CC versus Resolution | MAD | For MAD the high resolution cutoff can be estimated from finding where the correlation coefficient (CC) between the anomalous differences for wavelengths with the highest anomalous signal falls below about 30%. |
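The cutoff rules quoted in the table can be expressed as a simple scan over resolution shells. The Python below is a minimal sketch, assuming the shells are listed from low to high resolution; the thresholds are those given in the table, while the shell values and the function name `estimate_cutoff` are invented for illustration and are not SHELXC output.

```python
def estimate_cutoff(shells, threshold):
    """Return the d_min of the first shell whose statistic drops below threshold.

    shells: list of (d_min in Angstrom, statistic), ordered low to high resolution.
    Returns None if the statistic never falls below the threshold.
    """
    for d_min, value in shells:
        if value < threshold:
            return d_min
    return None


# Made-up example values, for illustration only.
sad_shells = [(8.0, 3.1), (4.0, 2.0), (3.0, 1.5), (2.5, 1.1), (2.2, 0.9)]
mad_cc_shells = [(8.0, 85.0), (4.0, 60.0), (3.0, 35.0), (2.5, 22.0)]

sad_cutoff = estimate_cutoff(sad_shells, threshold=1.2)      # signal-to-noise rule
mad_cutoff = estimate_cutoff(mad_cc_shells, threshold=30.0)  # anomalous CC rule (%)
```

In practice the graphs should still be inspected by eye, since a noisy statistic can dip below the threshold in a single shell and recover.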
The SHELXD step performs heavy atom location and takes the output from a SHELXC run as its input. If SHELXD follows on directly from SHELXC then the interface passes these on automatically, otherwise the user needs to specify them explicitly:
The interface allows the user to set a number of critical parameters which are required for the heavy atom location step:
Parameter | Comments |
---|---|
Number and type of heavy atoms to search for | The estimated number of sites should be within around 20% of the true number. |
Number of attempts at locating the heavy atoms | |
Resolution cutoffs applied to the input data | The high resolution cutoff is a critical parameter for the heavy atom location, and it is recommended that the value calculated by SHELXC is used. This is written into the .ins file from SHELXC and picked up automatically by the task. |
Minimum distance between heavy atoms | The most common user error is to keep the default value of 3.5Å when the distances between the heavy atoms are less than this value. |
Allow for heavy atoms lying on special positions | Normally these sites are rejected by SHELXD |
A note on running the SHELXD step without SHELXC
The interface allows the user to run the SHELXD step independently of SHELXC, by taking the output from a SHELXC run from a previous job. In this case the user can alter some of the values in the .ins file, as detailed in the table above. Note however that some of the values generated by SHELXC are dependent upon the above parameters (for example the value used by the UNIT command), and that therefore differences in output may occur if the SHELXD step is run without rerunning SHELXC first to update the generated parameters too.
Output from SHELXD step
The SHELXD step outputs the following files:
The user can provide a name for the output heavy atom PDB file. The .res and .lst files are automatically given a name which consists of the CCP4i project name and the job number (e.g. PROJECT_115_shelxc_fa.ins).
If SHELXD doesn't write a .res file (because no CC value reached the target) then the task will terminate. Failure to generate a .res file usually means:
The SHELXD step produces a number of graphs:
Table | Graph | Comments |
---|---|---|
Occupancy for each site | Occupancy for each site | There should be a sharp drop in occupancy after the last true site. If the occupancy of the last site is more than 0.2, it is worth rerunning the heavy atom location and increasing the number of sites to search for. |
CC All/Weak for each try | CC All/Weak per try | |
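The two occupancy checks described in the table can be sketched as follows. This Python is a heuristic illustration only: it assumes the sites are listed in order of decreasing occupancy, takes the largest drop between consecutive sites as the "sharp drop", and uses the 0.2 figure from the table. The function name and the example occupancies are hypothetical.

```python
def analyse_occupancies(occupancies, last_site_limit=0.2):
    """Return (estimated number of true sites, whether a rerun looks worthwhile).

    occupancies: site occupancies sorted from highest to lowest.
    """
    if not occupancies:
        return 0, False
    # Drop in occupancy between each site and the next one.
    drops = [occupancies[i] - occupancies[i + 1] for i in range(len(occupancies) - 1)]
    if drops:
        n_true = drops.index(max(drops)) + 1  # sites before the largest drop
    else:
        n_true = len(occupancies)
    # A still-high occupancy on the last site suggests more sites may exist.
    rerun_with_more_sites = occupancies[-1] > last_site_limit
    return n_true, rerun_with_more_sites


# Made-up occupancies for illustration.
result = analyse_occupancies([1.0, 0.92, 0.85, 0.80, 0.31, 0.25, 0.18])
```

The largest-drop rule is crude and will mislead when the occupancies decline smoothly, so it complements rather than replaces looking at the graph.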
The SHELXE step performs phasing and density modification using the results of the SHELXD and SHELXC steps, and produces calculated structure factors and phases which can be used to generate an initial electron density map.
If SHELXE follows on directly from SHELXC and SHELXD then the interface passes these on automatically, otherwise the user needs to specify them explicitly:
The following input parameters are available for the SHELXE step:
Output from SHELXE step
The SHELXE step generates the following output:
For both XtalView and MTZ format output there is one file for each requested enantiomorph. Note that for some spacegroups SHELXE will convert to the inverted symmetry when phasing the inverted enantiomorph (e.g. from P41 to P43). For MTZ output the interface takes account of this inversion automatically.
The phases from SHELXE can be used to generate maps which can be viewed for example using the FFT task in CCP4i.
The SHELXE step produces a number of graphs:
Table | Graph | Comments |
---|---|---|
Contrast and Connectivity | Contrast versus Cycle | A big difference in the contrast of the two heavy atom enantiomorphs usually indicates a good SHELXE solution. |
 | Connectivity versus Cycle | |
Estimated CC(map) | Estimated CC(map) vs Resolution | A big difference in the correlation coefficient of the two heavy atom enantiomorphs usually indicates a good SHELXE solution. |
Crank is an automated package for structure solution via experimental phasing. Currently it covers heavy atom location, heavy atom refinement and phasing, and density modification. Crank provides the automation, while existing programs do the individual calculations. Note that the CCP4 version is a cut-down version appropriate to CCP4 programs - the full Leiden version allows the use of alternative programs for the various steps.
In the protocol folder, select the procedure you wish to use, followed by the programs to use. You then need to specify an MTZ file containing the relevant datasets. Finally, there are a small number of required parameters to supply: the type and number of the heavy atoms, the number of protein and nucleotide residues in the asymmetric unit, and an estimate of the B-factor and solvent content. The B-factor and solvent content can be estimated by the interface (click on "Calculate B and Solv. Content"), which uses the program Wilson to estimate the overall B-factor. The closed folders at the bottom of the interface allow access to the program-specific parameters for each step of the procedure.
SHARP/autoSHARP is software for the experimental phasing of macromolecular crystal structures. This interface enables autoSHARP jobs to be run from within CCP4i.
Note that SHARP/autoSHARP is not part of the CCP4 suite and must be acquired separately from Global Phasing.
The Generate Patterson Map Task performs the following:
Optionally:
Erroneously large intensity differences can affect a Patterson map disproportionately because the parameter used, the intensity, is the square of the structure factor amplitude, and the square of a large number is a very large number. The effect seen in the Patterson map is ridges.
It is therefore usually a good idea to exclude the reflections with very large differences: FPH-FP from the difference Patterson, and FPH+-FPH- from the anomalous difference Patterson. By default the Interface will run the SCALEIT program to analyse the data and use the value of 4.1*RMS(FPH-FP), which is a reasonable first estimate of a suitable cutoff. It may be worthwhile to try different cutoff values and look at the resultant Patterson map; the value used can be set at the top of the Exclude Reflections folder. Excluding 'good' reflections tends to degrade the map, so the cutoff should not be set too low. For very good data it may be unnecessary to exclude any data. The SCALEIT log file also has a table of Isomorphous and (if appropriate) Anomalous differences, which shows the number of reflections with given differences as a function of resolution shell.
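The cutoff rule of a multiple of RMS(FPH-FP) can be sketched in Python as below. This is an illustration of the arithmetic only, not what SCALEIT or the Interface actually executes; the multiplier is left as a required argument so that different cutoffs can be tried, and the function names and data are hypothetical.

```python
import math


def rms_difference(pairs):
    """Root mean square of (FPH - FP) over a list of (FP, FPH) pairs."""
    return math.sqrt(sum((fph - fp) ** 2 for fp, fph in pairs) / len(pairs))


def exclude_large_differences(pairs, factor):
    """Return (cutoff, kept pairs), keeping reflections with |FPH - FP| <= cutoff."""
    cutoff = factor * rms_difference(pairs)
    kept = [(fp, fph) for fp, fph in pairs if abs(fph - fp) <= cutoff]
    return cutoff, kept


# Made-up amplitudes: four small differences and one outlier.
pairs = [(100.0, 101.0), (100.0, 99.0), (100.0, 101.0), (100.0, 99.0), (100.0, 110.0)]
cutoff, kept = exclude_large_differences(pairs, factor=2.0)
```

Note that the outlier itself inflates the RMS, which is one reason a first estimate of the cutoff may need adjusting after looking at the map.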
Features to look out for in the Generate Patterson Map Task are:
Protocol option | Folder title | Importance | Comment |
---|---|---|---|
difference Patterson | Exclude Reflections | Exclude reflections with erroneously large (intensity) differences between F1 and F2 (i.e. FPH and FP) | see Excluding Large Intensity Differences |
anomalous difference Patterson | Exclude Reflections | Exclude reflections with erroneously large (intensity) differences between F1 and F2 (i.e. FPH+ and FPH-) | see Excluding Large Intensity Differences |
See program documentation: SCALEIT, FFT, MAPMASK, PEAKMAX, NPO, VECTORS, HAVECS.
ACORN is an ab initio procedure to solve a protein structure when atomic resolution data is available. For a structure containing heavy atoms, its procedures can also be used to locate anomalous scatterers from anomalous data, where the resolution can be as low as 3Å to 4Å.
MAD data for ACORN must be preprocessed by the REVISE program (see above) which generates estimates of FM which is the normalised anomalous scattering factor. The input to REVISE is the FP and FPH(+)n and FPH(-)n for dataset n. These data should have been scaled by the SCALEIT program. REVISE also needs to know the wavelength, f' and f'' for each wavelength.
Features to look out for in the Acorn Task are:
Protocol option | Folder title | Importance | Comment |
---|---|---|---|
search and phase with starting coordinates | ACORN-MR Parameters | Choose between a limited search with a POSItioned fragment, or a full ROTation Function and TRANslation function search | |
determine small molecule structure | General Acorn Parameters | Choose appropriate grid sampling | Grid sampling defaults to 1/3 of the high resolution limit which, in case of small molecule structures, is commonly around 1Å |
search for heavy atom(s) at lower resolution | Separate window opens to 'Prepare Data for Experimental Phasing Programs' | ||
search for heavy atom(s) at lower resolution | Selecting Data | Choose appropriate resolution limits |
See program documentation: ACORN, ECALC, REVISE, SCALEIT.
PROFESSS is a tool to help in the identification of NCS related atoms from a list of heavy atom positions. At the moment, PROFESSS only works with 'traditional' PDB files. HA files as produced by ACORN or RANTAN (for instance) cannot be fed into PROFESSS directly; the HA file needs to be converted through the Convert Coordinate Formats task in the Coordinate Utilities module.
The program first lists the triangles of atoms which it has found, then it analyses each pair of triangles as a possible NCS match. For each possible operator, a list of all matching atoms is given. For each pair of atoms, a 'loop factor' is listed. If the NCS operator is an N-fold rotation, the atom will be part of a 'loop' of N atoms (unless one is missing). This, along with an appropriate 3rd polar angle, can confirm the existence of a proper NCS operator.
Atoms are described by the atom serial number from the input PDB, along with 4 numbers listed in square brackets. The first of these is the number of the crystallographic symmetry operator, and the other three are the unit cell translations applied after the symmetry operator.
When calculating the distance between a pair of atoms, all symmetry equivalents are considered, but for each equivalent only the cell repeat giving the shortest distance is considered. In a very few cases of low order crystallographic symmetry this may cause atoms to be missed.
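The distance rule described above can be sketched as follows: for each symmetry equivalent, only the nearest cell repeat is used, and the shortest result over all equivalents is kept. The Python below is a simplified illustration that assumes an orthogonal cell (all angles 90°) and fractional coordinates; it is not PROFESSS code, and the function name is hypothetical.

```python
import math


def min_image_distance(frac_a, frac_b_equivalents, cell_lengths):
    """Shortest distance (Angstrom) from atom a to any equivalent of atom b.

    frac_a: fractional coordinates (x, y, z) of atom a.
    frac_b_equivalents: fractional coordinates of each symmetry copy of atom b.
    cell_lengths: (a, b, c) of an orthogonal cell (assumption of this sketch).
    """
    best = math.inf
    for frac_b in frac_b_equivalents:
        d2 = 0.0
        for xa, xb, length in zip(frac_a, frac_b, cell_lengths):
            delta = xb - xa
            delta -= round(delta)  # keep only the nearest cell repeat
            d2 += (delta * length) ** 2
        best = min(best, math.sqrt(d2))
    return best


# Two atoms on opposite sides of a cell edge are actually 1 Angstrom apart.
dist = min_image_distance((0.05, 0.5, 0.5), [(0.95, 0.5, 0.5)], (10.0, 20.0, 30.0))
```

For non-orthogonal cells the fractional differences would have to be converted with the full orthogonalisation matrix, and wrapping each axis independently is no longer guaranteed to give the shortest vector.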
The RANTAN Direct Methods program can be applied to solving MAD data or isomorphous replacement data. The Interface will set the key input parameters appropriately for the type of data.
For isomorphous data, RANTAN works optimally with the input in the form of normalised amplitudes rather than structure factors so the Interface will usually run the ECALC program to convert SFs to normalised amplitudes. The Interface will alternatively allow input of either precalculated normalised amplitudes or normalised amplitudes and initial phases.
MAD data for RANTAN will be preprocessed by the REVISE program (see above) which generates estimates of FM which is the normalised anomalous scattering factor. The input to REVISE is the FP and FPH(+)n and FPH(-)n for dataset n. These data should have been scaled by the SCALEIT program. REVISE also needs to know the wavelength, f' and f'' for each wavelength.
See program documentation: RANTAN, ECALC, REVISE, SCALEIT.
The SHELX program can be obtained from THE SHELX HOMEPAGE. The CCP4i interface is for SHELXS-97. To ensure that CCP4i scripts can find the SHELX program, the full path name of the program needs to be entered in the Configure Interface window which is accessed from a button in the System Administration menu on the right hand side of the Main Window.
For more information on the SHELX programs, see THE SHELX HOMEPAGE. This has references to various FAQs: The SHELX Homepage; Frequently asked questions (macromolecules), and Thomas Schneider's FAQs.
RSPS is a grid search program that provides search options (to solve heavy atom derivatives) as well as interactive options for examining potential solutions (as a fit of potential sites to the difference Patterson map). All options operate in real and vector space. Searches can be performed to locate either heavy atom positions, or, under certain conditions, to locate the position of molecules with internal (NCS) symmetry. The goal of RSPS is not to generate a complete solution to the heavy atom difference Patterson, but rather to find enough sites to allow initial phases to be calculated for difference Fourier analysis.
Searches are carried out by assigning trial positions on a grid covering the asymmetric unit of the crystal, and then computing a score for each trial position, based on the Patterson densities at the positions corresponding to the predicted vectors for each position. From the symmetry operators (crystallographic and/or non-crystallographic) all unique transformations that map a point in real (crystal) space to a point in vector (Patterson) space are generated. In other words, these transformations map a point in real space to the Patterson vectors associated with that point.
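The mapping from a trial position to its predicted Patterson vectors can be illustrated with a short Python sketch. For a trial position x and symmetry operators S_i, the self vectors are S_i x - S_j x for every pair i ≠ j, reduced into the unit cell. The example uses the two operators of spacegroup P21 (unique axis b); the function names are hypothetical, and the way RSPS combines Patterson densities into a score is not shown because it is not specified here.

```python
def apply_op(op, x):
    """Apply a symmetry operator (3x3 rotation, translation) to fractional coords."""
    rot, trans = op
    return tuple(sum(rot[r][c] * x[c] for c in range(3)) + trans[r] for r in range(3))


def patterson_vectors(ops, x):
    """Vectors between all pairs of symmetry images of x, wrapped into [0, 1)."""
    images = [apply_op(op, x) for op in ops]
    vectors = []
    for i, a in enumerate(images):
        for j, b in enumerate(images):
            if i != j:
                vectors.append(tuple((a[k] - b[k]) % 1.0 for k in range(3)))
    return vectors


# Spacegroup P21: (x, y, z) and (-x, y + 1/2, -z).
P21_OPS = [
    (((1, 0, 0), (0, 1, 0), (0, 0, 1)), (0.0, 0.0, 0.0)),
    (((-1, 0, 0), (0, 1, 0), (0, 0, -1)), (0.0, 0.5, 0.0)),
]

vectors = patterson_vectors(P21_OPS, (0.1, 0.2, 0.3))
```

Both vectors have v = 1/2 whatever the trial position, which is the Harker section for P21; a grid search evaluates the Patterson density at such predicted vectors for every trial position.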
This task is a simple interface to the BP3 program for heavy atom refinement and phasing, using multivariate likelihood techniques. It requires datasets for native and derivatives supplied in the file HKLIN, and initial coordinates for the heavy atoms. Use the Add Crystal button to add details of all the crystals you have. For each crystal, you can add heavy atom coordinates, and also datasets recorded for the crystal.
If a heavy atom site occurs in more than one crystal, then select "Same site in more than one crystal", and Add Site.
Rather than inputting all the details manually, parameters can be entered as an XML file, e.g. as obtained from the output of the heavy atom location program Crunch2.
Phaser is a program for phasing macromolecular crystal structures with maximum likelihood methods. This interface gives access to Phaser's functions for experimental phasing by single-wavelength anomalous diffraction (SAD), which optionally can exploit information from a partial (molecular replacement) model.
MLPHARE can be used to refine either isomorphous or anomalous data. Check the 'Use anomalous difference data' box at the top of the MLPHARE interface if appropriate. The initial default interface only provides for describing one derivative or wavelength; click on the Add Another Derivative button under the 'MTZ in' section to open space for additional data.
The minimal input then required is some initial heavy atom definitions in the folder Describe Derivatives & Refinement. For each derivative enter a name, and the name of the HA file containing the data for that derivative. Alternatively, enter the atoms explicitly by changing the Use data 'from file' menu option to Use data 'entered below' and then typing in the information. The Cut and Paste tool may be useful. For anomalous data you will need to enter the same HA file for each wavelength.
It is possible to edit the HA files 'on line' by clicking the View button on the file selection line. The HA file viewer has some simple editing tools but more complex changes may need to be done in an editor.
The output MTZ file contains columns PHIB_mlphare1, FOM_mlphare1 etc. If you use this file as input to another MLPHARE run, set a new, unique column name extension, for instance by changing the parameter 'Output label identifier' from mlphare1 to mlphare2. Each run of MLPHARE within the Interface also outputs one HA file for each derivative. These HA files can be used as input to the next MLPHARE run.
The SCALEIT documentation states: "MLPHARE has a built in weighting scheme which means that it doesn't do much harm to include less good data in phasing. After all the poor hkl should get low FOMs, and then DM can use the few reflections with reasonable phases to help in the phase extension procedure."
The MLPHARE program documentation has several helpful hints, e.g.: "NB: If an occupancy becomes near to 0.0 the coordinate shifts will possibly be meaningless", and a whole section of Notes on usage.
Suggested input numbers for Estimated Lack of Closure:
MLPHARE is one of the Data Harvesting programs. See Data Harvesting in CCP4i for implications for the Interface.
The MLPHARE interface has the option to output double difference maps which can be used to search for further heavy atoms. In this case the PEAKMAX program will also be run to list the peaks to a PDB file and to an HA file with the name project_jobid_label_peaks.ha where label is the MTZ column label of the derivative FPH. If you wish to do any other analysis on the map, it can be input to the 'Generate Patterson Map' task when the 'Run FFT ...' option at the top of the task window has been toggled off.
It is easiest to create maps by running the FFT task inside the Run Mlphare task. Do this by toggling on the option to 'Generate double difference maps files ...'.
In some cases it may be necessary to (re)create maps independently from the MLPHARE task. It is not possible to do this through the Create Task-Specific Maps task in the Map & Mask Utilities module. And only if you know exactly what you are doing should you attempt to do this through the Run FFT - Create Map task in the Map & Mask Utilities module.
See program documentation: MLPHARE, PEAKMAX, FFT.
See the documentation on using the Acorn interface elsewhere in this document.
OASIS is a computer program for breaking phase ambiguity in One-wavelength Anomalous Scattering or Single Isomorphous Replacement (Substitution) protein data. The phase problem is reduced to a sign problem once the anomalous-scatterer or the replacing-heavy-atom sites are located. OASIS applies a direct method procedure to break the phase ambiguity intrinsic to OAS or SIR data.
Mapslicer is an interactive viewer which displays 2d contoured sections through CCP4 map files, most usefully for seeing peaks in Patterson maps.
The program COORDCONV is used to convert coordinate files from various formats, including HA files into other suitable formats such as PDB format.
See program documentation: COORDCONV
The Clipper Phasematch program can compare two sets of phases and provide appropriate analyses.
See also MIRTutorial(Bath) (the HTML equivalent of $CDOC/Iso_repl_itickle_tut.bath.ps), Isomorphous Replacement (Birkbeck), LLNL - Bernhard Rupp's Crystallographic Web Applets (containing an applet which calculates expected anomalous dispersion ratios), Chooch (a program for calculating Anomalous Scattering Factors from X-ray fluorescence data).