CCP4 Tutorial: Session 4 - Heavy Atom Search and Phasing by MAD

See also the accompanying document giving background information.

In the following instructions, when you need to type something, or click on something, it will be shown in red. Output from the programs or text from the interface is given in green.

Outline of the method

Scaling and analysing datasets
Preparing datasets for finding heavy atoms
Find heavy atoms
Heavy atom refinement
Final Refinement and Phasing
Density Modification
Testing the hand

The Data Files

Files in directory DATA:

gere_MAD_nat.mtz	reflection file containing all wavelength data prepared for experimental phasing by MAD
session4a.def	CCP4i .def file containing all parameters for running SCALEIT session 4a
gere_MAD_nat_scaleit1.mtz	reflection file as output by SCALEIT
rantan_set1.ha	.ha file output from RANTAN, input to initial refinement with MLPHARE
nat_sul_ref.ha	.ha file containing fractional coordinates of heavy atom sites, for final refinement with MLPHARE
nat_sul_ref_opp.ha	.ha file containing fractional coordinates of heavy atom sites on the opposite hand, for final refinement with MLPHARE

Files in directory RESULTS:

dm_gere_firsthand.log	.log of density modification for GerE - first hand
dm_gere_opphand.log	.log of density modification for GerE - opposite hand

4a) Scaling and analysing datasets

The Problem

You now have a file containing native data for GerE, and MAD data for a selenomethionine derivative. First, we scale each wavelength of the MAD data to the native dataset, so that all data is on the same scale. At the same time, we analyse the MAD data to estimate the strength of the dispersive and anomalous signals.

Exercise

Select the Experimental Phasing module, and open the Scale and Analyse Datasets task window.
On the first line, enter a suitable job title such as

Job title Scaling GerE datasets (mad tutorial step 1).
On the second line, select

Do scale refinement using Scaleit.

On the next 2 lines, select

Use anomalous difference data

and

Do cross-comparison of data sets and analyse dispersive differences

using the radiobuttons.

Select the input MTZ file

MTZ in TEST gere_MAD_nat.mtz

(If you do not have this file from the data processing/reduction session, take the file from the DATA directory.)

Now select the columns from the MTZ file. The first line has the native F_nat and SIGF_nat. Then select columns for the 4 wavelengths, using the button Add Derivative Data to add more columns. (It might be easier here to load the file DATA/session4a.def which already has these parameters set.) You should end up with:

FP	F_nat	SigmaFP	SIGF_nat
FPH1	F_infl	SigFPH1	SIGF_infl
DPH1	DANO_infl	SigDPH1	SIGDANO_infl
FPH+1	F_infl(+)	SigFPH+1	SIGF_infl(+)
FPH-1	F_infl(-)	SigFPH-1	SIGF_infl(-)
FPH2	F_lrm	SigFPH2	SIGF_lrm
DPH2	DANO_lrm	SigDPH2	SIGDANO_lrm
FPH+2	F_lrm(+)	SigFPH+2	SIGF_lrm(+)
FPH-2	F_lrm(-)	SigFPH-2	SIGF_lrm(-)
FPH3	F_peak	SigFPH3	SIGF_peak
DPH3	DANO_peak	SigDPH3	SIGDANO_peak
FPH+3	F_peak(+)	SigFPH+3	SIGF_peak(+)
FPH-3	F_peak(-)	SigFPH-3	SIGF_peak(-)
FPH4	F_hrm	SigFPH4	SIGF_hrm
DPH4	DANO_hrm	SigDPH4	SIGDANO_hrm
FPH+4	F_hrm(+)	SigFPH+4	SIGF_hrm(+)
FPH-4	F_hrm(-)	SigFPH-4	SIGF_hrm(-)

Check that the output MTZ file is given as

MTZ out TEST gere_MAD_nat_scaleit1.mtz

You should not need to change anything else. Select Run -> Run Now.
When the job has finished, return to the main window, highlight the job in the Job List, and select View Files from Job -> View Log Graphs. This task outputs a large number of graphs for analysing the data, and we will just look at some of them.
We can gauge the strength of the dispersive differences by looking at the graphs Centric Normal probability v resolution and Acentric Normal probability v resolution ... for each pair of wavelengths, e.g. ... FP = F_lrm FPH = F_infl SIGF_infl DANO_infl SIGDANO_infl. For each graph, look at the line Gradient_on_reflection_prob.lt.0.9. Use the crosswires to estimate a rough value, e.g. for the low-remote against the inflection, the value is about 1.128 for centric data and 1.254 for acentric data.

The values can be summarised as (these values are contained in the file View Files from Job -> ...scaleit.summary):
```
 Table: Normal Probability for acentric data

Normal Prob.   |  F_lrm       F_peak      F_hrm       
 ----------------------------------------------------------------------------
 F_infl        | 1.257        1.075       1.514       


 Table: Normal Probability for Centric data

 Normal Prob.   | F_lrm       F_peak      F_hrm       
 ----------------------------------------------------------------------------
 F_infl         | 1.111       0.921       1.453       
```
This shows that the dispersive difference (i.e. the difference in f' values between wavelengths) is smallest from the inflection to the peak, and largest from the inflection to the high-energy remote (the inflection point has the most negative f').
We can gauge the strength of the anomalous differences by looking at the graph Acentric Normal probability v resolution ... for F(+) and F(-) of each wavelength, e.g. ... FP = F(+)_infl FPH = F(-)_infl SIGF(-)_infl. For each graph, look at the line Gradient_on_reflection_prob.lt.0.9, and use the crosswires to estimate a rough value.

The values are summarised as:
```
 Table: Anomalous Differences ( FPHi+ v. FPHi-)


 Anom difference          | Prob_acent  Rfactor     
 --------------------------------------------------------------------------------------
 F_infl(+) v F_infl(-)    |   1.166       0.090       
 F_lrm(+)  v F_lrm(-)     |   1.340       0.088       
 F_peak(+) v F_peak(-)    |   1.430       0.112       
 F_hrm(+)  v F_hrm(-)     |   1.010       0.089       
```
This shows that the high-energy remote has the least anomalous signal, i.e. a low value of f". The peak wavelength has the largest f", while the other 2 wavelengths have intermediate values.
Close the Scaleit Task Window.
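
The comparison of signal strengths can also be written down as a short script. The following is only a sketch in Python: the numbers are copied from the two acentric summary tables above, and the helper name rank is illustrative, not part of SCALEIT or CCP4i.

```python
# Acentric normal-probability gradients copied from the scaleit summary.
# Dispersive signal: each wavelength compared against F_infl.
dispersive_acentric = {"F_lrm": 1.257, "F_peak": 1.075, "F_hrm": 1.514}
# Anomalous signal: F(+) against F(-) at each wavelength.
anomalous_acentric = {"F_infl": 1.166, "F_lrm": 1.340, "F_peak": 1.430, "F_hrm": 1.010}


def rank(signal):
    """Return the dataset labels sorted from strongest to weakest signal."""
    return sorted(signal, key=signal.get, reverse=True)


dispersive_rank = rank(dispersive_acentric)
anomalous_rank = rank(anomalous_acentric)
print("Dispersive (vs F_infl):", dispersive_rank)
print("Anomalous (F+ vs F-):  ", anomalous_rank)
```

The first entry of each list is the pair or wavelength that is likely to be most useful for the Patterson and refinement steps that follow.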

4b) Preparing datasets for finding heavy atoms

The Problem

Before carrying out any experimental phasing it is necessary to know the atomic coordinates of the anomalous scatterers or heavy atoms. For GerE there are 12 Se atoms to be positioned. These can be found by a Patterson search or by direct methods. For 12 sites, a Patterson search is complicated. However, it is always good practice to calculate Pattersons using both the anomalous difference of the peak wavelength and the largest dispersive difference. They should both show a similar pattern of peaks.
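
To see why peaks are expected on a Harker section, it can help to compute the vectors by hand. The sketch below is in Python and assumes space group C2 with unique axis b (the space group is stated later in this tutorial), where the two-fold mate of (x, y, z) is (-x, y, -z). The function names are illustrative, and the example site is the first peak of the difference Fourier listing shown in exercise 4d.

```python
def wrap(vector):
    """Bring fractional coordinates into the range [0, 1)."""
    return tuple(c % 1.0 for c in vector)


def c2_mates(site):
    """The site and its two-fold mate in C2 (unique axis b).

    The C-centring adds (1/2, 1/2, 0) to every position; it only repeats
    the Patterson peaks, so it is left out of this sketch.
    """
    x, y, z = site
    return [wrap((x, y, z)), wrap((-x, y, -z))]


def harker_vector(site):
    """Self-vector between a site and its two-fold mate: (2x, 0, 2z)."""
    x, y, z = site
    return wrap((2 * x, 0.0, 2 * z))


def cross_vectors(site_a, site_b):
    """Vectors from each symmetry mate of site_b to site_a."""
    return [
        wrap(tuple(a - b for a, b in zip(site_a, mate)))
        for mate in c2_mates(site_b)
    ]


site = (0.0792, 0.1477, 0.9868)
print("Harker vector on the v = 0 section:", harker_vector(site))
```

Every self-vector has v = 0, which is why the default Harker section for this space group is the plane v = 0.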

Exercise

Select the Experimental Phasing module, and open the Generate Patterson Map task window.
On the first line, enter a suitable job title such as

Job title Anomalous Peak Patterson 10-3.5Å (mad tutorial step 100).
On the next line, select

Run FFT to generate anomalous difference Patterson (data as F+ F-, not as D)

then select with the radio button

Plot default Harker map sections with no coordinates

Later, when we have some sites, we will check it by repeating the Patterson and plotting sections with vectors between atom coordinates.
Select the input MTZ file

MTZ in TEST gere_MAD_nat_scaleit1.mtz

(If you do not have this file from the previous session, take the file from the DATA directory.)

Now select the columns from the MTZ file:

F1 F_peak(+) SigmaF1 SIGF_peak(+)

F2 F_peak(-) SigmaF2 SIGF_peak(-)

Check that the output MAP file is given as

Map TEST gere_MAD_nat_patterson1.map
Now fill in the folder Exclude Reflections as follows:
The Exclude Reflections with differences between F1 and F2 > ? will be estimated from a scaleit analysis, or you can enter your own value. It is important to exclude outliers which are often due to measurement errors.
It is sensible to always exclude reflections with F less than n * sigmaF where n is 3 (for all data concerned).
You also need to select a suitable resolution limit. Use plots of the 'Analysis of data vs. resolution' to select sensible limits found in the scaleit run (View Files from Job -> View Log Graphs); here enter Resolution less than 10 Å or greater than 3.5 Å.
You should not need to change anything else, so select Run -> Run Now.
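
The exclusion criteria can be illustrated with a short calculation. This is a sketch in Python, not what the interface runs internally: it uses the standard monoclinic d-spacing formula with the cell given in the CELL line of the .ha listing in exercise 4d, and the function names and defaults are assumptions chosen to mirror the values entered above.

```python
import math

# Cell dimensions (Å, degrees) from the CELL line in exercise 4d.
A, B, C, BETA_DEG = 108.742, 61.679, 71.652, 97.151


def d_spacing(h, k, l, a=A, b=B, c=C, beta_deg=BETA_DEG):
    """Resolution (Å) of reflection hkl in a monoclinic cell, unique axis b."""
    beta = math.radians(beta_deg)
    sin2 = math.sin(beta) ** 2
    inv_d2 = (
        (h * h / (a * a) + l * l / (c * c)
         - 2 * h * l * math.cos(beta) / (a * c)) / sin2
        + k * k / (b * b)
    )
    return 1.0 / math.sqrt(inv_d2)


def keep_reflection(h, k, l, f, sigf, d_max=10.0, d_min=3.5, n_sigma=3.0):
    """True if the reflection lies within 10-3.5 Å and has F >= 3 * sigmaF."""
    d = d_spacing(h, k, l)
    return d_min <= d <= d_max and f >= n_sigma * sigf


# Made-up amplitudes, for illustration only.
print(keep_reflection(0, 10, 0, f=120.0, sigf=10.0))
print(keep_reflection(0, 20, 0, f=120.0, sigf=10.0))
```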

The Harker sections will be plotted - click on View Files from Job -> jobid...plt. It is a good idea to compare these plots for the dispersive and anomalous Pattersons. They should have a similar pattern of peaks.

You are now going to use a direct methods approach for locating the Se sites. In this section, you will prepare the MAD data for use in the direct methods program RANTAN. This task runs REVISE for generating the normalised anomalous scattering magnitude FM, and then the program ECALC for calculating the corresponding normalised structure factor E.

Exercise

Select the Experimental Phasing module, and open the Prepare Data for HA Search task window.
On the first line, enter a suitable job title such as

Job title Run revise for GerE data (mad tutorial step 120).
On the next line, select

Input MAD data as F+ F- and prepare data for Es (Rantan/Acorn)

Select the input MTZ file

MTZ in TEST gere_MAD_nat_scaleit1.mtz

Now select the columns from the MTZ file:

FPH+1	F_infl(+)	SigFPH+1	SIGF_infl(+)
FPH-1	F_infl(-)	SigFPH-1	SIGF_infl(-)
FPH+2	F_lrm(+)	SigFPH+2	SIGF_lrm(+)
FPH-2	F_lrm(-)	SigFPH-2	SIGF_lrm(-)
FPH+3	F_peak(+)	SigFPH+3	SIGF_peak(+)
FPH-3	F_peak(-)	SigFPH-3	SIGF_peak(-)
FPH+4	F_hrm(+)	SigFPH+4	SIGF_hrm(+)
FPH-4	F_hrm(-)	SigFPH-4	SIGF_hrm(-)

Check that the output MTZ file is given as

Output MTZ TEST gere_MAD_nat_prephadata1.mtz

Now fill in the folder Anomalous Data as follows:

Data set 1 collected at wavelength	0.981	with estimated F'	-6.0	and F"	2.0
Data set 2 collected at wavelength	1.1	with estimated F'	-3.0	and F"	3.0
Data set 3 collected at wavelength	0.98	with estimated F'	-4.0	and F"	4.0
Data set 4 collected at wavelength	0.9	with estimated F'	-3.0	and F"	1.0

In fact, the wavelengths are only used as labels by the program. The important values are f' and f" although the results are not very sensitive to the exact value. These values have been estimated from the known range of values of f' and f" for Se, and the relative dispersive and anomalous differences estimated in the previous section. You can plot an approximate distribution of f" and f' with wavelength for the different elements using CROSSEC.
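
As a rough cross-check of the values entered above, the wavelengths can be converted to photon energies and the estimated f' values compared pairwise. This is a sketch in Python. The conversion E = hc / wavelength with hc ≈ 12.398 keV·Å is standard; the dictionary layout and function names are illustrative, and the dataset labels follow the column order used in this exercise (1 = infl, 2 = lrm, 3 = peak, 4 = hrm).

```python
HC_KEV_ANGSTROM = 12.39842  # h*c in keV·Å

# Wavelength (Å) and estimated f', f" as entered in the Anomalous Data folder.
datasets = {
    "infl": {"wavelength": 0.981, "f_prime": -6.0, "f_double_prime": 2.0},
    "lrm": {"wavelength": 1.1, "f_prime": -3.0, "f_double_prime": 3.0},
    "peak": {"wavelength": 0.98, "f_prime": -4.0, "f_double_prime": 4.0},
    "hrm": {"wavelength": 0.9, "f_prime": -3.0, "f_double_prime": 1.0},
}


def energy_kev(wavelength):
    """Photon energy in keV for a wavelength in Å."""
    return HC_KEV_ANGSTROM / wavelength


def dispersive_difference(name_a, name_b):
    """Absolute difference in estimated f' between two datasets."""
    return abs(datasets[name_a]["f_prime"] - datasets[name_b]["f_prime"])


for name, values in datasets.items():
    print(f"{name:5s} {energy_kev(values['wavelength']):7.3f} keV  "
          f"|f'(infl) - f'| = {dispersive_difference('infl', name):.1f}")
```

With these estimates the inflection-to-peak pair has the smallest difference in f', in line with the SCALEIT analysis; the estimates are too coarse to separate the two remotes.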

You should not need to change anything else, so select Run -> Run Now.

4c) Find heavy atoms

The Problem

You have generated a column of E values which give a wavelength-independent measure of the anomalous scattering due to the Se sites. The Se sites can be found from the E values by Patterson methods, but here you will use a direct methods approach.

Exercise

Select the Experimental Phasing module, and open the Rantan - Direct Methods task window.
On the first line, enter a suitable job title such as

Job title Find Se sites for GerE (mad tutorial step 200).
On the second line, select

Set optimal Rantan parameters for estimates of FH or FA data (default)

and on the next line, select

generate map(s) and coordinate file listing peaks (default)
Select the input MTZ file

MTZ in TEST gere_MAD_nat_prephadata1.mtz

The rest of the necessary information (in the 'Files' folder and the 'Running Rantan' folder) should be filled in automatically.
You should not need to change anything else, so select Run -> Run Now.

When the job has finished, view the output MTZ file by selecting in the main window View Files from Job -> gere_MAD_nat_rantan1.mtz. The output file has 48 columns:

 * Column Labels :

 H K L F_infl(+) SIGF_infl(+) F_infl(-) SIGF_infl(-) mod_F_infl(+)
 mod_SIGF_infl(+) mod_F_infl(-) mod_SIGF_infl(-) F_lrm(+) SIGF_lrm(+) F_lrm(-)
 SIGF_lrm(-) mod_F_lrm(+) mod_SIGF_lrm(+) mod_F_lrm(-) mod_SIGF_lrm(-) F_peak(+)
 SIGF_peak(+) F_peak(-) SIGF_peak(-) mod_F_peak(+) mod_SIGF_peak(+) mod_F_peak(-)
 mod_SIGF_peak(-) F_hrm(+) SIGF_hrm(+) F_hrm(-) SIGF_hrm(-) mod_F_hrm(+)
 mod_SIGF_hrm(+) mod_F_hrm(-) mod_SIGF_hrm(-) FM SIGFM F E SIGE F2OR E2OR PHASE1
 WT1 PHASE2 WT2 PHASE3 WT3

RANTAN generates and refines a large number of possible phase sets (default 500), but only outputs the best ones (default 3) to the output MTZ file. These phases and the corresponding weights are held in the last 6 columns.

From each of these phase sets, the task calculates a map and locates peaks, which may correspond to Se sites. These peaks are output in both orthogonal and fractional coordinates. Click on View Files from Job to reveal a list of output files. For each phase set, there will be a .pdb (orthogonal coordinates) and a .ha (fractional coordinates) file, for example TEST_jobnumber_1.pdb and TEST_jobnumber_1.ha for phase set 1. The default peak search produces approximately 15 peaks - we expect there to be 12 Se sites for this protein (2 each for 6 chains). (Note that RANTAN starts from random phase sets, so the results are not always the same.)

4d) Heavy atom refinement

The Problem

You now have 3 sets of possible Se sites. Heavy atom refinement and phasing is done using the program MLPHARE. The stages are:

Refine heavy atom ( = Se) parameters (XYZ coordinates, B factor, real occupancy, anomalous occupancy), or a subset of these.
Remove incorrect sites. Their occupancy will refine to a small or negative value. These will be obvious after refining against the centric data only; there are enough centric observations to refine the Se parameters, and the occupancies. If you have no centric data use the "Use every XXX-th reflection for refinement" option to use a subset of the reflections for refinement.
Look for new atom sites using Fourier difference (or double difference) maps. These can be calculated after the preliminary centric refinement. The output MTZ file has phases for all the data to the requested resolution limit. A difference Fourier will contain all the present sites plus potential new sites; a double difference Fourier will only show potential new sites (and may therefore be more difficult to interpret). The peak height for a new site will be positive, but probably only 20-30% of the height of sites included in the refinement.
MLPHARE provides useful graphs to monitor your progress. The most useful are those labelled: "Lack of Closure analysis v resln" and "Anomalous lack of closure v resln". The Cullis Rfactor and the Anomalous Cullis Rfactor should be less than 1, and any "improvement" to the sub-structure should reduce them. The Phasing power should increase as the solution improves.
When you are satisfied that you have the complete solution, complete the refinement for the other wavelengths, and for the anomalous contributions, and generate the final phases.
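
For orientation, the two statistics mentioned in the stages above can be written out using their commonly quoted textbook forms: Cullis R as mean lack of closure over mean isomorphous difference, and phasing power as rms calculated heavy-atom amplitude over rms lack of closure. The Python sketch below is illustrative only. MLPHARE derives its reported values from its own phase-integrated quantities, so the numbers in the log will not be reproduced exactly by these functions, and the input values here are invented.

```python
import math


def rms(values):
    """Root-mean-square of a list of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def cullis_r(lack_of_closure, iso_difference):
    """Sum of |lack of closure| over sum of |FPH - FP|; below 1 is useful."""
    return sum(abs(e) for e in lack_of_closure) / sum(abs(d) for d in iso_difference)


def phasing_power(fh_calc, lack_of_closure):
    """rms(|FH calc|) / rms(lack of closure); above 1 is useful."""
    return rms(fh_calc) / rms(lack_of_closure)


# Invented numbers, for illustration only.
closure = [1.5, -1.5, 1.5, -1.5]
differences = [3.0, -2.0, 4.0, -3.0]
fh = [3.0, 3.0, 3.0, 3.0]
print("Cullis R:", cullis_r(closure, differences))
print("Phasing power:", phasing_power(fh, closure))
```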

To do this, you need to run MLPHARE several times. The steps using centric data or a subset of the full data set will be very fast. The Se parameters are held in a .ha file, which is updated after each pass. The output MTZ file will be used as input for the difference Fouriers.

For the tutorial, we just do the 1st stage (exercise 4d) and the last stage (exercise 4e). The intermediate stages are described at the end of exercise 4d.

Exercise

Select the Experimental Phasing module, and open the Run Mlphare task window.
On the first line, enter a suitable job title such as

Job title Refining Se sites for GerE - set 1 (mad tutorial step 300).
In the first folder, select:

Use centric data only.

There are enough centric observations to refine the Se parameters, and to indicate which sites are real, and which solution is best.

Leave everything else unselected.
Select the input MTZ file:

MTZ in TEST gere_MAD_nat_scaleit1.mtz

Now select the columns from the MTZ file. First we will refine the sites against the largest dispersive difference (See your SCALEIT summary for this).

FP F_infl SigmaFP SIGF_infl

FPH1 F_hrm SigFPH1 SIGF_hrm

Check that the output MTZ file is given as

Output MTZ TEST gere_MAD_nat_mlphare1.mtz
In the folder Data Harvesting, leave as:

Do not create harvest file
In the folder Key parameters, enter resolution limits (we do not use the less reliable data for preliminary refinement).

Resolution limit from 15.0 to 2.8.
In the folder Describe Derivatives & Refinement, enter a name for the derivative:

Phase with&refine derivative infl to hrm.

On the next line, check you are refining (real) occupancy only. There is no anomalous signal for the centric data and in this polar space group the Y coordinate cannot be refined. The form factor is set to Ano which means the atomic form factor is given as a single electron and the final occupancy will represent the number of electrons of the dispersive difference for this pair of measurements.

Use isomorphous data to refine occupancy (de-select 'refine XYZ' at this stage)

Then select the file of heavy atom coordinates output by RANTAN. You can use your own file if you want, but it is recommended to use the prepared file in DATA:

HA in DATA rantan_set1.ha
Select Run -> Run Now.
Click on View Files from Job -> TEST_jobnumber_1.ha and look at the list of refined sites. The occupancies of sites 7 and 12 are now negative (actual values may vary a little):
```
ATOM7   Ano   0.244  0.156  0.961 -0.590 BFAC   20.000
ATREF X ALL Y ALL Z ALL OCC ALL AOCC ALL B ALL
ATOM12  Ano   0.216  0.274  0.077 -4.304 BFAC   20.000
ATREF X ALL Y ALL Z ALL OCC ALL AOCC ALL B ALL
```
and these sites should be deleted from the list. The easiest way to delete them is to click on these lines in the viewer window, which turns them into comment lines. Then click Save&Exit.
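
The check for spurious sites can also be scripted. This is a minimal Python sketch, assuming the .ha layout seen in the excerpt above (ATOM label, form factor, X, Y, Z, occupancy, then BFAC). The function name is hypothetical and not part of CCP4.

```python
HA_TEXT = """\
ATOM7   Ano   0.244  0.156  0.961 -0.590 BFAC   20.000
ATREF X ALL Y ALL Z ALL OCC ALL AOCC ALL B ALL
ATOM12  Ano   0.216  0.274  0.077 -4.304 BFAC   20.000
ATREF X ALL Y ALL Z ALL OCC ALL AOCC ALL B ALL
"""


def negative_occupancy_sites(text):
    """Return (label, occupancy) for every ATOM line with occupancy < 0."""
    flagged = []
    for line in text.splitlines():
        fields = line.split()
        # Skip ATREF and any other non-ATOM lines.
        if not fields or not fields[0].startswith("ATOM"):
            continue
        occupancy = float(fields[5])  # assumed column: X, Y, Z come first
        if occupancy < 0:
            flagged.append((fields[0], occupancy))
    return flagged


for label, occupancy in negative_occupancy_sites(HA_TEXT):
    print(f"{label}: occupancy {occupancy} -> candidate for deletion")
```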
Return to the Run Mlphare task window and rerun the refinement without these sites. This will only take a few seconds. Also, select

Generate difference maps and do peak search for more heavy atoms
In the folder Describe Derivatives & Refinement, add in XYZ refinement:

Use isomorphous data to refine XYZ and refine occupancy
Update the heavy atom file:

HA in TEST TEST_jobnumber_1.ha
Select Run -> Run Now. The interface will ask you whether you want to overwrite gere_MAD_nat_mlphare1.mtz. This is OK, so click Delete File.

When the job has finished, you can check the refined Se sites as before. Now you need to inspect the difference Fourier map to see if there are extra sites. The job will have output a file: TEST_jobnumber_F_hrm.ha, which looks something like this:

GRID 114  68  76
CELL  108.7420   61.6790   71.6520   90.0000   97.1510   90.0000
ATOM    Ano   0.0792  0.1477  0.9868       28.76  0.0 BFAC  20.0
ATOM    Ano   0.2051  0.3929  0.8592       27.99  0.0 BFAC  20.0
ATOM    Ano   0.2597  0.0000  0.2456       27.02  0.0 BFAC  20.0
ATOM    Ano   0.1827  0.0717  0.5238       26.64  0.0 BFAC  20.0
ATOM    Ano   0.4297  0.1824  0.8797       26.55  0.0 BFAC  20.0
ATOM    Ano   0.2825  0.2330  0.9221       25.53  0.0 BFAC  20.0
ATOM    Ano   0.4660  0.2451  0.2418       24.44  0.0 BFAC  20.0
ATOM    Ano   0.3405  0.1578  0.3168       22.12  0.0 BFAC  20.0
ATOM    Ano   0.1336  0.1659  0.2071       13.98  0.0 BFAC  20.0
ATOM    Ano   0.3213  0.3957  0.6309       11.62  0.0 BFAC  20.0
ATOM    Ano   0.4934  0.1771  0.4131        6.57  0.0 BFAC  20.0
ATOM    Ano   0.0722  0.2257  0.8064        6.18  0.0 BFAC  20.0
ATOM    Ano   0.3587  0.0257  0.7882        5.93  0.0 BFAC  20.0
ATOM    Ano   0.4096  0.1526  0.8472        5.61  0.0 BFAC  20.0
ATOM    Ano   0.3738  0.1999  0.4861        5.19  0.0 BFAC  20.0
ATOM    Ano   0.4491  0.2259  0.1904        4.08  0.0 BFAC  20.0
ATOM    Ano   0.2776  0.0139  0.1304       -3.77  0.0 BFAC  20.0
ATOM    Ano   0.3254  0.1712  0.4312       -3.55  0.0 BFAC  20.0
ATOM    Ano   0.4751  0.2411  0.1694       -3.49  0.0 BFAC  20.0
ATOM    Ano   0.3252  0.2054  0.4343       -3.47  0.0 BFAC  20.0
ATOM    Ano   0.0179  0.1776  0.4259        3.42  0.0 BFAC  20.0
ATOM    Ano   0.2697  0.2017  0.8671        3.35  0.0 BFAC  20.0
ATOM    Ano   0.1271  0.4697  0.2386       -3.31  0.0 BFAC  20.0
ATOM    Ano   0.2554  0.3257  0.2372        3.30  0.0 BFAC  20.0
ATOM    Ano   0.2387  0.3913  0.9004        3.29  0.0 BFAC  20.0
ATOM    Ano   0.4382  0.2585  0.3811       -3.28  0.0 BFAC  20.0
ATOM    Ano   0.2398  0.1814  0.7578        3.25  0.0 BFAC  20.0
ATOM    Ano   0.2036  0.0740  0.4036       -3.24  0.0 BFAC  20.0
ATOM    Ano   0.2215  0.3875  0.0788        3.22  0.0 BFAC  20.0
ATOM    Ano   0.3055  0.1610  0.4985       -3.13  0.0 BFAC  20.0
ATOM    Ano   0.4253  0.2392  0.0122        3.06  0.0 BFAC  20.0
ATOM    Ano   0.0751  0.1712  0.0262       -3.05  0.0 BFAC  20.0
ATOM    Ano   0.2715  0.0100  0.1736       -3.02  0.0 BFAC  20.0

Add these extra sites to the original output .ha file (or use this one, which was output by the difference Fourier calculations). You can edit the file to set all occupancies to a constant value. If the first 16 of these difference Fourier peaks are used, after refining XYZ and Occ, 3 peaks can quickly be eliminated again. In fact, the first (correct) 12 keep coming up as the strongest peaks.
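
Picking the strongest peaks and resetting their occupancies can be done by hand in the viewer, or sketched in a few lines of Python. The code below assumes the peak-list layout shown above (ATOM, form factor, X, Y, Z, peak height, anomalous occupancy, BFAC, B). The starting occupancy of 1.0 and all function names are arbitrary choices for illustration, and only a short excerpt of the listing is used.

```python
PEAKS = """\
ATOM    Ano   0.0792  0.1477  0.9868       28.76  0.0 BFAC  20.0
ATOM    Ano   0.2051  0.3929  0.8592       27.99  0.0 BFAC  20.0
ATOM    Ano   0.4491  0.2259  0.1904        4.08  0.0 BFAC  20.0
ATOM    Ano   0.2776  0.0139  0.1304       -3.77  0.0 BFAC  20.0
ATOM    Ano   0.0179  0.1776  0.4259        3.42  0.0 BFAC  20.0
"""


def parse_peaks(text):
    """Return (x, y, z, height) for every ATOM line in a peak list."""
    peaks = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 6 or fields[0] != "ATOM":
            continue
        x, y, z, height = (float(v) for v in fields[2:6])
        peaks.append((x, y, z, height))
    return peaks


def strongest_positive(peaks, n):
    """Keep the n highest peaks, ignoring negative ones."""
    positive = [p for p in peaks if p[3] > 0]
    return sorted(positive, key=lambda p: p[3], reverse=True)[:n]


def as_ha_lines(peaks, occupancy=1.0):
    """Write the sites back out with one constant starting occupancy."""
    return [
        f"ATOM    Ano  {x:7.4f} {y:7.4f} {z:7.4f} {occupancy:11.2f}  0.0 BFAC  20.0"
        for x, y, z, _ in peaks
    ]


selected = strongest_positive(parse_peaks(PEAKS), n=3)
print("\n".join(as_ha_lines(selected)))
```

For the full listing, n=16 would correspond to the first 16 peaks discussed above.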

Once you are satisfied you have the complete solution, you need to do a refinement to the limit of the resolution to correct the B values. In the folder Describe Derivatives & Refinement, use XYZ refinement, and change the occupancy refinement to alternate occupancy and B factor:

Use isomorphous data to refine XYZ and alternate Occ & B
Select Run -> Run Now. The interface will ask you whether you want to overwrite gere_MAD_nat_mlphare1.mtz. This is OK, so click Delete File.

If you do not have time to do all the intermediate stages, you may skip to the final stage. If you want to know what happens, read on:

There are several ways the refinement of the Se sites can be optimised:

Inspect difference Fourier maps for extra sites.
Repeat using the centric data, keeping F_infl as FP, and setting F_peak and F_lrm as FPH1 to get the relative occupancy of the dispersive difference between these wavelengths. You do not need to refine XYZ or B - keep the values from your final refinement for the largest differences, and just refine the real occupancy. Make copies of your final F_infl_v_F_hrm.ha file to use as input for the other refinements. If the Cullis R is less than 0.5 for some resolution shell, you probably have the correct solution.
You can repeat all this with a different set of sites from RANTAN.

Please note that these sites are not necessarily in the same asymmetric unit as the ones you have refined yourself, and most likely not in the same order. Keeping in mind that the spacegroup is C2, you can work out whether your set is 'correct'.

4e) Final Refinement and Phasing

The Problem

To get the best phases, we now include all wavelengths together, and use the anomalous signal as well. You need to do a final refinement and phasing run to first refine the anomalous occupancies (i.e. the relative number of electrons contributed by the different f"), then to output phases.

You will need four .ha files (variations of the four mentioned above). The first three you have refined using the F_infl with F_hrm, F_peak, and F_lrm. All these should have the same values for XYZ and B but different real occupancies. The anomalous occupancy value is 0.0. You now need to edit these to set the anom_Occ to 1.0, as a preliminary to refinement. You also need to copy one of these to provide a F_infl_v_F_infl.ha. Edit this to have real occupancies of 0.0 (anomalous occupancies of 1.0).
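
The edits described above are normally made through the 'View' and 'Edit Columns' buttons, but the same change can be sketched as a small text operation. The Python function below is a hypothetical helper, not a CCP4 tool. It assumes an ATOM line of the form label, form factor, X, Y, Z, occupancy, optional anomalous occupancy, BFAC, B, as seen in the listings in exercise 4d. It does not preserve the original column spacing, on the assumption that the file is read as free-format text.

```python
def set_occupancies(line, occ=None, aocc=None):
    """Return an ATOM line with real and/or anomalous occupancy replaced."""
    fields = line.split()
    bfac_at = fields.index("BFAC")
    head, tail = fields[:bfac_at], fields[bfac_at:]
    if len(head) == 6:
        # No anomalous occupancy column present: add one, initially zero.
        head.append("0.0")
    if occ is not None:
        head[5] = f"{occ:.3f}"
    if aocc is not None:
        head[6] = f"{aocc:.3f}"
    return " ".join(head + tail)


refined = "ATOM7   Ano   0.244  0.156  0.961 -0.590 BFAC   20.000"
# For the F_infl_v_F_infl.ha copy: real occupancy 0.0, anomalous 1.0.
print(set_occupancies(refined, occ=0.0, aocc=1.0))
# For the other three files: keep the real occupancy, set anomalous to 1.0.
print(set_occupancies(refined, aocc=1.0))
```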

Exercise

In the Run Mlphare task window, enter a suitable job title such as:

Job title Refinement against all data (mad tutorial step 400).
In the first folder:

First de-select Use centric data only

Then select Use anomalous difference data

Then select Use every 10-th reflection for refinement

and

Apply calculated scale to output Sfs

Output Hendrickson-Lattman coefficients

Leave everything else unselected.

Select the input MTZ file:

MTZ in TEST gere_MAD_nat_scaleit1.mtz

Now select the columns from the MTZ file.

FP	F_infl	SigmaFP	SIGF_infl
FPH1	F_infl	SigFPH1	SIGF_infl
DPH1	DANO_infl	SigDPH1	SIGDANO_infl
FPH2	F_lrm	SigFPH2	SIGF_lrm
DPH2	DANO_lrm	SigDPH2	SIGDANO_lrm
FPH3	F_peak	SigFPH3	SIGF_peak
DPH3	DANO_peak	SigDPH3	SIGDANO_peak
FPH4	F_hrm	SigFPH4	SIGF_hrm
DPH4	DANO_hrm	SigDPH4	SIGDANO_hrm

Check that the output MTZ file is given as

Output MTZ TEST gere_MAD_nat_mlphare2.mtz

In the folder Key parameters, enter resolution limits to include all the data:

Resolution limit from 15.0 to 2.6.

In the folder Describe Derivatives & Refinement, sub-folder Derivative Number 1, enter a name for the derivative:

Phase with&refine derivative infl to infl.

On the next line, refine (anomalous) occupancy only, against anomalous data:

Use anomalous data to refine occupancy

Then:

Either select the file of correct sites that has been provided:

HA in DATA nat_sul_ref.ha

and (through the 'View' and 'Edit Columns' buttons) set occupancies 0.0 and set anomalous occupancies 1.0, then 'Save As..' F_infl_v_F_infl.ha on TEST, then 'Quit'; then 'Browse' for the new filename.

Or select the file made according to the instructions above:

HA in TEST F_infl_v_F_infl.ha

In the sub-folder Derivative number 2 select:

Phase with&refine derivative infl to lrm.

Use anomalous data to refine occupancy

Then:

Either select the file of correct sites that has been provided:

HA in DATA nat_sul_ref.ha

and (through the 'View' and 'Edit Columns' buttons) set anomalous occupancies 1.0, then 'Save As..' F_infl_v_F_lrm.ha on TEST, then 'Quit'; then 'Browse' for the new filename.

Or select the file made according to the instructions above:

HA in TEST F_infl_v_F_lrm.ha

Repeat the previous step for the other 2 wavelengths (Derivative number 3 and 4, 'infl to peak' and 'infl to hrm', respectively).
Select Run -> Run Now.
When the job has finished, return to the main window, highlight the job in the Job List, and select View Files from Job -> View Log Graphs. Graphs are given for each wavelength, for both the last refinement cycle and the final phasing cycle. Look in particular at:

Lack of closure analysis .... / Phasing power ...., Lack of closure analysis .... / Cullis Rfactor ....

For good data, the phasing power should be greater than 1, and the Cullis Rfactor should be significantly less than one. The values for the different wavelengths ("derivatives") should correlate with the f' value (compared to that for F_infl).

Anomalous lack of closure analysis .... / Ano Cullis Rfactor .....

The anomalous Cullis Rfactor should be significantly less than one. The values for the different wavelengths ("derivatives") should correlate with the f" value.

In fact, the example data do not give very good statistics. However, the structure was solved by this method!

4f) Density Modification

The Problem

The phase statistics output by MLPHARE will be the same, whether the sites are on the correct or the wrong hand (see Maths notes). However if anomalous data has been used, one set of phases will give an interpretable map whilst the other will generate a random one. It is essential to select the correct hand before attempting interpretation of the map. This is done by using Density Modification procedures. For the correct hand it should be possible to see a boundary between the protein part of the asymmetric unit and the solvent, while for the wrong hand there will be no clear distinction. In fact, MLPHARE gives realistic Figures of Merit and therefore you need to generate phases on both hands (see Notes on usage). Density modification (also known as Density Improvement) can be done using the program 'dm'.

Exercise

Select the Density Improvement module, and open the Run DM task window.
On the first line, enter a suitable job title such as

Job title DM on MAD phases - first hand (mad tutorial step 500).
Select the input MTZ file:

MTZ in TEST gere_MAD_nat_mlphare2.mtz

Now select the columns from the MTZ file.

FP F_native SIGFP SIGF_native

PHIO PHIB_mlphare1 Weight FOM_mlphare1
Enter the solvent content as

Fraction solvent content 0.538.
Everything else can be left as default, so Run -> Run Now.

4g) Testing the hand

The Problem

The procedure for locating the Se sites cannot distinguish between a particular set of sites and the same set of sites transformed through a point of inversion, i.e. it cannot distinguish the hand of the solution. Therefore, the previous phasing run should be repeated using the opposite hand. (The program ABS in CCP4 can also be used to determine the hand for the case of anomalous scattering.)
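
Transforming the sites through a point of inversion is simple to write down. The Python sketch below assumes that, for this space group (C2), inverting through the origin, (x, y, z) -> (-x, -y, -z), is sufficient to obtain the opposite hand. It is meant to illustrate the operation and is not necessarily what the 'Reverse hand' option of the interface does internally. The function name is illustrative, and the example site is the first peak of the listing in exercise 4d.

```python
def invert_hand(site):
    """Invert a fractional coordinate through the origin, wrapped into [0, 1)."""
    return tuple((-c) % 1.0 for c in site)


site = (0.0792, 0.1477, 0.9868)
opposite = invert_hand(site)
print("first hand:   ", site)
print("opposite hand:", tuple(round(c, 4) for c in opposite))
```

Applying the function twice returns the original site, which is a quick sanity check when preparing the opposite-hand .ha files.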

Then we look at two things:

look at maps - one map should have a clearer solvent boundary than the other.
run DM - one hand should give marginally better statistics than the other.

But the main difference is whether or not you can build a model ....

Exercise

Re-run the previous 2 exercises (4e and 4f), but using different files of sites.

Select the job as run for exercise 4e (Final Refinement and Phasing) from the Job List and select ReRun Job.. from the menu on the right of the Main Window.
Adapt the Job title line appropriately.

Job title Refinement against all data opposite hand (mad tutorial step 600)

Leave all input in the Protocol and Files folders as before.
Check that the output MTZ file is given as

Output MTZ TEST gere_MAD_nat_mlphare3.mtz

(or change accordingly).

In the folder Describe Derivatives & Refinement, sub-folder Derivative Number 1, adapt the HA in as follows:

Either select the file of correct (opposite hand) sites that has been provided:

HA in DATA nat_sul_ref_opp.ha

and (through the 'View' and 'Edit Columns' buttons) set occupancies 0.0 and set anomalous occupancies 1.0, then 'Save As..' F_infl_v_F_infl_opp.ha on TEST, then 'Quit'; then 'Browse' for the new filename.

Or select the file:

HA in TEST F_infl_v_F_infl.ha

and (after the 'View' button) select Reverse hand, then OK (in Change Hand window), then 'Save As..' F_infl_v_F_infl_opp.ha on TEST, then 'Quit'; then 'Browse' for the new filename (or add _opp).

Similarly, repeat steps 405 - 408 of exercise 4e (the remaining derivatives), using the opposite-hand sites in each case.
Select the job as run for exercise 4f from the Job List and select ReRun Job.. from the menu on the right of the Main Window.
Adapt the Job title line appropriately.

Job title DM on MAD phases opposite hand (mad tutorial step 606)
Select the input MTZ file:

MTZ in TEST gere_MAD_nat_mlphare3.mtz
Everything else can be left as before, so Run -> Run Now.
To check the maps, select the Map & Mask Utilities module, and open the Run FFT - Create Map task window.
On the first line, enter a suitable job title such as

Job title mlphare dm hand 1 (mad tutorial step 610)

Then select (using the radiobutton)

Plot map sections with no coordinates
Select the MTZ file:

MTZ in TEST gere_MAD_nat_dm1.mtz

and select the columns from the MTZ file

F1 F_infl Sigma SIGF_infl (FDM does not have a Sigma)

PHI PHIDM Weight FOMDM
In the folder Select Plot Selections:

Plot sections on Y axis from 0.0 to 0.5 in steps of 5.0

Map contour levels entered as a range of *sigma

Contour levels from 1.5 to 2.5 by intervals of 0.5
Select Run -> Run Now.
Repeat with MTZ file gere_MAD_nat_dm2.mtz (the opposite hand).
When the jobs have finished, select the first 'dm' job from the Job List and View Files from Job -> View Log Graphs. Try to get it side by side with the Log Graphs from the second dm job and compare. Have a look at Phase and weight statistics (both Mean change in phase and Mean figures of merit), and Density Modification-Free-R factors. All in all, the statistics for the first hand are slightly better. Also, from the Log File it can be seen that the solvent boundary looks better for the first hand.

Select the first FFT job from the Job List and View Files from Job -> TEST_<jobnumber>_1.plt and bring up Picture 1. Try to get it side by side with Picture 1 from the .plt of the second FFT job. The solvent area looks a lot 'cleaner' for the first hand, whereas the contrast is less for the second hand.

Prepared by Liz Potterton and Martyn Winn, 2000

Adapted by Eleanor Dodson and Maria Turkenburg, 2002-2003