CCP4 Tutorial: Session 4 - Heavy Atom Search and Phasing by MAD

See also the accompanying document giving background information.

In the following instructions, when you need to type something, or click on something, it will be shown in red. Output from the programs or text from the interface is given in green.

Outline of the method

  1. Scaling and analysing datasets
  2. Preparing datasets for finding heavy atoms
  3. Find heavy atoms
  4. Heavy atom refinement
  5. Final Refinement and Phasing
  6. Density Modification
  7. Testing the hand

The Data Files

Files in directory DATA:

gere_MAD_nat.mtz: reflection file containing all wavelength data, prepared for experimental phasing by MAD
session4a.def: CCP4i .def file containing all parameters for running SCALEIT (session 4a)
gere_MAD_nat_scaleit1.mtz: reflection file as output by SCALEIT
rantan_set1.ha: .ha file output from RANTAN, input to the initial refinement with MLPHARE
nat_sul_ref.ha: .ha file containing fractional coordinates of heavy atom sites, for final refinement with MLPHARE
nat_sul_ref_opp.ha: .ha file containing fractional coordinates of heavy atom sites on the opposite hand, for final refinement with MLPHARE

Files in directory RESULTS:

dm_gere_firsthand.log: log of density modification for GerE, first hand
dm_gere_opphand.log: log of density modification for GerE, opposite hand

4a) Scaling and analysing datasets

The Problem

You now have a file containing native data for GerE, and MAD data for a selenomethionine derivative. First, we scale each wavelength of the MAD data to the native dataset, so that all data is on the same scale. At the same time, we analyse the MAD data to estimate the strength of the dispersive and anomalous signals.

Exercise

  1. Select the Experimental Phasing module, and open the Scale and Analyse Datasets task window.

  2. On the first line, enter a suitable job title such as

    Job title Scaling GerE datasets (mad tutorial step 1).

  3. On the second line, select

    Do scale refinement using Scaleit.

    On the next 2 lines, select

    Use anomalous difference data

    and

    Do cross-comparison of data sets and analyse dispersive differences

    using the radiobuttons.

  4. Select the input MTZ file

    MTZ in TEST gere_MAD_nat.mtz

    (If you do not have this file from the data processing/reduction session, take the file from the DATA directory.)

    Now select the columns from the MTZ file. The first line has the native F_nat and SIGF_nat. Then select columns for the 4 wavelengths, using the button Add Derivative Data to add more columns. (It might be easier here to load the file DATA/session4a.def which already has these parameters set.) You should end up with:

    FP      F_nat         SigmaFP    SIGF_nat
    FPH1    F_infl        SigFPH1    SIGF_infl
    DPH1    DANO_infl     SigDPH1    SIGDANO_infl
    FPH+1   F_infl(+)     SigFPH+1   SIGF_infl(+)
    FPH-1   F_infl(-)     SigFPH-1   SIGF_infl(-)
    FPH2    F_lrm         SigFPH2    SIGF_lrm
    DPH2    DANO_lrm      SigDPH2    SIGDANO_lrm
    FPH+2   F_lrm(+)      SigFPH+2   SIGF_lrm(+)
    FPH-2   F_lrm(-)      SigFPH-2   SIGF_lrm(-)
    FPH3    F_peak        SigFPH3    SIGF_peak
    DPH3    DANO_peak     SigDPH3    SIGDANO_peak
    FPH+3   F_peak(+)     SigFPH+3   SIGF_peak(+)
    FPH-3   F_peak(-)     SigFPH-3   SIGF_peak(-)
    FPH4    F_hrm         SigFPH4    SIGF_hrm
    DPH4    DANO_hrm      SigDPH4    SIGDANO_hrm
    FPH+4   F_hrm(+)      SigFPH+4   SIGF_hrm(+)
    FPH-4   F_hrm(-)      SigFPH-4   SIGF_hrm(-)

    Check that the output MTZ file is given as

    MTZ out TEST gere_MAD_nat_scaleit1.mtz

  5. You should not need to change anything else. Select Run -> Run Now.

  6. When the job has finished, return to the main window, highlight the job in the Job List, and select View Files from Job -> View Log Graphs. This task outputs a large number of graphs for analysing the data, and we will just look at some of them.

  7. We can gauge the strength of the dispersive differences by looking at the graphs Centric Normal probability v resolution and Acentric Normal probability v resolution ... for each pair of wavelengths, e.g. ... FP = F_lrm FPH = F_infl SIGF_infl DANO_infl SIGDANO_infl. For each graph, look at the line Gradient_on_reflection_prob.lt.0.9. Use the crosswires to estimate a rough value, e.g. for the low-remote against the inflection, the value is about 1.128 for centric data and 1.254 for acentric data.

    The values can be summarised as (these values are contained in the file View Files from Job -> ...scaleit.summary):

    
     Table: Normal Probability for acentric data
    
    Normal Prob.   |  F_lrm       F_peak      F_hrm       
     ----------------------------------------------------------------------------
     F_infl        | 1.257        1.075       1.514       
    
    
     Table: Normal Probability for Centric data
    
     Normal Prob.   | F_lrm       F_peak      F_hrm       
     ----------------------------------------------------------------------------
     F_infl         | 1.111       0.921       1.453       
    

    This shows that the dispersive difference (i.e. the difference in f' between wavelengths) is smallest between the inflection and the peak, and largest between the inflection and the high-wavelength remote (the inflection point has the most negative f').

  8. We can gauge the strength of the anomalous differences by looking at the graph Acentric Normal probability v resolution ... for F(+) and F(-) of each wavelength, e.g. ... FP = F(+)_infl FPH = F(-)_infl SIGF(-)_infl. For each graph, look at the line Gradient_on_reflection_prob.lt.0.9, and use the crosswires to estimate a rough value.

    The values are summarised as:

     Table: Anomalous Differences ( FPHi+ v. FPHi-)
    
    
     Anom difference          | Prob_acent  Rfactor     
     --------------------------------------------------------------------------------------
     F_infl(+) v F_infl(-)    |   1.166       0.090       
     F_lrm(+)  v F_lrm(-)     |   1.340       0.088       
     F_peak(+) v F_peak(-)    |   1.430       0.112       
     F_hrm(+)  v F_hrm(-)     |   1.010       0.089       
    

    This shows that the high-wavelength remote has the least anomalous signal, i.e. a low value of f". The peak wavelength has the largest f", while the other 2 wavelengths have intermediate values.

  9. Close the Scaleit Task Window.
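The normal probability gradients above come from comparing the distribution of normalised differences with a Gaussian. The Python sketch below shows the general idea: it sorts (F1 - F2)/sqrt(sigma1^2 + sigma2^2), pairs each value with its expected normal quantile and fits a slope through the origin. A slope near 1 means the differences look like measurement noise; a slope clearly above 1 suggests a real dispersive or anomalous signal. Fitting only the central 90% of points is an assumption standing in for SCALEIT's Gradient_on_reflection_prob.lt.0.9 criterion, whose exact definition is not reproduced here, and the function name is hypothetical.

```python
from statistics import NormalDist

def normal_probability_gradient(f1, sig1, f2, sig2, central=0.9):
    """Slope of a normal probability plot of (f1 - f2) / sqrt(sig1^2 + sig2^2).

    Only points whose plotting position lies in the central fraction
    `central` are used (an assumed stand-in for SCALEIT's criterion).
    """
    ratios = sorted((a - b) / (sa ** 2 + sb ** 2) ** 0.5
                    for a, sa, b, sb in zip(f1, sig1, f2, sig2))
    n = len(ratios)
    nd = NormalDist()
    lo, hi = (1 - central) / 2, (1 + central) / 2
    xs, ys = [], []
    for i, r in enumerate(ratios):
        p = (i + 0.5) / n              # plotting position of the i-th sorted value
        if lo <= p <= hi:
            xs.append(nd.inv_cdf(p))   # expected standard-normal quantile
            ys.append(r)               # observed normalised difference
    # least-squares slope of observed against expected, constrained through the origin
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
```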

4b) Preparing datasets for finding heavy atoms

The Problem

Before carrying out any experimental phasing, it is necessary to know the atomic coordinates of the anomalous scatterers or heavy atoms. For GerE there are 12 Se atoms to be positioned. These can be found by a Patterson search or by direct methods; with 12 sites, interpreting a Patterson map is complicated. However, it is always good practice to calculate Pattersons using both the anomalous difference at the peak wavelength and the largest dispersive difference. They should both show a similar pattern of peaks.

Exercise

  1. Select the Experimental Phasing module, and open the Generate Patterson Map task window.

  2. On the first line, enter a suitable job title such as

    Job title Anomalous Peak Patterson 10-3.5Å (mad tutorial step 100).

  3. On the next line, select

    Run FFT to generate anomalous difference Patterson (data as F+ F-, not as D)

    then select with the radio button

    Plot default Harker map sections with no coordinates

    Later, when we have some sites, we will check them by repeating the Patterson calculation and plotting sections with vectors between atom coordinates.

  4. Select the input MTZ file

    MTZ in TEST gere_MAD_nat_scaleit1.mtz

    (If you do not have this file from the previous session, take the file from the DATA directory.)

    Now select the columns from the MTZ file:

    F1      F_peak(+)     SigmaF1    SIGF_peak(+)
    F2      F_peak(-)     SigmaF2    SIGF_peak(-)

    Check that the output MAP file is given as

    Map TEST gere_MAD_nat_patterson1.map

  5. Now fill in the folder Exclude Reflections as follows:
    The cutoff for 'Exclude reflections with differences between F1 and F2 > ?' will be estimated from a SCALEIT analysis, or you can enter your own value. It is important to exclude outliers, which are often due to measurement errors.
    It is sensible always to exclude reflections with F less than n * sigmaF, where n is 3 (for all data concerned).
    You also need to select a suitable resolution range. Use the 'Analysis of data vs. resolution' plots from the SCALEIT run (View Files from Job -> View Log Graphs) to choose sensible limits; here, exclude reflections with Resolution less than 10 Å or greater than 3.5 Å.

  6. You should not need to change anything else, so select Run -> Run Now.
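The exclusion rules in step 5 can be sketched as a filter applied to single reflections. This is an illustration only, not the FFT task's own code: the function names are hypothetical, and the resolution of each reflection is computed with the standard monoclinic d-spacing formula (unique axis b), using the GerE cell listed later in this session (a = 108.742, b = 61.679, c = 71.652 Å, beta = 97.151 degrees).

```python
import math

def d_spacing_monoclinic(h, k, l, a, b, c, beta_deg):
    """Resolution d (Å) of reflection hkl in a monoclinic cell with unique axis b."""
    beta = math.radians(beta_deg)
    s2 = math.sin(beta) ** 2
    inv_d2 = (h * h / a ** 2 + k * k * s2 / b ** 2 + l * l / c ** 2
              - 2 * h * l * math.cos(beta) / (a * c)) / s2
    return 1.0 / math.sqrt(inv_d2)

def keep_reflection(hkl, f1, sig1, f2, sig2, cell,
                    dmax=10.0, dmin=3.5, nsig=3.0, max_diff=None):
    """Apply the step-5 exclusion rules to one reflection (F1/F2 = F(+)/F(-))."""
    d = d_spacing_monoclinic(*hkl, *cell)
    if not (dmin <= d <= dmax):
        return False                 # outside the chosen resolution range
    if f1 < nsig * sig1 or f2 < nsig * sig2:
        return False                 # weak: F below n * sigma(F)
    if max_diff is not None and abs(f1 - f2) > max_diff:
        return False                 # probable outlier
    return True
```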

The Harker sections will be plotted: click on View Files from Job -> jobid...plt. It is a good idea to repeat the job for the largest dispersive difference (F_infl against F_hrm) and compare the plots for the dispersive and anomalous Pattersons; they should have a similar pattern of peaks.
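When you later have trial sites, you can predict where their self-vectors should appear. In C2 (unique axis b) the two-fold maps (x, y, z) to (-x, y, -z), so the vector between a site and its mate is (2x, 0, 2z) and all Harker peaks lie on the v = 0 section. The sketch below (hypothetical helper names) lists these Harker vectors, plus the cross-vectors between two different sites, for comparison with the plotted sections; peak heights are not modelled.

```python
def harker_vectors_c2(sites):
    """Expected Harker (self-vector) peaks for fractional sites in C2, unique axis b.

    The two-fold mate (-x, y, -z) gives the vector (2x, 0, 2z). The C-centred
    mate (1/2 - x, 1/2 + y, -z) gives (2x + 1/2, 1/2, 2z), which is the same
    peak moved by the Patterson's own C-centring (1/2, 1/2, 0), so it is omitted.
    """
    return [((2 * x) % 1.0, 0.0, (2 * z) % 1.0) for x, y, z in sites]

def cross_vectors_c2(p, q):
    """Cross-vectors between two different sites, from the identity and the two-fold."""
    (x1, y1, z1), (x2, y2, z2) = p, q
    return [((x1 - x2) % 1.0, (y1 - y2) % 1.0, (z1 - z2) % 1.0),   # p minus q
            ((x1 + x2) % 1.0, (y1 - y2) % 1.0, (z1 + z2) % 1.0)]   # p minus 2-fold mate of q
```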

You are now going to use a direct methods approach for locating the Se sites. In this section, you will prepare the MAD data for use in the direct methods program RANTAN. This task runs REVISE for generating the normalised anomalous scattering magnitude FM, and then the program ECALC for calculating the corresponding normalised structure factor E.
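Normalisation means dividing each amplitude by the root-mean-square amplitude expected at its resolution, so that the mean of E^2 is 1 everywhere. The sketch below does this in equal-width shells of 1/d^2. It only illustrates the idea: ECALC's actual procedure (for example how it models the fall-off of intensity with resolution, and the epsilon factor for reflections in special zones) may differ, and the function name is hypothetical.

```python
import math
from collections import defaultdict

def normalise_in_shells(amplitudes, inv_d2, nbins=10):
    """Return E = F / sqrt(<F^2>_shell), with shells of equal width in 1/d^2.

    The epsilon factor and any smoothing of <F^2> across shells are ignored.
    """
    lo, hi = min(inv_d2), max(inv_d2)
    width = (hi - lo) / nbins or 1.0          # guard against a single resolution value
    bins = [min(int((s - lo) / width), nbins - 1) for s in inv_d2]
    sum_f2 = defaultdict(float)
    count = defaultdict(int)
    for f, b in zip(amplitudes, bins):
        sum_f2[b] += f * f
        count[b] += 1
    return [f / math.sqrt(sum_f2[b] / count[b]) for f, b in zip(amplitudes, bins)]
```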

Exercise

  1. Select the Experimental Phasing module, and open the Prepare Data for HA Search task window.

  2. On the first line, enter a suitable job title such as

    Job title Run revise for GerE data (mad tutorial step 120).

  3. On the next line, select

    Input MAD data as F+ F- and prepare data for Es (Rantan/Acorn)

  4. Select the input MTZ file

    MTZ in TEST gere_MAD_nat_scaleit1.mtz

    Now select the columns from the MTZ file:

    FPH+1   F_infl(+)     SigFPH+1   SIGF_infl(+)
    FPH-1   F_infl(-)     SigFPH-1   SIGF_infl(-)
    FPH+2   F_lrm(+)      SigFPH+2   SIGF_lrm(+)
    FPH-2   F_lrm(-)      SigFPH-2   SIGF_lrm(-)
    FPH+3   F_peak(+)     SigFPH+3   SIGF_peak(+)
    FPH-3   F_peak(-)     SigFPH-3   SIGF_peak(-)
    FPH+4   F_hrm(+)      SigFPH+4   SIGF_hrm(+)
    FPH-4   F_hrm(-)      SigFPH-4   SIGF_hrm(-)

    Check that the output MTZ file is given as

    Output MTZ TEST gere_MAD_nat_prephadata1.mtz

  5. Now fill in the folder Anomalous Data as follows:

    Data set 1 collected at wavelength 0.981 with estimated F' -6.0 and F" 2.0
    Data set 2 collected at wavelength 1.1 with estimated F' -3.0 and F" 3.0
    Data set 3 collected at wavelength 0.98 with estimated F' -4.0 and F" 4.0
    Data set 4 collected at wavelength 0.9 with estimated F' -3.0 and F" 1.0

    In fact, the wavelengths are only used as labels by the program. The important values are f' and f", although the results are not very sensitive to their exact values. These values have been estimated from the known range of f' and f" for Se, together with the relative dispersive and anomalous differences estimated in the previous section. You can plot the approximate variation of f' and f" with wavelength for different elements using CROSSEC.

  6. You should not need to change anything else, so select Run -> Run Now.
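To see why the relative f' and f" values matter, a commonly quoted estimate (due to Hendrickson) relates them to the expected size of the signals: the Bijvoet ratio <|F(+) - F(-)|>/<|F|> is about sqrt(NA/(2 NP)) * 2f"/Zeff, and the dispersive ratio is about sqrt(NA/(2 NP)) * |f'(i) - f'(j)|/Zeff, where NA is the number of anomalous scatterers, NP the number of non-hydrogen protein atoms and Zeff about 6.7 electrons. The sketch below simply evaluates these formulas with the f' and f" values entered above. NP for GerE is not given in this tutorial, so any value you supply is an assumption, and the helper names are hypothetical.

```python
import math

Z_EFF = 6.7  # effective scattering (electrons) of an average protein atom

# f' and f" as entered in the Anomalous Data folder above
F_PRIME = {"infl": -6.0, "lrm": -3.0, "peak": -4.0, "hrm": -3.0}
F_DPRIME = {"infl": 2.0, "lrm": 3.0, "peak": 4.0, "hrm": 1.0}

def _scale(n_anom, n_protein):
    return math.sqrt(n_anom / (2.0 * n_protein))

def bijvoet_ratio(n_anom, n_protein, f_dprime):
    """Expected <|F(+) - F(-)|> / <|F|> for scatterers with the given f"."""
    return _scale(n_anom, n_protein) * 2.0 * f_dprime / Z_EFF

def dispersive_ratio(n_anom, n_protein, f_prime_i, f_prime_j):
    """Expected <|F(lambda_i) - F(lambda_j)|> / <|F|> from the change in f'."""
    return _scale(n_anom, n_protein) * abs(f_prime_i - f_prime_j) / Z_EFF
```

With these f' values, the inflection against high-wavelength remote and inflection against low remote pairs (|delta f'| = 3) should give larger dispersive ratios than inflection against peak (|delta f'| = 2), and the peak (f" = 4) the largest Bijvoet ratio, which is the same ranking as the SCALEIT summary.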

4c) Find heavy atoms

The Problem

You have generated a column of E values which give a wavelength-independent measure of the anomalous scattering due to the Se sites. The Se sites can be found from the E values by Patterson methods, but here you will use a direct methods approach.

Exercise

  1. Select the Experimental Phasing module, and open the Rantan - Direct Methods task window.

  2. On the first line, enter a suitable job title such as

    Job title Find Se sites for GerE (mad tutorial step 200).

  3. On the second line, select

    Set optimal Rantan parameters for estimates of FH or FA data (default)

    and on the next line, select

    generate map(s) and coordinate file listing peaks (default)

  4. Select the input MTZ file

    MTZ in TEST gere_MAD_nat_prephadata1.mtz

    The rest of the necessary information (in the 'Files' folder and the 'Running Rantan' folder) should be filled in automatically.

  5. You should not need to change anything else, so select Run -> Run Now.

  6. When the job has finished, view the output MTZ file by selecting in the main window View Files from Job -> gere_MAD_nat_rantan1.mtz. The output file has 48 columns:

     * Column Labels :
    
     H K L F_infl(+) SIGF_infl(+) F_infl(-) SIGF_infl(-) mod_F_infl(+)
     mod_SIGF_infl(+) mod_F_infl(-) mod_SIGF_infl(-) F_lrm(+) SIGF_lrm(+) F_lrm(-)
     SIGF_lrm(-) mod_F_lrm(+) mod_SIGF_lrm(+) mod_F_lrm(-) mod_SIGF_lrm(-) F_peak(+)
     SIGF_peak(+) F_peak(-) SIGF_peak(-) mod_F_peak(+) mod_SIGF_peak(+) mod_F_peak(-)
     mod_SIGF_peak(-) F_hrm(+) SIGF_hrm(+) F_hrm(-) SIGF_hrm(-) mod_F_hrm(+)
     mod_SIGF_hrm(+) mod_F_hrm(-) mod_SIGF_hrm(-) FM SIGFM F E SIGE F2OR E2OR PHASE1
     WT1 PHASE2 WT2 PHASE3 WT3

    RANTAN generates and refines a large number of possible phase sets (default 500), but only outputs the best ones (default 3) to the output MTZ file. These phases and the corresponding weights are held in the last 6 columns.

  7. From each of these phase sets, the task calculates a map and locates peaks, which may correspond to Se sites. These peaks are output in both orthogonal and fractional coordinates. Click on View Files from Job to reveal a list of output files. For each phase set, there will be a .pdb (orthogonal coordinates) and a .ha (fractional coordinates) file, for example TEST_jobnumber_1.pdb and TEST_jobnumber_1.ha for phase set 1. The default peak search produces approximately 15 peaks; we expect there to be 12 Se sites for this protein (2 each for 6 chains). (Note that RANTAN starts from random phase sets, so the results are not always the same.)

4d) Heavy atom refinement

The Problem

You now have 3 sets of possible Se sites. Heavy atom refinement and phasing is done using the program MLPHARE. The stages are:

  1. Refine heavy atom ( = Se) parameters (XYZ coordinates, B factor, real occupancy, anomalous occupancy), or a subset of these.
  2. Remove incorrect sites. Their occupancy will refine to a small or negative value. These will be obvious after refining against the centric data only; there are enough centric observations to refine the Se parameters, and the occupancies. If you have no centric data use the "Use every XXX-th reflection for refinement" option to use a subset of the reflections for refinement.
  3. Look for new atom sites using Fourier difference (or double difference) maps. These can be calculated after the preliminary centric refinement. The output MTZ file has phases for all the data to the requested resolution limit. A difference Fourier will contain all the present sites plus potential new sites; a double difference Fourier will only show potential new sites (and may therefore be more difficult to interpret). The peak height for a new site will be positive, but probably only 20-30% of the height of sites included in the refinement.
  4. MLPHARE provides useful graphs to monitor your progress. The most useful are those labelled "Lack of Closure analysis v resln" and "Anomalous lack of closure v resln". The Cullis Rfactor and the Anomalous Cullis Rfactor should be less than 1, and any "improvement" to the sub-structure should reduce them. The phasing power should increase as the solution improves.
  5. When you are satisfied that you have the complete solution, complete the refinement for the other wavelengths, and for the anomalous contributions, and generate the final phases.
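The lack-of-closure statistics in stage 4 are easiest to picture for centric reflections, where the phase is restricted to two values. Assuming no sign crossovers, the observed difference |FPH - FP| should equal the calculated |FH|, and the mismatch is the lack of closure. The sketch below uses these simple textbook definitions (Cullis R = sum of |lack of closure| over sum of |FPH - FP|; phasing power = rms FH over rms lack of closure). MLPHARE evaluates its statistics within its own maximum-likelihood treatment, so its numbers will not match this exactly; the function name is hypothetical.

```python
import math

def centric_cullis_and_phasing_power(fp, fph, fh):
    """Centric Cullis R-factor and phasing power for one 'derivative'.

    Assumes no sign crossovers, so |FPH - FP| should equal |FH|:
      lack of closure  e = |FPH - FP| - |FH|
      Cullis R         = sum|e| / sum|FPH - FP|
      phasing power    = sqrt(sum FH^2 / sum e^2)
    """
    diffs = [abs(b - a) for a, b in zip(fp, fph)]
    closure = [d - abs(h) for d, h in zip(diffs, fh)]
    cullis = sum(abs(e) for e in closure) / sum(diffs)
    e2 = sum(e * e for e in closure)
    power = math.inf if e2 == 0 else math.sqrt(sum(h * h for h in fh) / e2)
    return cullis, power
```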

To do this, you need to run MLPHARE several times. The steps using centric data or a subset of the full data set will be very fast. The Se parameters are held in a .ha file, which is updated after each pass. The output MTZ file will be used as input for the difference Fouriers.

For the tutorial, we just do the 1st stage (exercise 4d) and the last stage (exercise 4e). The intermediate stages are described at the end of exercise 4d.

Exercise

  1. Select the Experimental Phasing module, and open the Run Mlphare task window.

  2. On the first line, enter a suitable job title such as

    Job title Refining Se sites for GerE - set 1 (mad tutorial step 300).

  3. In the first folder, select:

    Use centric data only.

    There are enough centric observations to refine the Se parameters, and to indicate which sites are real, and which solution is best.

    Leave everything else unselected.

  4. Select the input MTZ file:

    MTZ in TEST gere_MAD_nat_scaleit1.mtz

    Now select the columns from the MTZ file. First we will refine the sites against the largest dispersive difference (see your SCALEIT summary).

    FP      F_infl        SigmaFP    SIGF_infl
    FPH1    F_hrm         SigFPH1    SIGF_hrm

    Check that the output MTZ file is given as

    Output MTZ TEST gere_MAD_nat_mlphare1.mtz

  5. In the folder Data Harvesting, leave as:

    Do not create harvest file

  6. In the folder Key parameters, enter resolution limits (we do not use the less reliable data for preliminary refinement).

    Resolution limit from 15.0 to 2.8.

  7. In the folder Describe Derivatives & Refinement, enter a name for the derivative:

    Phase with&refine derivative infl to hrm.

    On the next line, check that you are refining (real) occupancy only. There is no anomalous signal in the centric data, and in this polar space group the Y coordinate cannot be refined. The form factor is set to Ano, which means that the atomic form factor is taken to be that of a single electron, so the final occupancy represents the number of electrons contributing to the dispersive difference for this pair of measurements.

    Use isomorphous data to refine occupancy (de-select 'refine XYZ' at this stage)

    Then select the file of heavy atom coordinates output by RANTAN. You can use your own file if you want, but it is recommended to use the prepared file in DATA:

    HA in DATA rantan_set1.ha

  8. Select Run -> Run Now.

  9. Click on View Files from Job -> TEST_jobnumber_1.ha and look at the list of refined sites. The occupancies of sites 7 and 12 are now negative (actual values may vary a little):

    ATOM7   Ano   0.244  0.156  0.961 -0.590 BFAC   20.000
    ATREF X ALL Y ALL Z ALL OCC ALL AOCC ALL B ALL
    ATOM12  Ano   0.216  0.274  0.077 -4.304 BFAC   20.000
    ATREF X ALL Y ALL Z ALL OCC ALL AOCC ALL B ALL
    

    and these sites should be deleted from the list. The easiest way to delete them is to click on these lines in the viewer window, which turns them into comment lines. Then click Save&Exit.

  10. Return to the Run Mlphare task window and rerun the refinement without these sites. This will only take a few seconds. Also, select

    Generate difference maps and do peak search for more heavy atoms

  11. In the folder Describe Derivatives & Refinement, add in XYZ refinement:

    Use isomorphous data to refine XYZ and refine occupancy

  12. Update the heavy atom file:

    HA in TEST TEST_jobnumber_1.ha

  13. Select Run -> Run Now. The interface will ask you whether you want to overwrite gere_MAD_nat_mlphare1.mtz. This is OK, so click Delete File.

  14. When the job has finished, you can check the refined Se sites as before. Now you need to inspect the difference Fourier map to see if there are extra sites. The job will have output a file: TEST_jobnumber_F_hrm.ha, which looks something like this:

    GRID 114  68  76
    CELL  108.7420   61.6790   71.6520   90.0000   97.1510   90.0000
    ATOM    Ano   0.0792  0.1477  0.9868       28.76  0.0 BFAC  20.0
    ATOM    Ano   0.2051  0.3929  0.8592       27.99  0.0 BFAC  20.0
    ATOM    Ano   0.2597  0.0000  0.2456       27.02  0.0 BFAC  20.0
    ATOM    Ano   0.1827  0.0717  0.5238       26.64  0.0 BFAC  20.0
    ATOM    Ano   0.4297  0.1824  0.8797       26.55  0.0 BFAC  20.0
    ATOM    Ano   0.2825  0.2330  0.9221       25.53  0.0 BFAC  20.0
    ATOM    Ano   0.4660  0.2451  0.2418       24.44  0.0 BFAC  20.0
    ATOM    Ano   0.3405  0.1578  0.3168       22.12  0.0 BFAC  20.0
    ATOM    Ano   0.1336  0.1659  0.2071       13.98  0.0 BFAC  20.0
    ATOM    Ano   0.3213  0.3957  0.6309       11.62  0.0 BFAC  20.0
    ATOM    Ano   0.4934  0.1771  0.4131        6.57  0.0 BFAC  20.0
    ATOM    Ano   0.0722  0.2257  0.8064        6.18  0.0 BFAC  20.0
    ATOM    Ano   0.3587  0.0257  0.7882        5.93  0.0 BFAC  20.0
    ATOM    Ano   0.4096  0.1526  0.8472        5.61  0.0 BFAC  20.0
    ATOM    Ano   0.3738  0.1999  0.4861        5.19  0.0 BFAC  20.0
    ATOM    Ano   0.4491  0.2259  0.1904        4.08  0.0 BFAC  20.0
    ATOM    Ano   0.2776  0.0139  0.1304       -3.77  0.0 BFAC  20.0
    ATOM    Ano   0.3254  0.1712  0.4312       -3.55  0.0 BFAC  20.0
    ATOM    Ano   0.4751  0.2411  0.1694       -3.49  0.0 BFAC  20.0
    ATOM    Ano   0.3252  0.2054  0.4343       -3.47  0.0 BFAC  20.0
    ATOM    Ano   0.0179  0.1776  0.4259        3.42  0.0 BFAC  20.0
    ATOM    Ano   0.2697  0.2017  0.8671        3.35  0.0 BFAC  20.0
    ATOM    Ano   0.1271  0.4697  0.2386       -3.31  0.0 BFAC  20.0
    ATOM    Ano   0.2554  0.3257  0.2372        3.30  0.0 BFAC  20.0
    ATOM    Ano   0.2387  0.3913  0.9004        3.29  0.0 BFAC  20.0
    ATOM    Ano   0.4382  0.2585  0.3811       -3.28  0.0 BFAC  20.0
    ATOM    Ano   0.2398  0.1814  0.7578        3.25  0.0 BFAC  20.0
    ATOM    Ano   0.2036  0.0740  0.4036       -3.24  0.0 BFAC  20.0
    ATOM    Ano   0.2215  0.3875  0.0788        3.22  0.0 BFAC  20.0
    ATOM    Ano   0.3055  0.1610  0.4985       -3.13  0.0 BFAC  20.0
    ATOM    Ano   0.4253  0.2392  0.0122        3.06  0.0 BFAC  20.0
    ATOM    Ano   0.0751  0.1712  0.0262       -3.05  0.0 BFAC  20.0
    ATOM    Ano   0.2715  0.0100  0.1736       -3.02  0.0 BFAC  20.0
    

    Add these extra sites to the original output .ha file (or use this one, which was output by the difference Fourier calculations). You can edit the file to set all occupancies to a constant value. If the first 16 of these difference Fourier peaks are used, after refining XYZ and Occ, 3 peaks can quickly be eliminated again. In fact, the first (correct) 12 keep coming up as the strongest peaks.

  15. Once you are satisfied that you have the complete solution, you need to do a refinement to the full resolution limit in order to refine the B values. In the folder Describe Derivatives & Refinement, keep XYZ refinement, and change the occupancy refinement to alternate occupancy and B factor:

    Use isomorphous data to refine XYZ and alternate Occ & B

  16. Select Run -> Run Now. The interface will ask you whether you want to overwrite gere_MAD_nat_mlphare1.mtz. This is OK, so click Delete File.
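The .ha editing in steps 9 and 14 can also be scripted. The sketch below parses ATOM records laid out like the listings above (name, form-factor type, x, y, z, occupancy, optional anomalous occupancy, BFAC, B), drops refined sites with negative occupancy and picks the strongest positive difference-Fourier peaks. The field layout is inferred from those listings rather than from the MLPHARE documentation, and the helper names are hypothetical. It does not check whether a new peak duplicates an existing site (possibly through a symmetry operator), which you still need to do before adding it.

```python
def parse_ha_atoms(lines):
    """Parse ATOM records into dicts; ATREF, GRID, CELL and other lines are skipped."""
    atoms = []
    for line in lines:
        fields = line.split()
        if not fields or not fields[0].startswith("ATOM") or "BFAC" not in fields:
            continue
        ib = fields.index("BFAC")
        nums = [float(v) for v in fields[2:ib]]   # x, y, z, occ[, aocc]
        atoms.append({
            "name": fields[0],
            "type": fields[1],
            "xyz": tuple(nums[0:3]),
            "occ": nums[3],                       # occupancy, or peak height in a peak list
            "aocc": nums[4] if len(nums) > 4 else 0.0,
            "b": float(fields[ib + 1]),
        })
    return atoms

def drop_negative_occupancy(atoms):
    """Keep only sites whose refined occupancy is positive (step 9)."""
    return [a for a in atoms if a["occ"] > 0.0]

def strongest_peaks(peaks, n=16):
    """The n highest positive difference-Fourier peaks (step 14)."""
    positive = [p for p in peaks if p["occ"] > 0.0]
    return sorted(positive, key=lambda p: p["occ"], reverse=True)[:n]
```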

If you do not have time to do all the intermediate stages, you may skip to the final stage. If you want to know what happens, read on:

There are several ways the refinement of the Se sites can be optimised:

Please note that these sites are not necessarily in the same asymmetric unit as the ones you have refined yourself, and most likely not in the same order. Keeping in mind that the spacegroup is C2, you can work out whether your set is 'correct'.
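One way to do this comparison is sketched below. Two sets of sites describe the same solution if, after one of the origin shifts permitted in C2 (x and z shifted by 0 or 1/2, y shifted by any amount because y is a polar axis), every site in one set matches a symmetry equivalent of a site in the other. The helper names, the 1 Å tolerance and the rounding of trial y shifts are assumptions, and the hand is not considered here (see exercise 4g). Cells are given as (a, b, c, beta in degrees).

```python
import itertools
import math

# General positions of C2 (unique axis b), including the C-centring translation.
C2_OPS = [
    lambda x, y, z: (x, y, z),
    lambda x, y, z: (-x, y, -z),
    lambda x, y, z: (x + 0.5, y + 0.5, z),
    lambda x, y, z: (-x + 0.5, y + 0.5, -z),
]

def same_site(p, q, cell, tol=1.0):
    """True if some C2 equivalent of p lies within tol Å of q (modulo lattice translations)."""
    a, b, c, beta_deg = cell
    beta = math.radians(beta_deg)
    for op in C2_OPS:
        # wrap each fractional difference into [-0.5, 0.5)
        d = [((s - t + 0.5) % 1.0) - 0.5 for s, t in zip(op(*p), q)]
        # convert the difference to Ångströms for a monoclinic cell
        dx = a * d[0] + c * math.cos(beta) * d[2]
        dy = b * d[1]
        dz = c * math.sin(beta) * d[2]
        if math.sqrt(dx * dx + dy * dy + dz * dz) < tol:
            return True
    return False

def count_matches(set_a, set_b, cell, tol=1.0):
    """Largest number of set_b sites matching set_a sites over permitted C2 origin shifts."""
    # candidate y shifts: bring any site of set_b (or its centred mate) level with any site of set_a
    y_shifts = {round((p[1] - q[1] - cy) % 1.0, 4)
                for p in set_a for q in set_b for cy in (0.0, 0.5)}
    best = 0
    for ox, oz in itertools.product((0.0, 0.5), repeat=2):
        for oy in y_shifts:
            shifted = [(x + ox, y + oy, z + oz) for x, y, z in set_b]
            n = sum(any(same_site(p, t, cell, tol) for p in set_a) for t in shifted)
            best = max(best, n)
    return best
```

If count_matches returns the number of sites in your set, every one of your sites has a counterpart in the provided set.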

4e) Final Refinement and Phasing

The Problem

To get the best phases, we now include all wavelengths together, and use the anomalous signal as well. You need to do a final refinement and phasing run to first refine the anomalous occupancies (i.e. the relative number of electrons contributed by the different f"), then to output phases.

You will need four .ha files (variations of the four mentioned above). The first three you have refined using F_infl with F_hrm, F_peak and F_lrm. All of these should have the same values for XYZ and B, but different real occupancies; their anomalous occupancy is 0.0. You now need to edit these to set the anomalous occupancy to 1.0, as a preliminary to refinement. You also need to copy one of them to provide an F_infl_v_F_infl.ha file. Edit this to have real occupancies of 0.0 (and anomalous occupancies of 1.0).
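If you prefer to edit the files outside the interface, the change is a small rewrite of each ATOM record. The sketch below assumes, as in the difference-Fourier listing in exercise 4d, that an optional anomalous occupancy follows the real occupancy and precedes BFAC. That placement is inferred from the listing, not taken from the MLPHARE documentation, so check it there (or use the interface's 'Edit Columns' button as described below) before relying on it. The function name is hypothetical.

```python
def set_occupancies(line, occ=None, aocc=None):
    """Return an ATOM record with its real and/or anomalous occupancy replaced.

    Assumed layout: name, type, x, y, z, occ, [aocc], BFAC, B.
    A missing aocc is inserted after occ; whitespace becomes single spaces.
    """
    fields = line.split()
    ib = fields.index("BFAC")
    nums = fields[2:ib]                # x, y, z, occ[, aocc] as strings
    if len(nums) == 4:
        nums.append("0.000")           # no anomalous occupancy given yet
    if occ is not None:
        nums[3] = f"{occ:.3f}"
    if aocc is not None:
        nums[4] = f"{aocc:.3f}"
    return " ".join(fields[:2] + nums + fields[ib:])
```

For the F_infl_v_F_infl.ha file you would call set_occupancies(line, occ=0.0, aocc=1.0) on every ATOM line; for the other three files, set_occupancies(line, aocc=1.0).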

Exercise

  1. In the Run Mlphare task window, enter a suitable job title such as:

    Job title Refinement against all data (mad tutorial step 400).

  2. In the first folder:

    First de-select Use centric data only
    Then select Use anomalous difference data
    Then select Use every 10-th reflection for refinement

    and

    Apply calculated scale to output Sfs
    Output Hendrickson-Lattman coefficients

    Leave everything else unselected.

  3. Select the input MTZ file:

    MTZ in TEST gere_MAD_nat_scaleit1.mtz

    Now select the columns from the MTZ file.

    FP      F_infl        SigmaFP    SIGF_infl
    FPH1    F_infl        SigFPH1    SIGF_infl
    DPH1    DANO_infl     SigDPH1    SIGDANO_infl
    FPH2    F_lrm         SigFPH2    SIGF_lrm
    DPH2    DANO_lrm      SigDPH2    SIGDANO_lrm
    FPH3    F_peak        SigFPH3    SIGF_peak
    DPH3    DANO_peak     SigDPH3    SIGDANO_peak
    FPH4    F_hrm         SigFPH4    SIGF_hrm
    DPH4    DANO_hrm      SigDPH4    SIGDANO_hrm

    Check that the output MTZ file is given as

    Output MTZ TEST gere_MAD_nat_mlphare2.mtz

  4. In the folder Key parameters, enter resolution limits to include all the data:

    Resolution limit from 15.0 to 2.6.

  5. In the folder Describe Derivatives & Refinement, sub-folder Derivative Number 1, enter a name for the derivative:

    Phase with&refine derivative infl to infl.

    On the next line, refine (anomalous) occupancy only, against anomalous data:

    Use anomalous data to refine occupancy

    Then:

    Either select the file of correct sites that has been provided,

    HA in DATA nat_sul_ref.ha

    and (through the 'View' and 'Edit Columns' buttons) set occupancies to 0.0 and anomalous occupancies to 1.0, then 'Save As..' F_infl_v_F_infl.ha on TEST, then 'Quit', then 'Browse' for the new filename;

    or select the file

    HA in TEST F_infl_v_F_infl.ha

    as made according to the instructions above.

  6. In the sub-folder Derivative number 2 select:

    Phase with&refine derivative infl to lrm.
    Use anomalous data to refine occupancy

    Then:

    Either select the file of correct sites that has been provided,

    HA in DATA nat_sul_ref.ha

    and (through the 'View' and 'Edit Columns' buttons) set anomalous occupancies to 1.0, then 'Save As..' F_infl_v_F_lrm.ha on TEST, then 'Quit', then 'Browse' for the new filename;

    or select the file

    HA in TEST F_infl_v_F_lrm.ha

    as made according to the instructions above.

  7. Repeat step 6 for the other 2 wavelengths (Derivative number 3 and 4, 'infl to peak' and 'infl to hrm', respectively).

  8. Select Run -> Run Now.

  9. When the job has finished, return to the main window, highlight the job in the Job List, and select View Files from Job -> View Log Graphs. Graphs are given for each wavelength, for both the last refinement cycle and the final phasing cycle. Look in particular at:

    Lack of closure analysis .... / Phasing power ...., Lack of closure analysis .... / Cullis Rfactor ....
    For good data, the phasing power should be greater than 1, and the Cullis Rfactor should be significantly less than one. The values for the different wavelengths ("derivatives") should correlate with the f' value (compared to that for F_infl).
    Anomalous lack of closure analysis .... / Ano Cullis Rfactor .....
    The anomalous Cullis Rfactor should be significantly less than one. The values for the different wavelengths ("derivatives") should correlate with the f" value

In fact, the example data do not give very good statistics. However, the structure was solved by this method!

4f) Density Modification

The Problem

The phase statistics output by MLPHARE will be the same whether the sites are on the correct or the wrong hand (see Maths notes). However, if anomalous data have been used, one set of phases will give an interpretable map, whilst the other will give an essentially random one. It is essential to select the correct hand before attempting to interpret the map. This is done using density modification procedures: for the correct hand it should be possible to see a boundary between the protein part of the asymmetric unit and the solvent, while for the wrong hand there will be no clear distinction. Because MLPHARE gives realistic figures of merit for either hand, you need to generate phases on both hands (see Notes on usage). Density modification (also known as density improvement) can be done using the program 'dm'.

Exercise

  1. Select the Density Improvement module, and open the Run DM task window.

  2. On the first line, enter a suitable job title such as

    Job title DM on MAD phases - first hand (mad tutorial step 500).

  3. Select the input MTZ file:

    MTZ in TEST gere_MAD_nat_mlphare2.mtz

    Now select the columns from the MTZ file.

    FP      F_native        SIGFP      SIGF_native
    PHIO    PHIB_mlphare1   Weight     FOM_mlphare1

  4. Enter the solvent content as

    Fraction solvent content 0.538.

  5. Everything else can be left as default, so Run -> Run Now.
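The solvent fraction of 0.538 used above can be estimated with the Matthews coefficient, Vm = V/(Z * M), where V is the cell volume, Z the number of asymmetric units per cell (4 in C2) and M the molecular mass per asymmetric unit in daltons; the solvent fraction is then roughly 1 - 1.23/Vm. The sketch below evaluates this for a monoclinic cell. The tutorial does not give M for GerE, so the mass is left as an input, and the function names are hypothetical.

```python
import math

def monoclinic_cell_volume(a, b, c, beta_deg):
    """Unit-cell volume (Å^3) of a monoclinic cell: V = a b c sin(beta)."""
    return a * b * c * math.sin(math.radians(beta_deg))

def matthews_solvent_fraction(cell_volume, n_asu, mass_per_asu):
    """Matthews estimate: Vm = V / (n_asu * M) in Å^3/Da; protein fraction ~ 1.23 / Vm."""
    vm = cell_volume / (n_asu * mass_per_asu)
    return 1.0 - 1.23 / vm
```

For GerE you would call matthews_solvent_fraction(monoclinic_cell_volume(108.742, 61.679, 71.652, 97.151), 4, M) with your own estimate of M for the six chains in the asymmetric unit.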

4g) Testing the hand

The Problem

The procedure for locating the Se sites cannot distinguish between a particular set of sites and the same set of sites transformed through a point of inversion, i.e. it cannot distinguish the hand of the solution. Therefore, the previous phasing run should be repeated using the opposite hand. (The program ABS in CCP4 can also be used to determine the hand for the case of anomalous scattering.)
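Why the hands cannot be told apart by the site search can be seen from the interatomic vectors: inverting every site through the origin, (x, y, z) -> (-x, -y, -z), turns each vector u into -u, and a Patterson map is centrosymmetric, so the same peaks appear. The sketch below inverts a set of sites and checks that the set of vectors between them is unchanged (only vectors between the listed sites are included, for brevity). For C2, inverting the coordinates alone gives the other hand; for enantiomorphic space groups such as P41/P43 the space group must be changed as well. The helper names are hypothetical.

```python
import itertools

def invert_hand(sites):
    """Invert fractional coordinates through the origin, wrapped into [0, 1)."""
    return [((-x) % 1.0, (-y) % 1.0, (-z) % 1.0) for x, y, z in sites]

def vector_set(sites, ndp=4):
    """Vectors between all ordered pairs of different sites, modulo 1,
    rounded to ndp decimals so that two sets can be compared."""
    return {tuple(round((s - t) % 1.0, ndp) % 1.0 for s, t in zip(p, q))
            for p, q in itertools.permutations(sites, 2)}
```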

Then we look at two things:

  1. look at maps - one map should have a clearer solvent boundary than the other.
  2. run DM - one hand should give marginally better statistics than the other.

But the main difference is whether or not you can build a model.

Exercise

Re-run the previous 2 exercises (4e and 4f), but using different files of sites.

  1. Select the job as run for exercise 4e (Final Refinement and Phasing) from the Job List and select ReRun Job.. from the menu on the right of the Main Window.

  2. Adapt the Job title line appropriately.

    Job title Refinement against all data opposite hand (mad tutorial step 600)

    Leave all input in the Protocol and Files folders as before.

  3. Check that the output MTZ file is given as

    Output MTZ TEST gere_MAD_nat_mlphare3.mtz

    (or change accordingly).

  4. In the folder Describe Derivatives & Refinement, sub-folder Derivative Number 1, adapt the HA in as follows:

    Either select the file of correct (opposite hand) sites that has been provided,

    HA in DATA nat_sul_opp_ref.ha

    and (through the 'View' and 'Edit Columns' buttons) set occupancies to 0.0 and anomalous occupancies to 1.0, then 'Save As..' F_infl_v_F_infl_opp.ha on TEST, then 'Quit', then 'Browse' for the new filename;

    or select the file

    HA in TEST F_infl_v_F_infl.ha

    and (after the 'View' button) select Reverse hand, then OK (in the Change Hand window), then 'Save As..' F_infl_v_F_infl_opp.ha on TEST, then 'Quit', then 'Browse' for the new filename (or add _opp).

  5. Similarly, repeat steps 6 to 8 of exercise 4e (Derivatives 2 to 4, then Run -> Run Now), using opposite-hand sites.

  6. Select the job as run for exercise 4f from the Job List and select ReRun Job.. from the menu on the right of the Main Window.

  7. Adapt the Job title line appropriately.

    Job title DM on MAD phases opposite hand (mad tutorial step 606)

  8. Select the input MTZ file:

    MTZ in TEST gere_MAD_nat_mlphare3.mtz

  9. Everything else can be left as before, so Run -> Run Now.

  10. To check the maps, select the Map & Mask Utilities module, and open the Run FFT - Create Map task window.

  11. On the first line, enter a suitable job title such as

    Job title mlphare dm hand 1 (mad tutorial step 610)

    Then select (using the radiobutton)

    Plot map sections with no coordinates

  12. Select the MTZ file:

    MTZ in TEST gere_MAD_nat_dm1.mtz

    and select the columns from the MTZ file

    F1 F_infl Sigma SIGF_infl (FDM does not have a Sigma)
    PHI PHIDM Weight FOMDM

  13. In the folder Select Plot Selections:

    Plot sections on Y axis from 0.0 to 0.5 in steps of 5.0
    Map contour levels entered as a range of *sigma
    Contour levels from 1.5 to 2.5 by intervals of 0.5

  14. Select Run -> Run Now.

  15. Repeat with MTZ file gere_MAD_nat_dm2.mtz (the opposite hand).

  16. When the jobs have finished, select the first 'dm' job from the Job List and View Files from Job -> View Log Graphs. Try to get it side by side with the Log Graphs from the second dm job and compare. Have a look at Phase and weight statistics (both Mean change in phase and Mean figures of merit), and Density Modification-Free-R factors. All in all, the statistics for the first hand are slightly better. Also, from the Log File it can be seen that the solvent boundary looks better for the first hand.

    Select the first FFT job from the Job List and View Files from Job -> TEST_<jobnumber>_1.plt and bring up Picture 1. Try to get it side by side with Picture 1 from the .plt of the second FFT job. The solvent area looks a lot 'cleaner' for the first hand, whereas the contrast is less for the second hand.




Prepared by Liz Potterton and Martyn Winn, 2000

Adapted by Eleanor Dodson and Maria Turkenburg, 2002-2003
