CCP4 Tutorial: Session 6 - Refinement

See also the accompanying document giving background information.

This is an introduction to the procedures for using Refmac5 to refine a crystal structure.

See also the documentation for Refmac5 and Sketcher.

In the following instructions, when you need to type something, or click on something, it will be shown in red. Output from the programs or text from the interface is given in green.

Outline of the Method

  1. Use Refmac to find any special restraints for the unliganded structure - for example the disulphide bonds and cis-peptides. Check the results and make sure the special restraints listed in the log file are correct.
  2. Run the Refmac program to refine the unliganded structure and look at the result
  3. Create the geometry description of the ligand
  4. Use Refmac to find any more special restraints for the liganded form
  5. Run the Refmac program again to refine the liganded structure and look at the result

The Problem

This example is to refine the protein RNAse Sa in its unliganded and liganded form, for which we know:

The final structure (solved by Joseph Sevcik: J. Sevcik, Z. Dauter, V.S. Lamzin, K. Wilson, Acta Cryst. D52 (1996) p327-344) looks like this:

[Figure: RNAse Sa crystal structure]

What are Restraints?

There are 1749 atoms in the asymmetric unit. If we describe each atom using three positional parameters x,y,z and an isotropic temperature factor B there are 6996 parameters. In the experimental data (see $DATA/rnase18.mtz) there are 17991 reflections giving an observation-to-parameter-ratio of 17991/6996 = 2.57. This is not enough to refine all parameters as independent variables. However we have a great deal of information about the geometry of molecules - the bond lengths and bond angles etc. The refinement program will set up restraints between related atoms which say, for example, that the distance between two bonded atoms must be close to the ideal bond length.
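The arithmetic in the paragraph above can be checked in a few lines of Python. This only restates the numbers quoted in the text; the variable names are illustrative.

```python
# Numbers quoted in the tutorial text
n_atoms = 1749          # atoms in the asymmetric unit
params_per_atom = 4     # x, y, z and one isotropic B factor
n_reflections = 17991   # reflections in rnase18.mtz, as stated in the text

n_params = n_atoms * params_per_atom
ratio = n_reflections / n_params

print(f"parameters: {n_params}")
print(f"observations per parameter: {ratio:.2f}")
```

Refining anisotropic temperature factors would need more parameters per atom, which lowers the ratio further for the same data.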

Refinement programs read libraries describing the expected geometry. These contain information about the ideal bond lengths, bond angles, planar groups etc. for the common chemical monomers which are found in macromolecules (e.g. amino acids and nucleic acids). If this information is incorrect you will not get a correct structure.
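To make the idea of a bond-length restraint concrete, here is a minimal sketch of a generic weighted least-squares penalty. It is an assumption for illustration only: the target function that Refmac5 actually uses is not described in this tutorial and may differ. The function name and the coordinates are hypothetical; the ideal distance and standard deviation are the kind of values a monomer library entry supplies.

```python
import math

def bond_restraint_penalty(atom_a, atom_b, ideal_dist, esd):
    """Return ((d - ideal_dist) / esd)**2 for two atoms given as (x, y, z)."""
    d = math.dist(atom_a, atom_b)   # current distance in Angstrom
    return ((d - ideal_dist) / esd) ** 2

# Hypothetical coordinates (Angstrom) for two bonded atoms, 1.25 A apart
atom_1 = (0.0, 0.0, 0.0)
atom_2 = (1.25, 0.0, 0.0)

# Ideal distance 1.230 A with standard deviation 0.020 A
penalty = bond_restraint_penalty(atom_1, atom_2, ideal_dist=1.230, esd=0.020)
print(f"penalty = {penalty:.2f}")
```

A penalty near 1 means the bond is about one standard deviation away from its ideal length; if the library value is wrong, the penalty pulls the model towards the wrong geometry.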

If your structure contains an unusual substrate molecule or a modified amino acid, then there may be no suitable description in the library, and you will have to provide one in the required style. This can be difficult; it is essential you know the chemical definition of your ligand, (e.g. which atoms lie in a plane, which bond type exists, etc). This knowledge must then be written in the correct format for the refinement program to read. This can also be challenging, so we will make the geometry description for the ligand to show how it is done.

The Data Files

Files in directory DATA:

rnase.pdb        The refined unliganded protein coordinates
GMP.pdb          Idealised coordinates of the ligand molecule, guanosine-3*-monophosphate
rnase-3gp.pdb    The refined protein-ligand coordinates
rnase_bad.pdb    A test set of coordinates with errors introduced
rnase18.mtz      An MTZ file containing three sets of experimental data extending to 1.8Å, labelled FNAT SIGFNAT, F3GP SIGF3GP and F2GP SIGF2GP for the unliganded protein, the protein liganded with guanosine-3*-monophosphate (3GP) and with guanosine-2*-monophosphate (2GP), respectively
rnase115.mtz     The experimental data and calculated structure factors and phases for the unliganded protein to 1.15Å
rnase25.mtz      An MTZ file containing experimental data (including anomalous), extending to 2.5Å, for the native protein and three derivatives (mercury, platinum and iodine), as used for experimental phasing by MIR and experimental phasing by MAD

The output files in directory RESULTS:

3GP_mon_lib.cif            The monomer library description of 3GP
review.log                 The log file from running Refmac5 to review restraints
rnase18_bad_refmac1.pdb    The output file from running Refmac5 to review restraints
refmac-unliganded.log      The log file from Refmac5 refinement of unliganded RNAse

6a) Check model geometry, create and review special restraints

For the tutorial a file called $DATA/rnase_bad.pdb has been generated, which has had some errors introduced. The residue ARG B:63 is now too close to ARG A:63. Also, all residues have been shifted by up to 0.2Å.

[Figure: bad position for ARG B:63]

We will use REFMAC5 in review restraints mode. The program will look at the atom coordinates and decide where there are disulphide bonds, cis-peptides and D-peptides. It will also calculate the distance between atoms, and if they are very close it will assume the atoms are bonded and will make a restraint to say 'these two atoms must stay close'. Of course this is not always right. It will also add any absent atoms - if a residue does not have the right atoms it will make them. REFMAC5 will help you by finding the disulphide bonds etc. automatically but you MUST check that they are correct.
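The idea of guessing bonds from distances can be sketched as follows. This is an illustration only: the coordinates, the cutoff and the function name are hypothetical, and the criterion REFMAC5 really applies is not given in this tutorial.

```python
import math

def looks_bonded(coord_a, coord_b, cutoff):
    """Guess that two atoms are bonded if they are closer than cutoff (Angstrom)."""
    return math.dist(coord_a, coord_b) < cutoff

# Hypothetical coordinates for two sulphur atoms about 2.2 A apart
sg_1 = (10.000, 5.000, 3.000)
sg_2 = (12.211, 5.000, 3.000)

# Hypothetical cutoff, chosen only for this example
CUTOFF = 2.5

print(looks_bonded(sg_1, sg_2, CUTOFF))
```

A purely geometric test like this cannot tell a real disulphide from two atoms that are close because of a modelling error, which is why the suggested restraints have to be checked by eye.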

Exercise

  1. From the Refinement module select Run Refmac5.

  2. In the Protocol folder, enter a Job title such as:

    Job title Review restraints on rnase with bad geometry (refine tutorial step 1)

    Then

    Do review restraints

  3. Now select the input coordinate file:

    PDB in DATA rnase_bad.pdb

  4. Look in the folder called Setup Geometric Restraints. In here you can decide what to look for in the PDB file. We will use the defaults - you do not need to change anything.

  5. Now run the REFMAC5 program. From the Run menu at the bottom of the window choose Run Now.

  6. The job will take a little time. When it has finished, the job status will be "FAILED" - this is caused by WARNINGs (see below) which should be considered serious, but which are part of the learning process here. Look at the log file (click on the name of the job, refmac5, in the main window and use View Files from Job and View Log Files).

    Some interesting things in the log file:

     WARNING : CIS peptide bond is found, angle =      16.01
        ch:AA res:  26  GLY    -->  27  PRO
                                      ....
    
     WARNING : link:SS       is found dist =     2.211 ideal_dist=     2.031
                ch:BB res:   7  CYS      at:SG  .->BB res:  96  CYS      at:SG

    These are correct: REFMAC5 has checked the input protein molecule and found some cis-peptide bonds and some disulphide bonds. However, there is also:

     WARNING : description of link:ARG-ARG  not found in the dictionary.
         link will be created with bond_lenth =   1.400

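    This suggestion is wrong. REFMAC5 proposes bonds between residues that are too close together, and here the closeness comes from the errors introduced into the coordinates, not from real covalent links.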

  7. Now look at the output PDB file: use View Files from Job and select rnase_bad_refmac1.pdb (this file can also be found in the RESULTS directory).

    At the top of this file is new information:

    LINK        NH1_ ARG A  40                NE__ ARG B  63                ARG-ARG
    LINK        NH1_ ARG A  40                CZ__ ARG B  63                ARG-ARG1
    LINK        NH1_ ARG A  40                NH2_ ARG B  63                ARG-ARG2
    LINK             GLY A  26                     PRO A  27                PNCIS
    SSBOND   1 CYS A   96    CYS A    7
    LINK             GLY B  26                     PRO B  27                PNCIS
    SSBOND   2 CYS B   96    CYS B    7

    Now Quit from the window.

  8. It is necessary to edit the PDB file to remove the bad link information. There is an easy way to do this. From the Refinement menu in CCP4i main window select Edit Restraints in PDB.

  9. Select the input file:

    PDB in TEST rnase_bad_refmac1.pdb

    Wait while the program reads the file.

  10. In the window you will now see:

    The space group and the symmetry operators for the space group (you may need this information to define disulphide bonds or links between molecules that are not in the same asymmetric unit).

    The MODRES IDs and LINK modes provide additional information to describe the molecule geometry. Definitions of MODRES allow you to modify a standard residue description, e.g. to rename a monomer, or to modify MET to include Se - the details are discussed in the Refmac5 documentation. LINK definitions describe ways to link two monomers, e.g. peptides covalently linked to substrates.

    MODRES - Modified Residues

    This is a way to redefine non-standard residues. There is a monomer labelled GMP in the RNAse coordinate file, which matches the dictionary definition of 3GP.

    SSBOND - Disulphide Bonds

    The two disulphide bonds in RNAse are shown.

    LINK - Inter-Residue Bonds

    The three bad bonds are listed.

    CISPEP - Cis peptides

    The two cis-peptides in RNAse are listed.


  11. You can delete the bad links by clicking on the menu Edit Table and selecting Delete Last Row. Do this three times.

    It is also possible to add new things - try clicking on Add Row.

  12. We will stop using the rnase_bad coordinates, so you do not need to save changes - Close the window.
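The header records shown in step 7 can also be inspected with a short script. The sketch below assumes the records look exactly like the excerpt above and splits them on whitespace; it is not a general PDB parser (PDB files are column-based), and the function name is hypothetical.

```python
HEADER = """\
LINK        NH1_ ARG A  40                NE__ ARG B  63                ARG-ARG
LINK        NH1_ ARG A  40                CZ__ ARG B  63                ARG-ARG1
LINK        NH1_ ARG A  40                NH2_ ARG B  63                ARG-ARG2
LINK             GLY A  26                     PRO A  27                PNCIS
SSBOND   1 CYS A   96    CYS A    7
LINK             GLY B  26                     PRO B  27                PNCIS
SSBOND   2 CYS B   96    CYS B    7
"""

def summarise_restraints(text):
    """Collect the LINK identifiers and the SSBOND lines from header text."""
    link_ids = []
    ssbonds = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "LINK":
            link_ids.append(fields[-1])   # last field is the link identifier
        elif fields[0] == "SSBOND":
            ssbonds.append(line.strip())
    return link_ids, ssbonds

link_ids, ssbonds = summarise_restraints(HEADER)

# The links reported as missing from the dictionary have names starting "ARG-ARG"
suspect = [name for name in link_ids if name.startswith("ARG-ARG")]
print("LINK ids:", link_ids)
print("suspect :", suspect)
print("SSBOND  :", len(ssbonds))
```

The three suspect entries correspond to the three rows deleted in the Edit Restraints window.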

6b) Refining the Unliganded Molecule

Exercise

  1. From the Refinement module select Run Refmac5.

  2. Enter a suitable job title such as

    Job title restrained refinement for unliganded RNAse (refine tutorial step 100)

    Then

    Do restrained refinement using no prior phase information input

    Also you will see:

    square box Generate weighted difference maps in CCP4 format.

    If you have a graphics program to look at the maps then click this on and select a map format.

  3. Now select the input files - the experimental data:

    MTZ in DATA rnase18.mtz

    and make sure you have correct data columns:

    FP FNAT SIGFP SIGFNAT

    and the coordinate file:

    PDB in DATA rnase.pdb

  4. Click on Run -> Run Now.

  5. The job will take a little time. When it is finished look at the log file (click on the name of the job, refmac5, in the main window and use View Files from Job and View Log Graphs).

    If you do not have a log file then click on View Any File and set:

    Go to directory RESULTS
    File type log CCP4 log filename filter *.log
    Viewer View Log Graphs

    and then select file:

    File refmac-unliganded.log

    Go to the last table in the Tables in File and click on:

    Rfactor analysis, stats vs cycle

    You will see a graph of the R factor and the Free R factor for the 6 cycles of refinement. The R factor is very good already but both go down a little.

    [Figure: Rfactor Analysis]

    Also look at the Graphs in Selected Table for:

    FOM vs cycle
    -LLG vs cycle
    Geometry vs cycle

    The FOM tells you how well the molecule matches the experimental data and the Geometry tells you how well the molecule obeys the geometry restraints.

    Also, slightly up the Tables in File list, select the last:

    Cycle 6. Rfactor analysis, F distribution v resln

    This is information about the last cycle of refinement. Have a look at:

    <Rfactor> v. resln
    [Figure: Rfactor Analysis]

    The red line is the average R factor versus resolution for the data which is used and the green line is the Free R factor (for the 'free' data which is not used). This is similar across the resolution ranges - it does not go up for high resolution data. This is an example of what is good about maximum likelihood refinement compared with the old-fashioned least squares.

    Also look at the graph:

    <Fobs> and <Fc> v. resln
    [Figure: Structure Factor Analysis]

    This is a graph of the average observed structure factors and calculated structure factors. Notice that at low resolution the observed (red) and the calculated (blue) are not the same. At low resolution the water atoms, which we cannot see in the crystal structure, are an important part of the structure factors. The refinement program tries to model the water atoms by solvent scaling but it is difficult for this data because some of the very low resolution data is missing.

    To close the loggraph window click on the File menu and select Exit.

  6. Look at the header of the output MTZ file - click on View Files from Job and select the file rnase18_refmac1.mtz. In the file you will see:

    * Column Labels :
    
     H K L FNAT SIGFNAT FreeR_flag FC PHIC FWT PHWT DELFWT PHDELWT FOM

    The new data in the file is:

    FC & PHIC           the structure factors and phases calculated from the final coordinates
    FWT & PHWT          the 'best' structure factors and phases weighted by the maximum likelihood function
    DELFWT & PHDELWT    the 'best' structure factors and phases for a difference map
    FOM                 figure of merit for PHIC

    If you selected the option to create output maps then you can look at the maps created from the REFMAC output.

    ...FWT.map       the 'best' weighted map
    ...DELFWT.map    the 'best' weighted difference map

    An example of these maps is shown below for a tyrosine residue which is in the wrong place. The DELFWT map is the weighted difference map of F(observed) - F(calculated) and looks like this:

    [Figure: Difference map]

    Here you can see a large pink area of negative density where the tyrosine side chain is now. This is saying that the side chain should not be here. The large brown-red area of positive density is showing where the side chain should be.

    The FWT map is the weighted map and looks like this:

    [Figure: Weighted map]

    You can see a region of density to the left of the tyrosine, which is where it should go.
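As a quick check on an output MTZ header, the column labels from step 6 can be matched against the amplitude and phase pairs needed for each kind of map. This sketch works on the label line quoted above as plain text; it does not read an MTZ file, and the dictionary and function names are hypothetical.

```python
# Column label line as shown in the MTZ header in step 6
labels_line = "H K L FNAT SIGFNAT FreeR_flag FC PHIC FWT PHWT DELFWT PHDELWT FOM"

# Amplitude/phase pairs described in the text
MAP_COEFFICIENTS = {
    "calculated structure factors": ("FC", "PHIC"),
    "weighted map": ("FWT", "PHWT"),
    "weighted difference map": ("DELFWT", "PHDELWT"),
}

def available_maps(labels):
    """Return the entries whose amplitude and phase columns are both present."""
    present = set(labels)
    return {
        name: pair
        for name, pair in MAP_COEFFICIENTS.items()
        if all(column in present for column in pair)
    }

labels = labels_line.split()
for name, (amplitude, phase) in available_maps(labels).items():
    print(f"{name}: F={amplitude} PHI={phase}")
```

If a pair is missing from the header, the corresponding map cannot be calculated from that file.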

6c) Create a Monomer Library Entry for the ligand 3GP

There are three ways to get a geometry description:

  1. There are over 2000 molecules in the REFMAC library so your ligand may already be in the library. In fact there is a full monomer description of the ligand in our example in the library, but that is too easy. We will just have a look at it.
    A BIG WARNING: the data in the library is from crystal structures in the PDB database - these may not be correct or exactly the same as your ligand so always check the bond definition, chiral centres, planar groups etc.
  2. If you have coordinates for the ligand in a PDB file, it is possible to make a monomer library entry very quickly - we will try doing this.
  3. If you have no coordinates for the ligand you need to draw the molecule after which the programs will make a geometry description and will also make a PDB file with coordinates. This can be made easier if there is a similar molecule in the library - you can get this molecule from the library and edit - we will also try doing this.

ad i) Exercise - Looking at the Monomer Library Entry for the ligand

  1. From the Refinement module select the Monomer Library Sketcher task.

  2. From the File menu at the top of the window select Read File and from the next menu select Load Monomer from Library.

  3. In the Load Monomer from Library window, in the Choose Monomer folder, search:

    List non-polymer monomers - apply search filter: guanosine

    and select 3GP GUANOSINE-3*-MONOPHOSPHATE.

  4. Select Run -> Run Now, and Close.

  5. You will see the molecule displayed. You can rotate it by holding down the left mouse button. On the right of the window is a list of atoms - this list has the element, the atom name and the oxidation state (the charge of the atom). Below the list of atoms is a list of the chiral centres found in the molecule, of which there are four.

  6. Now look at the monomer library file. In the Main Window select the last job which is called load_monomer. Now select the View Files from Job menu (on the right side of the main window) and select the file 3GP_mon_lib.cif.

    In this file you will see a list of the atoms.

    _chem_comp_atom.comp_id
    _chem_comp_atom.atom_id
    _chem_comp_atom.type_symbol
    _chem_comp_atom.type_energy
    _chem_comp_atom.partial_charge
    _chem_comp_atom.x
    _chem_comp_atom.y
    _chem_comp_atom.z
     3GP           O6     O    O         0.000      0.000    0.000    0.000
     3GP           C6     C    CR6       0.000      0.831    0.906    0.061
     3GP           C5     C    CR56      0.000      2.177    0.611    0.120
    ......

    Further down the file is the list of bonds:

    _chem_comp_bond.comp_id
    _chem_comp_bond.atom_id_1
    _chem_comp_bond.atom_id_2
    _chem_comp_bond.type
    _chem_comp_bond.value_dist
    _chem_comp_bond.value_dist_esd
     3GP      C6     O6        aromatic    1.230    0.020
     3GP      C6     N1        aromatic    1.380    0.020
     3GP      C5     C6        aromatic    1.390    0.020
     3GP      C4     C5        aromatic    1.390    0.020
    ......

    The words at the top of the list tell you what is in each column:

    comp_id
    this is the compound id - this is always 3GP.
    atom_id_1
    this is the first atom in the bond
    atom_id_2
    this is the second atom in the bond
    type
    this is the bond type (single, double, aromatic, deloc)
    value_dist
    this is the ideal bond distance
    value_dist_esd
    this is the standard deviation for the ideal bond distance

    There is similar information on bond angle, torsion angle, chirality and planar groups.

    The refinement program will try to make the ligand as defined in this file - you can edit the file if you need to.
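If you want to look at the bond table outside the interface, the loop shown above is simple enough to read with a few lines of Python. This is a minimal sketch that assumes whitespace-separated values with no quoting, as in the excerpt; it is not a full mmCIF parser, and the function name is hypothetical.

```python
BOND_LOOP = """\
_chem_comp_bond.comp_id
_chem_comp_bond.atom_id_1
_chem_comp_bond.atom_id_2
_chem_comp_bond.type
_chem_comp_bond.value_dist
_chem_comp_bond.value_dist_esd
 3GP      C6     O6        aromatic    1.230    0.020
 3GP      C6     N1        aromatic    1.380    0.020
 3GP      C5     C6        aromatic    1.390    0.020
 3GP      C4     C5        aromatic    1.390    0.020
"""

def parse_loop(text):
    """Turn header lines plus data rows into a list of dictionaries."""
    names, rows = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("_"):
            # Keep only the part after the category name, e.g. 'atom_id_1'
            names.append(line.split(".", 1)[1])
        else:
            rows.append(dict(zip(names, line.split())))
    return rows

bonds = parse_loop(BOND_LOOP)
for bond in bonds:
    print(bond["atom_id_1"], bond["atom_id_2"], bond["type"], float(bond["value_dist"]))
```

Listing the bonds this way makes it easier to compare the bond types and ideal distances against what you know about the chemistry of the ligand.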

ad ii) Exercise - Creating a Monomer Library Entry from Coordinate File

  1. From the Refinement module select the Monomer Library Sketcher task.

    If you already have sketcher open from the previous exercise, delete any molecule that you have displayed: from the Edit pull down menu select Delete All Atoms.

    For the next steps see the view_sketcher_1 picture.

  2. From the File menu at the top of the window select Read File and from the next menu select Read PDB file. Select the file:

    Go to directory DATA
    File GMP.pdb

    If there is a message querying the use of an existing library file (using LIBCHECK), click No.

  3. You will see the molecule displayed. You can rotate it by holding down the left mouse button. On the right of the window is a list of atoms - this list has the element, the atom name and the oxidation state (the charge of the atom). Below the list of atoms is a list of the chiral centres found in the molecule, of which there are four.

  4. The picture of GMP below shows the correct delocalised and aromatic bonds - edit your molecule accordingly. To change a bond to a delocalised bond, you must hold down the Shift key on the keyboard and click on the bond with the right mouse button. It will step through single--double--triple--deloc--aromatic--metal.

    [Figure: correct bond structure for GMP]

  5. Now create the monomer library. From the File pull-down menu at the top of the window select Create Library Description.

    In the Create Dictionary Entry window, enter a job title such as

    Job title library description from DATA GMP.pdb (refine tutorial step 210)

    Select

    Create description from displayed structure

    Then enter the name of the ligand (this must be GMP which is the name of the ligand in the PDB file which we will refine). So:

    Unique identifier GMP Full name: guanosine-3*-monophosphate

    The names of files will be created automatically so you can then select Run -> Run Now. Then Close this window.

    You must wait a little time - a program called LIBCHECK is running. When it has finished the molecule is displayed again.

  6. Now look at the new monomer library file. From the Main Window select the last job which is called dictionary. Select the View Files from Job menu (on the right side of the main window) and select the file GMP_mon_lib.cif.

    In this file you will see a list of the atoms.

    _chem_comp_atom.comp_id
    _chem_comp_atom.atom_id
    _chem_comp_atom.type_symbol
    _chem_comp_atom.type_energy
    _chem_comp_atom.partial_charge
    _chem_comp_atom.x
    _chem_comp_atom.y
    _chem_comp_atom.z
     GMP           O31    O    OP       -0.660      0.000    0.000    0.000
     GMP           P3     P    P         0.000      1.175    0.940    0.095
     GMP           O33    O    OP       -0.660      1.871    0.745    1.419
    ......

    Further down the file is the list of bonds:

    _chem_comp_bond.comp_id
    _chem_comp_bond.atom_id_1
    _chem_comp_bond.atom_id_2
    _chem_comp_bond.type
    _chem_comp_bond.value_dist
    _chem_comp_bond.value_dist_esd
     GMP      P3     O31       deloc       1.510    0.020
     GMP      O32    P3        deloc       1.510    0.020
     GMP      O33    P3        deloc       1.510    0.020
     GMP      O3*    P3        single      1.610    0.020
    ......

    This list is not quite the same as for the monomer 3GP as it is in the Monomer Library. Have a look at the differences and update as you see fit.

ad iii) Exercise - Creating a Monomer Library Entry by Drawing the Molecule

This section is optional. Alternatively you can go directly to the next step - Review Special Restraints for ligand.

If you have no coordinates or other definition of the ligand then you must draw the molecule in the Sketcher. Sometimes there may be a similar molecule in the library - you can start from this and edit it. There is a guanosine molecule in the library which we can use to make GMP (or, more accurately, 3GP).

  1. Delete any molecule that you have displayed: from the Edit pull down menu select Delete All Atoms.

  2. From the File pull-down menu select Read File and then Load Monomer from Library. In the new window, from the folder Choose Monomer, select:

    List RNA monomers

    Now you will see a list of RNA monomers. Click on the line:

    Gr   Guanosine

    Then click on Run -> Run Now and Close the Load Monomer from Library window.

    You must wait a little while before the molecule of guanosine is displayed. To make 3GP you must delete the phosphate group on the O5* and draw a phosphate group on O3*.

    For the next steps see the view_sketcher_2 picture.

  3. Make sure that the Mouse Mode (on the left of the Sketcher window) is set to Edit Monomer.

  4. From the edit tools on the left of the Sketcher window select the 'Delete atom' icon.

  5. Hold down the Shift key and click with the left mouse button on the atoms O3T, O2P, O1P and P to delete them.

  6. To add the new phosphate group select the 'Add a C atom' icon from the edit tools. The atoms that you add will be carbon atoms - you will change them later to phosphorus and oxygen.

    Make O3* the active atom by holding down the Control key and clicking on it with the right mouse button. It is now the flashing, active atom. Add a new atom by holding down the Shift key and clicking with the left mouse button close to the O3* at the place where the new atom should go. You now have an atom called C21 and it is the active atom. Click close to this atom with Shift - left mouse button to make one more atom. Add two more atoms. You will need to make the C21 atom the active atom (Control - right mouse button) and then add the atom for each of those. When you have finished adding atoms, click on the Do nothing edit tool at the top left of the window; now you will not make more atoms by mistake.

  7. Now look at the end of the table on the right side of the Sketcher window. The new atoms are C21, C22, C23 and C24, and each atom has the element type C. Change C21 to a P and the other three to O. The atom names are also wrong - change the names to P, O1P, O2P and O3P.

  8. The bonds within the phosphate are delocalised bonds, so click Shift - right mouse button on the bonds between P and all three O atoms, stepping through the bonds until they are 'deloc' bonds.

  9. Now we create the monomer library. From the File pull-down menu at the top of the window select Create Library Description. In the window enter the name of the ligand (call this TEST3GP so you do not overwrite the files you made before). So:

    Unique identifier TEST3GP Full name: guanosine-3*-monophosphate

    The names of files will be created automatically so you can then click Run -> Run Now. Close this window.

  10. Wait while the program runs to build the dictionary file. The molecule is drawn again. If necessary you can make corrections and run again. To close the Sketcher window, select the File pull-down menu and Close Sketcher.

6d) Review Special Restraints for ligand

We will add the ligand to our refined unliganded structure. In order to see what might happen, a file called $DATA/rnase-3gp.pdb is provided, with the refined unliganded model plus 3GP.

We will use REFMAC5 in review restraints mode. The program will look at the atom coordinates again. The disulphide bonds, cis-peptides and D-peptides are already defined. It will also calculate the distance between atoms, and if they are very close it will assume the atoms are bonded and will make a restraint to say 'these two atoms must stay close'. Of course this is not always right.

Exercise

  1. From the Refinement module select Run Refmac5.

  2. Enter a suitable job title such as

    Job title review restraints for liganded RNAse (refine tutorial step 300)

    Then

    Do review restraints

  3. Select the input coordinate file:

    PDB in DATA rnase-3gp.pdb

  4. Select the new library file that you made for 3GP (this can also be found in the RESULTS directory):

    Library TEST 3GP_mon_lib.cif

  5. Now run the REFMAC5 program. From the Run menu at the bottom of the window choose Run Now.

    If all is well, the program should run without any warnings, apart from those about hydrogens. Now we are ready for some real refinement.

6e) Refining the Liganded Molecule

Now we will use the monomer library description that we created to refine the rnase molecule with the 3GP ligand.

Exercise

  1. From the Refinement module select Run Refmac5.

  2. Enter a suitable job title such as

    Job title restrained refinement for liganded RNAse (refine tutorial step 400)

  3. Then

    Do restrained refinement using no prior phase information input

    Also you will see:

    square box Generate weighted difference maps in CCP4 format.

    If you have a graphics program to look at the maps then click this on and select a map format.

  4. Now select the input files - the experimental data:

    MTZ in DATA rnase18.mtz

    and make sure you have correct data columns:

    FP F3GP SIGFP SIGF3GP

    and the coordinate file:

    PDB in DATA rnase-3gp.pdb

    To use the geometry description file which you have made:

    Library TEST 3GP_mon_lib.cif

  5. Click on Run -> Run Now.

  6. The job will take a while. When it is finished look at the graphs from the log file (click on the name of the job, refmac5, in the main window and use View Files from Job and View Log Graphs).

    Go to the last table in the Tables in File and click on:

    Rfactor analysis, stats vs cycle

    You will see a graph of the R factor and the Free R factor for the 6 cycles of refinement. The R factor is very good already but both go down a little.

    [Figure: Rfactor Analysis]

    To close the loggraph window click on the File menu and select Exit.
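For reference, the R factor plotted in these graphs is conventionally defined as the sum of | |Fobs| - |Fcalc| | divided by the sum of |Fobs|, and the Free R factor is the same quantity over the reflections left out of refinement. The sketch below uses hypothetical amplitudes and flags and assumes the two sets are already on the same scale; Refmac5 applies its own scaling, which is not reproduced here.

```python
def r_factor(f_obs, f_calc):
    """Conventional R factor: sum(| |Fo| - |Fc| |) / sum(|Fo|)."""
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator

# Hypothetical amplitudes, assumed to be on the same scale already
f_obs = [100.0, 50.0, 25.0, 25.0]
f_calc = [90.0, 55.0, 30.0, 20.0]
# Hypothetical flags: True marks a 'free' reflection not used in refinement
is_free = [False, False, True, False]

work_pairs = [(fo, fc) for fo, fc, flag in zip(f_obs, f_calc, is_free) if not flag]
free_pairs = [(fo, fc) for fo, fc, flag in zip(f_obs, f_calc, is_free) if flag]

r_work = r_factor(*zip(*work_pairs))
r_free = r_factor(*zip(*free_pairs))
print(f"R = {r_work:.3f}, Rfree = {r_free:.3f}")
```

Because the free reflections are not used to adjust the model, a Free R factor that falls together with the R factor is a sign that the refinement is improving the model and not just fitting noise.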




To find out more:
Refmac: http://www.ysbl.york.ac.uk/~garib/refmac
Libcheck: http://www.ysbl.york.ac.uk/~alexei/libcheck.html
CCP4: http://www.dl.ac.uk/CCP/CCP4

Prepared by Liz Potterton (lizp@ysbl.york.ac.uk) and Eleanor Dodson, July 2000
Adapted by Maria Turkenburg, 2002-2003
