CCP4 Tutorial: Session 1 - Introduction

See also the accompanying document giving background information.

In the following instructions, when you need to type something, or click on something, it will be shown in red. Output from the programs or text from the interface is given in green.

Outline of the Method

  1. Setting up Project and Directory Aliases
  2. Introduction to the MTZ format
  3. MTZ format: unmerged files
  4. The Loggraph Utility

The Data Files

Directory DATA contains input files:

toxd.hkl    reflection file from X-PLOR/CNS
aucn.na4    reflection file in NA4 format
toxd.pdb    coordinate file of TOXD

Directory RESULTS contains selected output files (you can look at these if you have problems, or the job is too slow):

toxd.mtz               reflection file in CCP4 format
import-cns.log         .log of importing CNS reflection file into CCP4
import-cns.def         CCP4i .def of importing CNS reflection file into CCP4
import-unmerged.log    .log of importing an unmerged reflection file into CCP4
import-unmerged.def    CCP4i .def of importing an unmerged reflection file into CCP4

You will work in your own directory TEST.

If you have problems following the instructions, you can use the .def files in directory RESULTS, which contain the necessary parameters. To load one of these files into the interface, use the option at the bottom of the task window: Save&Restore -> Restore from File, then select the file.

Often you will use the output file of one job as the input file for the next job. However, if you do not have the output file, then it will also be available in directory DATA.

1a) Setting up Project and Directory Aliases

The Problem

When using ccp4i for the first time, you need to set up a project to work in. You also need to define directories so that ccp4i knows where to find files.

Exercise

  1. In your home directory, make a subdirectory TEST:

    > mkdir TEST

  2. Start ccp4i:

    > ccp4i

    The Main Window will appear.

  3. If this is the first time that you have run ccp4i then the Directories&ProjectDir window will appear automatically. Otherwise, click on the Directories&ProjectDir button in the main window to launch this window.

  4. In the new window, click on Add Project. In the new line, enter the project alias TEST and then the full path name of the subdirectory TEST that you have just made:

    Project TEST uses directory: $HOME/TEST

  5. Select this new project on the next line:

    Project for this session of CCP4Interface TEST

  6. Click on Add Directory Alias and in this new line add the directory alias DATA and the path name:

    Alias: DATA for directory: $CEXAM/tutorial/data

  7. Repeat the previous step to add the alias RESULTS:

    Alias: RESULTS for directory: $CEXAM/tutorial/results

  8. Click on Apply&Exit.
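The effect of these aliases can be pictured with a short Python sketch. It is only an illustration of the idea that an alias plus a file name stands for a full path; it is not how ccp4i is implemented. The function name resolve and the values given for HOME and CEXAM are assumptions made up for the example.

```python
from string import Template

# Aliases as entered in the exercise above.
ALIASES = {
    "TEST": "$HOME/TEST",
    "DATA": "$CEXAM/tutorial/data",
    "RESULTS": "$CEXAM/tutorial/results",
}

# Hypothetical environment values; your installation will differ.
ENV = {"HOME": "/home/user", "CEXAM": "/opt/ccp4/examples"}


def resolve(alias, filename, aliases=ALIASES, env=ENV):
    """Return the full path meant by a file given as 'ALIAS filename'."""
    directory = Template(aliases[alias]).substitute(env)
    return f"{directory}/{filename}"


print(resolve("DATA", "toxd.hkl"))
print(resolve("TEST", "toxd.mtz"))
```

An entry such as "In DATA toxd.hkl" in a task window can be read in the same way: the alias selects the directory and the file name is appended to it.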

1b) Introduction to the MTZ format

The Problem

The MTZ file format is central to running the CCP4 programs. When using CCP4 for the first time, you will usually have to convert an external file to MTZ format. You also need to understand how information is arranged in an MTZ file. In this example, we convert a CNS reflection file for the protein toxd to MTZ format, and briefly examine the MTZ file.
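As a rough mental model before starting the exercise, an MTZ file can be thought of as a set of named, typed columns, each tied to a dataset, plus header values such as the resolution range. The Python sketch below uses the column labels, types, dataset IDs and statistics that the viewer reports for toxd.mtz later in this section. The class and function names are hypothetical and not part of CCP4, a real MTZ file is binary, and reading the two resolution numbers in the header as 1/d^2 values is an interpretation that happens to reproduce the 2.300 A limit shown next to them.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Column:
    label: str       # name that programs use to pick out the column
    ctype: str       # one-letter column type, e.g. F = structure factor amplitude
    dataset_id: int  # dataset ID shown for the column in the viewer


# Values as listed by the viewer for toxd.mtz.
columns = [
    Column(label, ctype, dataset_id)
    for label, ctype, dataset_id in zip(
        "H K L FP SIGFP FreeRflag".split(),
        "H H H F Q I".split(),
        [0, 0, 0, 1, 1, 1],
    )
]


def find_column(columns, label):
    """Look a column up by name, since neither names nor order are fixed."""
    for position, column in enumerate(columns):
        if column.label == label:
            return position, column
    raise KeyError(label)


def resolution_from_inv_d2(inv_d2):
    """Convert a 1/d^2 value into a resolution d in Angstrom."""
    return 1.0 / sqrt(inv_d2)


def completeness(n_reflections, n_missing):
    """Percentage of reflections for which a column holds a value."""
    return 100.0 * (n_reflections - n_missing) / n_reflections


position, fp = find_column(columns, "FP")
print(position, fp.ctype)
print(round(resolution_from_inv_d2(0.18900), 2))
print(round(completeness(3234, 74), 2))
```

The last line uses the reflection count and the Num Missing value for FP from the OVERALL FILE STATISTICS table, and gives the percentage shown there in the "% complete" column.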

Exercise

  1. Find the Choose module pull-down menu in the main window, and select Reflection Data Utilities.

  2. In the Tasks menu below, click on Convert to/modify/extend MTZ. This will open a Task window.

  3. On the first line, enter a suitable job title such as

    Job title Importing CNS file for toxd (intro tutorial step 20)

  4. On the second line, select X-PLOR/CNS from the pull-down menu. Wait while the task window re-draws itself.

  5. On the 3rd line, select:

    Create full unique set of reflections and keep existing FreeR data.

    by checking that the radiobutton on the left-hand side is on (this is the default), and selecting the appropriate option from the pull-down menu.

  6. Now enter the input CNS file as:

    In DATA toxd.hkl

    (you can use the Browse button after selecting DATA from the pull-down menu). The output file should be automatically set to:

    Out TEST toxd.mtz

    (if it is not, type this yourself).

  7. Now look at the folder MTZ Project, Crystal & Dataset Names. These names will be used to identify the data for Data Harvesting and to categorise the data within MTZ data structures. Enter:

    Crystal wildtype belonging to Project toxd
    Dataset name native

  8. Now look at the folder Cell and Spacegroup to be saved in MTZ file. We need to supply the spacegroup and cell dimensions, since these are not included in the input CNS file. Click on the grey title bar of the folder and enter:

    Space group name or number 19
    Cell dimensions a 73.582 b 38.733 c 23.189 alpha 90 beta 90 gamma 90

  9. Now look at the folder Detailed specification of file format. The format of X-PLOR/CNS files is variable, and we need to make sure that the task is able to read the format of the input file correctly. If an incorrect format statement is given, the task fails with an error such as: " f2mtz: problems reading reflection 0 ". In this case, the default format statement is slightly wrong, and needs to be changed to:

    Fortran format '(6X,3F5.0,6X,F10.3,10X,7X,F10.3,6X,F10.0)'

    i.e. add 10X, in front of 7X (this takes into account the column for the imaginary component of the input F).

  10. The remainder of the task window can be left unchanged, so go to the bottom of the task window and click on Run -> Run Now.

    Look at the main window of the interface again, and look at the Job List. The current Import job should be at the top. The status will be given as STARTING, RUNNING and then FINISHED. This job is very quick, so you may only see the FINISHED status.

  11. When the job has finished, highlight the job in the job list by clicking on it. Then select View Files from Job -> toxd.mtz in the main window.

    (Alternatively, right-click on the job in the job list, go to the Input and output files.. option in the menu and select toxd.mtz there.)

    A window will open displaying the contents of the MTZ file that you have created (the MTZ file is a binary file, so you are actually just seeing the output of a viewer program). The information that is displayed comes from the header of the MTZ file. Look for the following:

     * Dataset ID, project/crystal/dataset names, cell dimensions, wavelength:
    
            1 toxd
              wildtype
              native
                 73.5820   38.7330   23.1890   90.0000   90.0000   90.0000
                 0.00000

    Information about the datasets included in the file is given here. In this example, the file just contains one dataset.

          * Column Labels :
    
                H K L FP SIGFP FreeRflag

    The file contains 6 columns; 3 holding the hkl indices, and 3 containing data. The names of these columns are given here. In the MTZ format, the column names are not fixed, and neither is the order of the columns. Programs use these names to identify the columns that are to be used.

          * Column Types :
    
               H H H F Q I

    Each column has an associated type. For example, F refers to a structure factor amplitude: the column FP has this type.

          * Associated datasets :
    
               0 0 0 1 1 1

    This is a list of the datasets associated with each column. In this example, all columns belong to dataset 1.

     * Cell Dimensions : (obsolete - refer to dataset cell dimensions above)
    
       73.5820   38.7330   23.1890   90.0000   90.0000   90.0000 
    
     *  Resolution Range :
    
        0.00085    0.18900     (     34.280 -      2.300 A )
    
     * Sort Order :
    
          1     2     3     0     0
    
     * Space group = 'P 21 21 21' (number     19)

    The cell dimensions, resolution range and space group are carried in the MTZ file header, so that you do not normally need to enter them explicitly when running programs.

  12. By default, only the header information from the MTZ file is displayed. To see more, click on List More Info at the bottom of the display window. A dialogue box will appear. Accept the defaults and click Apply&Exit. Extra information is now displayed at the bottom of the display window. Scroll down and look at the table:

     OVERALL FILE STATISTICS for resolution range   0.001 -   0.189
     ======================= 
    
    
     Col Sort    Min    Max    Num      %     Mean     Mean   Resolution   Type Column
     num order               Missing complete          abs.   Low    High       label 
    
       1 ASC      0      31      0  100.00     11.9     11.9  34.27   2.30   H  H
       2 NONE     0      16      0  100.00      6.2      6.2  34.27   2.30   H  K
       3 NONE     0      10      0  100.00      3.6      3.6  34.27   2.30   H  L
       4 NONE  183.0 20154.0    74   97.71  2852.63  2852.63  34.27   2.30   F  FP
       5 NONE   18.0   465.0    74   97.71   140.35   140.35  34.27   2.30   Q  SIGFP
       6 NONE    0.0    22.0     0  100.00    11.59    11.59  34.27   2.30   I  FreeRflag
    
    
     No. of reflections used in FILE STATISTICS     3234

    Each line corresponds to a column of data in the MTZ file, and for each line various statistics are given. For example, Num Missing gives the number of reflections in that column which have been flagged as missing data (e.g. a structure factor amplitude which wasn't measured in the diffraction experiment).

  13. At the bottom of the display, the first 10 reflections are listed (more can be listed via the List More Info option):

        0   0   2      626.00    112.00      3.00
        0   0   4     9111.00    168.00     22.00
        0   0   6      513.00    146.00     20.00
        0   0   8     2610.00     52.00     10.00
        0   0  10         ?         ?       11.00
        0   1   1     1200.00     38.00     13.00
        0   1   2     2244.00     55.00     21.00
        0   1   3     2163.00     36.00      6.00
        0   1   4     6057.00     82.00     13.00
        0   1   5     3698.00     46.00     16.00

    The rows correspond to different reflections, and the columns correspond to the 6 columns of data described in the header. Some entries are given as "?". This represents missing data, and the total number of such entries for each column is listed in the table OVERALL FILE STATISTICS.

  14. When you have finished examining the file, click on Quit. Close all other windows except the main window.
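The format change in step 9 is easier to follow once you see how a Fortran format statement maps onto character positions in a line. The sketch below is a simplified Python reader that understands only the nX (skip) and Fw.d (real number) descriptors used here; it is not the f2mtz implementation. The sample line is hypothetical: it is built to match the field widths implied by the corrected format, using the values of the first reflection listed in step 13, and is not copied from toxd.hkl.

```python
import re


def parse_format(fmt):
    """Turn a format such as '(6X,3F5.0)' into a list of (kind, width) pairs."""
    fields = []
    for item in fmt.strip().strip("'").strip("()").split(","):
        item = item.strip().upper()
        skip = re.fullmatch(r"(\d+)X", item)
        real = re.fullmatch(r"(\d*)F(\d+)\.(\d+)", item)
        if skip:
            fields.append(("skip", int(skip.group(1))))
        elif real:
            repeat = int(real.group(1) or 1)
            fields.extend([("real", int(real.group(2)))] * repeat)
        else:
            raise ValueError(f"unsupported descriptor: {item!r}")
    return fields


def read_record(line, fields):
    """Slice one line into numbers; raises ValueError if a field is not numeric."""
    values, position = [], 0
    for kind, width in fields:
        chunk = line[position:position + width]
        position += width
        if kind == "real":
            values.append(float(chunk))
    return values


# Hypothetical CNS-style record: labels, indices, amplitude, a second
# 10-character number after the amplitude, sigma and test flag.
line = (" INDE " + f"{0:5d}{0:5d}{2:5d}" + " FOBS=" + f"{626.0:10.3f}"
        + f"{0.0:10.3f}" + " SIGMA=" + f"{112.0:10.3f}" + " TEST=" + f"{3:10d}")

corrected = "(6X,3F5.0,6X,F10.3,10X,7X,F10.3,6X,F10.0)"
default = "(6X,3F5.0,6X,F10.3,7X,F10.3,6X,F10.0)"  # corrected format without 10X

print(read_record(line, parse_format(corrected)))
try:
    read_record(line, parse_format(default))
except ValueError as error:
    print("default format fails:", error)
```

Without the extra 10X, the reader is ten characters out of step after the amplitude and tries to interpret part of the SIGMA label as a number. This is the kind of misalignment that an error about "problems reading reflection" points to.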

1c) MTZ format: unmerged files

The Problem

The previous example looked at a so-called merged MTZ file. This type of file has only one record for each set of hkl indices, and is the type of file one has after merging together all different observations of a particular reflection. In the early stages of data processing, however, one has several observations of each reflection (i.e. from different images or symmetry-related) and such reflection data are held in an unmerged MTZ file. In this exercise, we examine an unmerged MTZ file.
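The relationship between the two kinds of file can be sketched in Python as a many-to-one reduction. The observations below are hypothetical, and the unweighted mean is only a stand-in: real merging programs also scale the data, reduce symmetry-related indices to a common form, weight the observations and reject outliers.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical unmerged observations: (hkl, batch number, intensity).
observations = [
    ((1, 2, 3), 5, 100.0),
    ((1, 2, 3), 8, 110.0),
    ((0, 0, 4), 6, 50.0),
]


def merge(observations):
    """Collapse repeated observations into one record per hkl (plain mean)."""
    grouped = defaultdict(list)
    for hkl, _batch, intensity in observations:
        grouped[hkl].append(intensity)
    return {hkl: mean(values) for hkl, values in grouped.items()}


merged = merge(observations)
print(len(observations), "observations ->", len(merged), "merged reflections")
print(merged)
```

The unmerged list keeps every observation with its batch number, while the merged result has exactly one entry per set of hkl indices.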

Exercise

  1. Open the Convert to/modify/extend MTZ task window again (see above).

  2. On the first line, change the job title to:

    Job title Importing unmerged DMSO data (intro tutorial step 40)

  3. On the second line, select ascii MTZ from the pull-down menu.

  4. On the 3rd line, turn off Create full unique set of reflections using the radiobutton. This is not appropriate for unmerged data.

  5. Now enter the input file as:

    In DATA aucn.na4

    (In the File Selection Window, change the Filename filter to *.na4)

    The output file is set automatically to:

    Out TEST aucn.mtz

  6. Now look at the folder MTZ Project, Crystal & Dataset Names. Enter:

    Crystal aucn belonging to Project dmso
    Dataset name red_aucn

  7. Cell and symmetry information is obtained from the input file and doesn't need to be entered. So click on Run -> Run Now.

  8. When the job has finished, inspect the contents of the output unmerged file using View Files from Job -> aucn.mtz. Much of the information is the same as for the previous example, but there is some extra information specific to unmerged MTZ files.

  9. Unmerged MTZ files have a standard set of column labels:

          * Column Labels :
    
           H K L M/ISYM BATCH I SIGI IPR SIGIPR FRACTIONCALC XDET YDET ROT WIDTH LP MPART

    These will normally be the same for all unmerged files.

  10. Reflection records are grouped into batches: a batch corresponds to an image (or group of images) upon which a subset of the reflections were recorded. The same hkl triplet may occur several times, with different instances being distinguished by different batch numbers. A list of batches is given at the end of the default display:

        Batch number:
        5
        Batch number:
        6
        Batch number:
        7
        Batch number:
        8
        Batch number:
        9
        Batch number:
        10

  11. Click on List More Info, and this time select batch headers for multi-record MTZ before clicking Apply&Exit. In the main display window, the batch header for each batch is displayed.

     Orientation data for batch 5 oscillation data
    
       Crystal number ................... 0
       Associated dataset ID ............ 1
       Cell dimensions .................. 88.91 88.91 229.22 90.00 90.00 90.00
       Cell fix flags ................... -1 1 -1 0 0 0
       Orientation matrix U ............. 1.0000 0.0000 0.0000
           (including setting angles)     0.0000 1.0000 0.0000
                                          0.0000 0.0000 1.0000
       Reciprocal axis nearest .. c*
       Mosaicity ........................ 0.020
       Datum goniostat angles (degrees).. 0.000
       Start & stop Phi angles (degrees). 343.000 344.000
       Range of Phi angles (degrees)..... 0.000
       Start & stop time (minutes)....... 0. 0.
     Crystal goniostat information :-
       Number of goniostat axes.......... 1
       Goniostat vectors.....        .... 0.0000 0.0000 1.0000
                        .....        .... 0.0000 0.0000 0.0000
                        .....        .... 0.0000 0.0000 0.0000
     Beam information :-
       Idealized X-ray beam vector....... -1.0000 0.0000 0.0000
       X-ray beam vector with tilts...... -1.0000 0.0000 0.0000
       Wavelength and dispersion ........ 0.88000 0.00120 0.00010
       Divergence ....................... 0.120 0.020
     Detector information :-
       Number of detectors............... 0
       Crystal to Detector distance (mm). 0.000
       Detector swing angle.............. 0.000
       Pixel limits on detector.......... 0.0 0.0 0.0 0.0

    The batch header contains information on how the corresponding image was recorded, and this information is used by certain programs such as SCALA.

  12. When you have finished examining the file, click on Quit. Close all other windows except the main window.
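To make the role of the BATCH column concrete, the following Python sketch indexes a few hypothetical reflection records by their hkl indices and lists the batches in which each one occurs. The rows are invented for illustration and use only the first seven of the standard column labels from step 9; a real unmerged MTZ file is binary and would be read with CCP4 tools rather than parsed as text.

```python
from collections import defaultdict

# First seven of the standard unmerged column labels.
LABELS = ["H", "K", "L", "M/ISYM", "BATCH", "I", "SIGI"]

# Hypothetical records, one observation per line.
ROWS = """
1 2 3 1 5 100.0 4.0
1 2 3 2 8 110.0 5.0
0 0 4 1 6  50.0 3.0
2 0 1 1 5  75.0 3.5
"""


def parse(text):
    """Return one dict per record, keyed by column label."""
    return [dict(zip(LABELS, line.split())) for line in text.strip().splitlines()]


def batches_per_hkl(records):
    """Map each hkl triplet to the batch numbers in which it was recorded."""
    index = defaultdict(list)
    for record in records:
        hkl = (int(record["H"]), int(record["K"]), int(record["L"]))
        index[hkl].append(int(record["BATCH"]))
    return dict(index)


records = parse(ROWS)
for hkl, batches in batches_per_hkl(records).items():
    print(hkl, "recorded in batches", batches)
```

Here the triplet (1, 2, 3) appears twice, and the two instances are told apart only by their batch numbers.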

1d) The Loggraph Utility

The Problem

Many CCP4 programs write specially formatted log files containing tables and graphs. The program LOGGRAPH recognises these and displays them graphically. Graphs can be edited and annotated, and printed either to a PostScript file or directly to a printer.

In order to create a log file with suitable graphs for the purpose of this tutorial, we will run program BAVERAGE from CCP4i.

Exercise

  1. Select the Structure Analysis module, and open the Temperature Factor Analysis task window.

  2. On the first line, enter a suitable job title such as

    Job title Getting to grips with loggraph (intro tutorial step 100)

  3. Select a PDB input file:

    PDB in DATA toxd.pdb

    (Use the Browse button after selecting DATA from the pull-down menu)

  4. Click on Run -> Run Now.

    The program BAVERAGE will run, and the Loggraph Viewer will open automatically.

    Loggraph should update itself with the graphs from the BAVERAGE run shortly after the run completes. If it does not, close the Loggraph window, then either rerun the BAVERAGE job, or select the run in the job list and click on View Files from Job -> View Log Graphs.

  5. In the Loggraph window, from the Tables in File panel, select Average B v residue  From baverage. CHAIN A. The first Graph in Selected Table will be displayed, namely Average Bfactors (all atoms)  Chain A. Change this to Average Bfactors (side chains)  Chain A.

    Some of the residues have Bfactors of 0.0 for the side chains. Use the cursor and the cross-hairs to determine which residues they are, then check the contents of the PDB file to work out why they have this value. To do this, in the Temperature Factor Analysis task window, click on the View button in the line where the PDB file was selected - this will display a CCP4i fileviewer window with the contents of toxd.pdb.

    To come back to Loggraph at a later date, select the baverage job from the Job List in the main window of CCP4i, then click on View Files from Job -> View Log Graphs from the menu on the right-hand side of the main window.

    To view graphs from a log file which has not been produced by CCP4i (and is hence not part of any project Job List), click on View Any File from the menu on the right-hand side of the main window. Then go to the directory which contains the log file, select File type log CCP4 log and Viewer View Log Graphs. Select the desired file, and click on Display&Exit. The Loggraph viewer will now be displayed as before.

  6. Close the Loggraph window using File -> Exit. Close or Quit all other interface windows except the main window.
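If you would like to cross-check the side-chain values in step 5 outside the interface, the Python sketch below averages B-factors over side-chain atoms for each residue, reading the standard fixed-column ATOM records of the PDB format. It is an independent illustration and not the BAVERAGE algorithm. The helper atom_line and the two residues are hypothetical test data with dummy coordinates, and treating every atom other than N, CA, C and O as side chain is a simplifying assumption.

```python
from collections import defaultdict

MAIN_CHAIN = {"N", "CA", "C", "O"}


def atom_line(serial, name, resname, chain, resseq, b):
    """Build a hypothetical ATOM record in PDB fixed-column layout."""
    return (f"ATOM  {serial:5d} {name:<4s} {resname:>3s} {chain}{resseq:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{b:6.2f}")


def side_chain_b(lines):
    """Map (chain, residue number, residue name) to the mean side-chain B.

    Residues with no side-chain atoms in the file are reported as None.
    """
    values = defaultdict(list)
    for line in lines:
        if not line.startswith("ATOM"):
            continue
        key = (line[21], int(line[22:26]), line[17:20].strip())
        side_chain_atoms = values[key]  # creates the entry even if it stays empty
        if line[12:16].strip() not in MAIN_CHAIN:
            side_chain_atoms.append(float(line[60:66]))
    return {key: (sum(v) / len(v) if v else None) for key, v in values.items()}


lines = [
    atom_line(1, " N", "GLY", "A", 1, 12.0),
    atom_line(2, " CA", "GLY", "A", 1, 13.0),
    atom_line(3, " C", "GLY", "A", 1, 13.5),
    atom_line(4, " O", "GLY", "A", 1, 14.0),
    atom_line(5, " N", "SER", "A", 2, 10.0),
    atom_line(6, " CA", "SER", "A", 2, 11.0),
    atom_line(7, " CB", "SER", "A", 2, 16.0),
    atom_line(8, " OG", "SER", "A", 2, 18.0),
]

for residue, b in side_chain_b(lines).items():
    print(residue, b)
```

To try it on the tutorial data, pass the lines of toxd.pdb to side_chain_b instead of the hypothetical list. Whether residues reported as None here correspond to the 0.0 values in the BAVERAGE graph is for you to check against the file, as the exercise asks.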


On to the next tutorial - Data Processing and Reduction.
