CCP4i: Graphical User Interface
Data Harvesting
MTZ Project Names and Dataset Names in CCP4i
From CCP4 release 4.0 onwards, dataset information is used to control the writing of structure deposition files (also known as harvesting files). Data harvesting aims to simplify the submission of structures to the database; see Data Harvesting. The CCP4 programs affected are MOSFLM, SCALA, TRUNCATE, MLPHARE, REFMAC and RESTRAIN. Dataset information can either be supplied explicitly to the program or be taken from the MTZ file. The following data preparation programs can be used to add or adjust Project and Dataset names in the MTZ file: SCALEPACK2MTZ, DTREK2MTZ, DTREK2SCALA, COMBAT, F2MTZ and CAD.
Provided a Project Name and a Dataset Name are specified (either explicitly or from the MTZ file) and provided the NOHARVEST keyword is not given, the harvesting programs will automatically produce a deposition file. This file will be written to
$HARVESTHOME/DepositFiles/<projectname>/<datasetname>.<programname>
The environment variable $HARVESTHOME defaults to the user's home directory, but could be changed, for example, to a group project directory. $HARVESTHOME may also be reset by CCP4i, see below.
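The location rule above can be illustrated with a short Python sketch. It is not part of CCP4: the function name `harvest_file_path` and its parameters are hypothetical, and the program name is used exactly as passed in, since the case of the file extension is not specified here. It only mirrors the conditions stated above: both names must be present, NOHARVEST must not be given, and $HARVESTHOME falls back to the home directory.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def harvest_file_path(
    project: Optional[str],
    dataset: Optional[str],
    program: str,
    noharvest: bool = False,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[Path]:
    """Return the expected deposition file path, or None if none would be written.

    Hypothetical helper: it only reproduces the rule described in the text.
    """
    env = os.environ if env is None else env
    # No file is produced if NOHARVEST is given or either name is missing.
    if noharvest or not project or not dataset:
        return None
    # $HARVESTHOME defaults to the user's home directory.
    home = env.get("HARVESTHOME") or str(Path.home())
    return Path(home) / "DepositFiles" / project / f"{dataset}.{program}"
```

For example, with `HARVESTHOME` set to a group directory such as `/data/group`, a project `lysozyme`, a dataset `native` and the program name `refmac` (all illustrative values), the sketch returns `/data/group/DepositFiles/lysozyme/native.refmac`.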
The extra keywords associated with harvesting that are included in the data harvesting programs are:
The CCP4 Interface does not do any additional data harvesting; all this activity is in the data harvesting programs. However, CCP4i contains options for controlling and customising data harvesting, and these are described below.
The Project Name and Dataset Name used in data harvesting are part of a wider organisation of reflection data, see general description of the MTZ file format. In brief, columns in MTZ files are grouped according to datasets, which in turn are grouped according to the crystal from which they were taken. Crystals belong to particular projects. This formal structure is thus used in two distinct ways:
Depending on the context, CCP4i may refer to Project and Dataset Names only, or may refer to Project, Crystal and Dataset Names.
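The grouping described above (columns within datasets, datasets within crystals, crystals within projects) can be pictured as a small nested data structure. The following Python sketch is purely illustrative: the class names and the `add_column` helper are hypothetical and do not correspond to the actual MTZ file layout or to any CCP4 library.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Dataset:
    name: str
    columns: List[str] = field(default_factory=list)


@dataclass
class Crystal:
    name: str
    datasets: Dict[str, Dataset] = field(default_factory=dict)


@dataclass
class Project:
    name: str
    crystals: Dict[str, Crystal] = field(default_factory=dict)


def add_column(projects: Dict[str, Project], project: str, crystal: str,
               dataset: str, column: str) -> Dataset:
    """Attach a column label to a dataset, creating any missing levels above it."""
    p = projects.setdefault(project, Project(project))
    c = p.crystals.setdefault(crystal, Crystal(crystal))
    d = c.datasets.setdefault(dataset, Dataset(dataset))
    d.columns.append(column)
    return d
```

Reusing an existing crystal name places a new dataset alongside the datasets already recorded for that crystal, whereas a new crystal name starts a separate branch under the same project.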
There are three tasks in the CCP4 Interface which import reflection data into MTZ format:
In all of these tasks you are required to enter Crystal, Project and Dataset names. Your choices are important, as these names will define the data structure held in the MTZ file. If you intend to merge the MTZ file with another, then consider what names are already being used in the other file. For example, should the new datasets belong to an existing crystal?
By default, the project name will be the same as the CCP4i project name, though this is not essential. This is different from the way the CCP4 programs themselves handle defaults for the project, crystal and dataset names (see, for instance, Scalepack2mtz NAME keyword).
Data harvesting is implemented for five different stages in the structure solution process and might be performed for the following tasks:
Program function | Interface Task(s) |
---|---|
Integrating images (Mosflm) | Integrate Images |
Scaling data (Scala) | Scale Experimental Intensities |
Converting intensities to SFs (Truncate) | |
Final round of refinement of heavy atom positions (Mlphare) | Run Mlphare |
Final round of refinement (Refmac) | Run Refmac |
For each task which might involve harvesting, there is a folder in the task window, usually immediately below the Files folder, where you can change the destination for harvest files and change the Project or Dataset Names. For the Mosflm task, it is necessary to define Project, Crystal and Dataset Names for the newly-created MTZ file, whether or not a harvesting file is to be produced. For the other tasks, it is possible to enter an alternative Project or Dataset name to override the ones obtained from the MTZ file.
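The override behaviour for the non-Mosflm tasks can be summarised as "a name entered in the task window wins over the name read from the MTZ file". A minimal Python sketch of that precedence, assuming a hypothetical `resolve_names` helper (this is not CCP4i code), might look like this:

```python
from typing import Optional, Tuple


def resolve_names(
    mtz_project: Optional[str],
    mtz_dataset: Optional[str],
    override_project: Optional[str] = None,
    override_dataset: Optional[str] = None,
) -> Tuple[Optional[str], Optional[str]]:
    """Pick the Project and Dataset names to use for harvesting.

    A non-empty override takes precedence; otherwise the name
    found in the MTZ file is kept.
    """
    project = override_project or mtz_project
    dataset = override_dataset or mtz_dataset
    return project, dataset
```

Each name is resolved independently, so overriding only the Dataset name leaves the Project name from the MTZ file in place.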
In addition to the dataset information, the task interfaces give four options for controlling output of harvest files:
CCP4i allows for the automatic setting of certain preferences for data harvesting, via the CCP4i Preferences window (accessed from the menu on the right-hand side of the Main Window). The default option for controlling output of harvest files (see above) can be set. The harvest directories can be made private, corresponding to the program keyword PRIVATE, and the maximum width of an output row can be set, corresponding to the program keyword RSIZE (see above).
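As a rough illustration of how such preferences could translate into program keywords, the Python sketch below builds a list of keyword lines. The function `harvest_keywords` is hypothetical and is not how CCP4i is implemented; the keyword names PRIVATE, RSIZE and NOHARVEST come from the text, while writing the row width as a single integer argument after RSIZE is an assumption that should be checked against the program documentation.

```python
from typing import List, Optional


def harvest_keywords(
    harvest: bool = True,
    private: bool = False,
    row_width: Optional[int] = None,
) -> List[str]:
    """Build harvesting keyword lines from preference-style settings."""
    if not harvest:
        # Harvesting switched off: the other keywords are irrelevant.
        return ["NOHARVEST"]
    lines: List[str] = []
    if private:
        lines.append("PRIVATE")
    if row_width is not None:
        if row_width <= 0:
            raise ValueError("row_width must be a positive integer")
        # Assumed syntax: keyword followed by the maximum row width.
        lines.append(f"RSIZE {row_width}")
    return lines
```

With `private=True` and `row_width=80`, the sketch yields the two lines `PRIVATE` and `RSIZE 80`; with `harvest=False` it yields only `NOHARVEST`.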