CCP4 Interface: Data Harvesting


Introduction

MTZ Project Names and Dataset Names in CCP4i

Creating New MTZs in CCP4i

Data Harvesting in CCP4i

Preferences

Introduction

From CCP4 release 4.0 onwards, dataset information is used to control the writing of structure deposition files (also known as harvesting files). Data harvesting aims to simplify the submission of structures to the deposition database; see Data Harvesting. The CCP4 programs affected are MOSFLM, SCALA, TRUNCATE, MLPHARE, REFMAC and RESTRAIN. Dataset information can either be supplied explicitly to the program or taken from the MTZ file. The following data preparation programs can be used to add or adjust Project and Dataset names in the MTZ file: SCALEPACK2MTZ, DTREK2MTZ, DTREK2SCALA, COMBAT, F2MTZ and CAD.

Provided that a Project Name and a Dataset Name are specified (either explicitly or from the MTZ file), and that the NOHARVEST keyword is not given, the harvesting programs automatically produce a deposition file. This file is written to

$HARVESTHOME/DepositFiles/<projectname>/<datasetname>.<programname>

The environment variable $HARVESTHOME defaults to the user's home directory, but it can be changed, for example to a group project directory. CCP4i may also reset $HARVESTHOME; see below.
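The path rule above can be sketched in Python as follows. This is a minimal illustration, not code taken from the CCP4 programs: the function name `deposit_file_path` is hypothetical, the example project, dataset and program names are made up, and it assumes that HOME is used when HARVESTHOME is unset and that USECWD keeps the same `<datasetname>.<programname>` file name.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def deposit_file_path(project_name: str, dataset_name: str, program_name: str,
                      use_cwd: bool = False,
                      env: Optional[Mapping[str, str]] = None,
                      cwd: Optional[str] = None) -> Path:
    """Hypothetical helper: where a harvesting program would write its deposit file."""
    env = os.environ if env is None else env
    filename = f"{dataset_name}.{program_name}"
    if use_cwd:
        # USECWD: write straight into the current directory (same file name assumed).
        return (Path(cwd) if cwd is not None else Path.cwd()) / filename
    # $HARVESTHOME, falling back to the user's home directory when it is unset.
    harvest_home = env.get("HARVESTHOME") or env.get("HOME", "")
    return Path(harvest_home) / "DepositFiles" / project_name / filename


# Example: HARVESTHOME points at a group directory.
print(deposit_file_path("insulin", "native", "truncate",
                        env={"HARVESTHOME": "/data/group", "HOME": "/home/user"}))
```

Passing `env` explicitly keeps the function easy to test without touching the real environment.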

The data harvesting programs accept the following additional keywords:

PNAME: Project Name. In most cases, this will be inherited from the MTZ file.
DNAME: Dataset Name. In most cases, this will be inherited from the MTZ file.
PRIVATE: Set the permissions of the harvesting directories to '700', i.e. read/write/execute for the owner only (the default is '755').
USECWD: Write the deposit file to the current directory, rather than a subdirectory of $HARVESTHOME.
RSIZE: Maximum width of a row in the deposit file (default 80).
NOHARVEST: Do not write out a deposit file; by default one is written whenever Project and Dataset names are available.
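The interplay of these keywords can be summarised in a small Python sketch. The class `HarvestKeywords` and its methods are hypothetical, written only to illustrate the rules stated above: a deposit file needs both names and no NOHARVEST, PRIVATE selects mode 700 instead of 755, and RSIZE defaults to 80.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HarvestKeywords:
    """Hypothetical container for the harvesting keywords listed above."""
    pname: Optional[str] = None   # PNAME: project name (usually from the MTZ file)
    dname: Optional[str] = None   # DNAME: dataset name (usually from the MTZ file)
    private: bool = False         # PRIVATE: owner-only directory permissions
    usecwd: bool = False          # USECWD: write into the current directory
    rsize: int = 80               # RSIZE: maximum row width in the deposit file
    noharvest: bool = False       # NOHARVEST: suppress the deposit file

    def will_harvest(self) -> bool:
        # A deposit file is produced only if both names are known
        # and NOHARVEST has not been given.
        return bool(self.pname) and bool(self.dname) and not self.noharvest

    def directory_mode(self) -> int:
        # PRIVATE gives 700 (rwx for the owner only); otherwise 755.
        return 0o700 if self.private else 0o755
```

If such a mode were passed to `os.makedirs(path, mode=..., exist_ok=True)`, note that the process umask can clear further permission bits, so the resulting directory may be more restrictive than requested.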

The CCP4 Interface does not perform any data harvesting itself; all harvesting is carried out by the data harvesting programs. However, CCP4i contains options for controlling and customising data harvesting, and these are described below.

MTZ Project Names and Dataset Names in CCP4i

The Project Name and Dataset Name used in data harvesting are part of a wider organisation of reflection data; see the general description of the MTZ file format. In brief, columns in MTZ files are grouped into datasets, which in turn are grouped according to the crystal from which they were taken. Crystals belong to particular projects. This hierarchy serves two distinct purposes:

To identify the origin of the various columns of data in the MTZ file. For example, with MAD data the columns for each separate wavelength can be grouped together as one dataset. This is essential if manipulations such as scaling are to be applied consistently to the data.
To carry the default Project and Dataset Names that are used as identifiers for automated data harvesting.
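The hierarchy and its two uses can be modelled with a few Python dataclasses. This is an illustrative sketch only, not the MTZ library's data structures: the class and function names are hypothetical, as are the column labels in the MAD example.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class Dataset:
    name: str
    columns: List[str] = field(default_factory=list)      # column labels in this dataset


@dataclass
class Crystal:
    name: str
    datasets: List[Dataset] = field(default_factory=list)


@dataclass
class Project:
    name: str
    crystals: List[Crystal] = field(default_factory=list)


def origin_of_column(project: Project, label: str) -> Optional[Tuple[str, str, str]]:
    """Use 1: find the (project, crystal, dataset) a column belongs to."""
    for crystal in project.crystals:
        for dataset in crystal.datasets:
            if label in dataset.columns:
                return project.name, crystal.name, dataset.name
    return None


def harvest_identifiers(project: Project) -> Iterator[Tuple[str, str]]:
    """Use 2: the (Project Name, Dataset Name) pairs that label deposit files."""
    for crystal in project.crystals:
        for dataset in crystal.datasets:
            yield project.name, dataset.name


# Hypothetical MAD example: one crystal, one dataset per wavelength.
mad = Project("insulin", [Crystal("xtal1", [
    Dataset("peak", ["F_peak", "SIGF_peak"]),
    Dataset("remote", ["F_remote", "SIGF_remote"]),
])])
```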

Depending on the context, CCP4i may refer to Project and Dataset Names only, or may refer to Project, Crystal and Dataset Names.

Creating New MTZs in CCP4i

There are three tasks in the CCP4 Interface which import reflection data into MTZ format:

Import Scaled Data (running the Scalepack2mtz or Dtrek2mtz program)
Import Unscaled Data (running the Combat program)
Convert to/modify/extend MTZ (running the f2mtz, cif2mtz or na4tomtz program)

In addition, in the 'Edit MTZ Datasets' task (program CAD), the Project, Crystal and Dataset names can be adjusted.

In all of these tasks you are required to enter Project, Crystal and Dataset names. These choices matter, because the names define the data structure held in the MTZ file. If you intend to merge the MTZ file with another, consider which names are already used in the other file: for example, should the new datasets belong to an existing crystal?
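Before merging, it can help to check how the names you plan to enter relate to those already in the other file. The Python function below is a hypothetical pre-merge check written for illustration; it only classifies the new (project, crystal, dataset) names and makes no claim about how CAD or any other program actually combines datasets.

```python
from typing import Iterable, Tuple

NameTriple = Tuple[str, str, str]  # (project, crystal, dataset)


def classify_new_dataset(existing: Iterable[NameTriple], project: str,
                         crystal: str, dataset: str) -> str:
    """Hypothetical check: where would the new names land in the other file?"""
    existing = set(existing)
    if (project, crystal, dataset) in existing:
        # Exactly the same names are already in use.
        return "existing dataset"
    if any((p, c) == (project, crystal) for p, c, _ in existing):
        # The crystal exists; the new data would add a dataset to it.
        return "new dataset in existing crystal"
    return "new crystal"
```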

By default the Project Name is set to the name of the current CCP4i project, though this is not required. This differs from the way the CCP4 programs themselves set defaults for the project, crystal and dataset names (see, for instance, the Scalepack2mtz NAME keyword).

Data Harvesting in CCP4i

Data harvesting is implemented at five stages of the structure solution process and may be performed by the following tasks:

Program function	Interface Task(s)
Integrating images (Mosflm)	Integrate Images
Scaling data (Scala)	Scale Experimental Intensities
Converting intensities to SFs (Truncate)	Import Scaled Denzo Data (optional); Scale Experimental Intensities (optional); Convert Intensities to SFs; Convert to/modify/extend MTZ (optional for some formats)
Final round of refinement of heavy atom positions (Mlphare)	Run Mlphare
Final round of refinement (Refmac)	Run Refmac

For each task that might involve harvesting, the task window contains a folder, usually immediately below the Files folder, where you can change the destination of harvest files and the Project or Dataset Names. For the Mosflm task, Project, Crystal and Dataset Names must be defined for the newly created MTZ file, whether or not a harvest file is to be produced. For the other tasks, you can enter an alternative Project or Dataset Name to override the one obtained from the MTZ file.
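The naming rules in this paragraph can be sketched as a small Python function. It is a hypothetical illustration, not CCP4i code: the function name and dictionary keys are assumptions, and the task titles are used only to distinguish the Mosflm task (which creates a new MTZ file) from the others (where an entered name overrides the MTZ value).

```python
from typing import Dict, Mapping, Optional

NEW_MTZ_NAMES = ("project", "crystal", "dataset")   # all required for the Mosflm task
OVERRIDABLE_NAMES = ("project", "dataset")          # may be overridden in other tasks


def resolve_harvest_names(task: str,
                          entered: Mapping[str, Optional[str]],
                          from_mtz: Optional[Mapping[str, Optional[str]]] = None
                          ) -> Dict[str, Optional[str]]:
    """Hypothetical helper: which names a task passes on for harvesting."""
    if task == "Integrate Images":
        # Mosflm creates a new MTZ file, so every name must be entered in the task.
        missing = [k for k in NEW_MTZ_NAMES if not entered.get(k)]
        if missing:
            raise ValueError("missing names: " + ", ".join(missing))
        return {k: entered[k] for k in NEW_MTZ_NAMES}
    # Other tasks: a name entered in the harvesting folder overrides the MTZ value.
    from_mtz = from_mtz or {}
    return {k: entered.get(k) or from_mtz.get(k) for k in OVERRIDABLE_NAMES}
```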

In addition to the dataset information, the task interfaces give four options for controlling output of harvest files:

Create harvest file in project harvesting directory: The harvest file is placed in a subdirectory "DepositFiles" of the project directory corresponding to the currently selected CCP4i project. This option lets the user organise harvesting files according to CCP4i projects. CCP4i implements it by temporarily resetting the environment variable HARVESTHOME.
Create harvest file in central harvesting directory: The harvest file is placed in a subdirectory "DepositFiles" of the directory defined by the environment variable HARVESTHOME, which may be set for your installation in the ccp4.setup file. If HARVESTHOME is not set in the environment, HOME is used instead. This option provides a central location for a user or group.
Use current working directory: This corresponds to the program keyword USECWD (see above), where in the context of CCP4i the current working directory is the project directory (note: not the "DepositFiles" subdirectory thereof).
Do not create harvest file: Do not create any harvest files. This corresponds to the program keyword NOHARVEST (see above).
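The four options can be read as a mapping from the user's choice to an environment override and extra program keywords. The Python sketch below illustrates that reading under stated assumptions; `HarvestOption` and `plan_harvest_output` are hypothetical names, and the sketch is not how CCP4i is actually implemented.

```python
from enum import Enum
from typing import Dict, List, Tuple


class HarvestOption(Enum):
    PROJECT_DIR = "Create harvest file in project harvesting directory"
    CENTRAL_DIR = "Create harvest file in central harvesting directory"
    CWD = "Use current working directory"
    NONE = "Do not create harvest file"


def plan_harvest_output(option: HarvestOption,
                        project_dir: str) -> Tuple[Dict[str, str], List[str]]:
    """Hypothetical sketch: (environment overrides, extra keywords) for one task run."""
    if option is HarvestOption.PROJECT_DIR:
        # Temporarily point HARVESTHOME at the CCP4i project directory, so files
        # land in <project_dir>/DepositFiles/<projectname>/...
        return {"HARVESTHOME": project_dir}, []
    if option is HarvestOption.CENTRAL_DIR:
        # Leave the environment alone: HARVESTHOME (or HOME) decides the location.
        return {}, []
    if option is HarvestOption.CWD:
        # The program runs in the project directory, so USECWD writes the file there.
        return {}, ["USECWD"]
    return {}, ["NOHARVEST"]
```

Returning the environment override separately from the keywords mirrors the text: only the project-directory option works through HARVESTHOME, while the last two options map directly onto program keywords.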

Preferences

The CCP4i Preferences window (accessed from the menu on the right-hand side of the Main Window) lets you set defaults for data harvesting. You can choose the default option for controlling output of harvest files (see above), make the harvest directories private (program keyword PRIVATE), and set the maximum width of an output row (program keyword RSIZE; see above).
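The two preferences that correspond to program keywords can be turned into keyword lines as in the Python sketch below. The class name is hypothetical, the `KEYWORD value` line format is an assumption about how the keywords are written in program input, and the default-output-option preference is left out because its default value is not stated here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HarvestPreferences:
    """Hypothetical holder for the keyword-related harvesting preferences."""
    private_dirs: bool = False   # maps to PRIVATE
    row_width: int = 80          # maps to RSIZE (program default is 80)

    def keyword_lines(self) -> List[str]:
        lines = []
        if self.private_dirs:
            lines.append("PRIVATE")
        if self.row_width != 80:
            # Only emit RSIZE when it differs from the program default.
            lines.append(f"RSIZE {self.row_width}")
        return lines
```

Emitting RSIZE only when it differs from the default keeps the generated program input minimal; always emitting it would be equally valid.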