CCP4 Interface: Data Reduction Module

	CCP4i: Graphical User Interface
	Data Reduction Module

Background information is available on: MTZ Files

Tasks in this module:

Data Processing using Mosflm: Run iMosflm; Run Mosflm in batch

Import Integrated Data: Import Unmerged Data - Pointless; Import Unmerged Data - Combat; Import Merged Data

Find or Match Laue Group - Pointless

Scale and Merge Intensities - Scala: Scala - Task Window Layout; Scala - Datasets and Output Files

Utilities

Convert Intensities to Structure Factors - Truncate

Truncate - Task Window Layout

Treat Twinned Data - Detwin

Sort/Modify/Combine MTZ Files - Sortmtz/Rebatch/Reindex

Automated Data Processing: XIA2 automated data processing

Check Data Quality: Analysis with ctruncate; Analysis with sfcheck

Specialist Help is available on: Reindexing; Twinning; FreerUnique

The layout of each task window (the number of folders present, and whether each folder is open or closed by default) depends on the choices made in the Protocol folder of the task (see Introduction). Some folders are closed by default, but there are specific reasons why you should, or may want to, look at them. These reasons are described in the Task Window Layout sections below.

MTZ Files

Data that has not been scaled and merged is stored in a 'multi-record' MTZ file, which contains several columns that do not appear in a standard MTZ file, for example M/ISYM and BATCH. The Scala program scales and merges the data and outputs a standard MTZ file. Data imported from Mosflm is already in multi-record MTZ format, but any other imported unscaled data must first be converted to this format. Imported scaled and merged data is converted directly to standard MTZ format.
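As a rough illustration of the difference between the two file types, the sketch below classifies a file from its column labels alone, using the M/ISYM and BATCH columns named above as markers. The example label lists (including IMEAN and SIGIMEAN) are hypothetical, and the M/ISYM unpacking assumes the packing M/ISYM = 256*M + ISYM (partial flag M, symmetry number ISYM); check the MTZ format documentation before relying on it.

```python
# Column labels that, per the text above, occur in multi-record (unmerged)
# MTZ files but not in standard (merged) ones.
MULTI_RECORD_MARKERS = {"M/ISYM", "BATCH"}


def is_multi_record(column_labels):
    """Heuristic: treat a file as multi-record if all marker columns are present."""
    return MULTI_RECORD_MARKERS.issubset(column_labels)


def split_m_isym(value):
    """Unpack M/ISYM into (M, ISYM), assuming M/ISYM = 256 * M + ISYM."""
    value = int(value)
    return value // 256, value % 256


# Hypothetical label lists for an unmerged and a merged file.
unmerged = ["H", "K", "L", "M/ISYM", "BATCH", "I", "SIGI"]
merged = ["H", "K", "L", "IMEAN", "SIGIMEAN"]
print(is_multi_record(unmerged), is_multi_record(merged))  # True False
print(split_m_isym(259))  # (1, 3)
```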

Data columns within a standard MTZ file are labelled with 'project', 'crystal' and 'dataset' names, which should be used to distinguish data from native and derivative structures within the file. When importing data or converting to standard MTZ format you will be asked to provide project, crystal and dataset names for the data. You are strongly recommended to choose these names carefully, both to keep your data organised and because some 'down-stream' programs, such as SCALEIT (Scale Datasets in the Experimental Phasing module), require input MTZ files to have consistent dataset naming.
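Before combining files for a program such as SCALEIT, it can help to check the naming by hand. The sketch below is a hypothetical consistency check, not part of any CCP4 program: it represents each file's column header as (label, project, crystal, dataset) tuples and flags any dataset name that is attached to a different project or crystal name in another file. The file names and labels are invented for illustration.

```python
def inconsistent_names(files):
    """Flag dataset names whose project or crystal name differs between files.

    files maps a file name to a list of (label, project, crystal, dataset)
    tuples: a hypothetical in-memory stand-in for an MTZ column header.
    """
    first_seen = {}  # dataset name -> (file, project, crystal) where first met
    problems = []
    for fname, columns in files.items():
        for project, crystal, dataset in sorted({col[1:] for col in columns}):
            if dataset not in first_seen:
                first_seen[dataset] = (fname, project, crystal)
            elif first_seen[dataset][1:] != (project, crystal):
                problems.append((dataset, first_seen[dataset], (fname, project, crystal)))
    return problems


files = {
    "native.mtz": [("FP", "lysozyme", "xtal1", "native"),
                   ("SIGFP", "lysozyme", "xtal1", "native")],
    "deriv.mtz": [("FP", "lysozyme", "xtal2", "hg"),
                  ("FP_nat", "Lysozyme", "xtal1", "native")],  # note the capital L
}
for problem in inconsistent_names(files):
    print(problem)
```

The check is deliberately strict (names are compared case-sensitively), since a silent mismatch is easier to miss than a false alarm.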

Start iMosflm - Interactive Interface to Integrate Images using MOSFLM

Users should first process their diffraction images interactively with MOSFLM, using the iMosflm interface. The program can subsequently be run in "batch" mode using the appropriate CCP4i task interface below.

See the iMOSFLM website for documentation.

Integrate Images in batch with MOSFLM

Once the initial parameters have been determined interactively, MOSFLM can write out a command (.sav) file which can be used to run the rest of the image integration non-interactively (i.e. 'in batch'). This can be much quicker than working through the interactive interface.

The CCP4i Integrate Images interface is populated by reading in the .sav file produced by the interactive iMosflm interface, using the Take parameters from command file button. It may also be necessary to specify the location of the image files and the name of the MOSFLM matrix (.mat) file.

Note that it is not possible (or desirable) to set up the Integrate Images task interface completely by hand.
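To see in advance whether the image location and matrix file will need to be supplied by hand, one could scan the command file for the relevant MOSFLM keywords. This is a minimal sketch that assumes the file holds one keyword per line followed by its value, and only inspects DIRECTORY, TEMPLATE and MATRIX; the example file contents are hypothetical, and a real .sav file may be laid out differently.

```python
REQUIRED = ("DIRECTORY", "TEMPLATE", "MATRIX")


def image_settings(command_text):
    """Collect DIRECTORY/TEMPLATE/MATRIX values from MOSFLM command text.

    Assumes one keyword per line followed by its value, e.g.
    'TEMPLATE xtal1_1_###.img'. Other keywords are ignored.
    """
    found = {}
    for line in command_text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0].upper() in REQUIRED:
            found[parts[0].upper()] = parts[1].strip()
    return found


def missing_settings(found):
    """Keywords that would still have to be supplied in the task interface."""
    return [key for key in REQUIRED if key not in found]


# Hypothetical command file that names the images but not the matrix.
sav = """DIRECTORY /data/xtal1
TEMPLATE xtal1_1_###.img
"""
found = image_settings(sav)
print(found)
print(missing_settings(found))  # ['MATRIX']
```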

The output of this task is an MTZ file which can be fed into the Scale and Merge Intensities task.

See program documentation: MOSFLM

Importing Unmerged Data using Pointless or Combat

The most common source of imported unmerged data is probably Mosflm, which already writes multi-record MTZ format, so its output can be used directly in the Scale and Merge Intensities task (see also the Sort/Modify/Combine MTZ Files task for possible intermediate steps). Data in other formats, such as output from Denzo or XDS, can be imported through the Import Unmerged Data task and then used in Scale and Merge Intensities.

The Crystal, Project and Dataset names should be set, as for the Import Merged Data task described below.

It is also possible to 'import' standard MTZ format files for conversion to the multi-record format.

The "XDS ASCII from Correct" option is designed for unscaled, unmerged data from the "Correct" stage of XDS. It will also work with scaled but unmerged data from XSCALE, provided that oscillation range information is first added to the header of the XDS file.
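The header edit for XSCALE output could be scripted along the lines below. This is a sketch under stated assumptions: header lines are taken to start with "!", the header is assumed to end at a line beginning "!END_OF_HEADER", and the keyword written is assumed to be "!OSCILLATION_RANGE="; confirm the exact keywords Pointless expects in its documentation before editing real files.

```python
def ensure_oscillation_range(lines, osc_range):
    """Return XDS header lines with an !OSCILLATION_RANGE= entry ensured.

    Assumptions: header lines begin with '!' and the header ends at a line
    starting '!END_OF_HEADER'. The lines are returned unchanged (as a new
    list) if the keyword is already present.
    """
    lines = list(lines)
    if any(line.startswith("!OSCILLATION_RANGE=") for line in lines):
        return lines
    out = []
    for i, line in enumerate(lines):
        if line.startswith("!END_OF_HEADER"):
            # Insert the new keyword just before the end of the header.
            out.append(f"!OSCILLATION_RANGE= {osc_range:.6f}")
            out.extend(lines[i:])
            return out
        out.append(line)
    raise ValueError("no !END_OF_HEADER line found")


# Hypothetical, heavily shortened XSCALE-style file.
header = ["!FORMAT=XDS_ASCII    MERGE=FALSE", "!END_OF_HEADER",
          "     1     2     3  100.0  5.0"]
for line in ensure_oscillation_range(header, 0.5):
    print(line)
```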

See program documentation: Pointless, Combat.

Import Merged Data from Denzo and d*TREK

Data from Denzo/Scalepack or d*TREK are usually already scaled and merged. In this case, the files must be converted to standard MTZ format, structure factors generated from the intensities (see below), and the data passed through the 'uniqueify' process (see below). All these steps can be performed from the Import Merged Data task interface.
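The first of these steps, reading a merged Scalepack file, is handled by SCALEPACK2MTZ; the sketch below only shows what such a file is assumed to contain. It assumes the common merged .sca layout: two short header lines, a third line with six cell parameters in 10-character fields followed by the space group, then fixed-width records of three 4-character indices and 8-character intensity/sigma fields (anomalous files carrying I+, sigI+, I-, sigI-). Verify this layout against the SCALEPACK2MTZ documentation; the sketch does not handle blank (missing) anomalous fields.

```python
def parse_scalepack_merged(text):
    """Parse an assumed merged Scalepack (.sca) layout.

    Returns (cell, space_group, records), where each record is
    ((h, k, l), [intensity, sigma, ...]).
    """
    lines = text.splitlines()
    cell_line = lines[2]
    cell = tuple(float(cell_line[i:i + 10]) for i in range(0, 60, 10))
    space_group = cell_line[60:].strip()
    records = []
    for line in lines[3:]:
        if not line.strip():
            continue
        h, k, l = (int(line[i:i + 4]) for i in (0, 4, 8))
        # Remaining 8-character fields: I, sigI (and I-, sigI- if anomalous).
        end = len(line.rstrip())
        values = [float(line[i:i + 8]) for i in range(12, end, 8)]
        records.append(((h, k, l), values))
    return cell, space_group, records
```

Note that merged intensities can be negative, which is one reason the structure factor step (Truncate) follows.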

When importing data and creating a new MTZ file, it is important to set sensible names for the Project, Crystal and Dataset. These names define the data structure for the reflection information, and are also used to control Data Harvesting (see the Data Harvesting page for background details). The crystal name should identify the physical crystal used, and the dataset name should identify a particular dataset collected from that crystal (e.g. a MAD experiment may collect three datasets from one crystal at different wavelengths). The project may correspond to the CCP4i project, but does not have to. The Import Merged Data task contains a folder for setting these names; if Truncate is to be run, this folder also includes options for data harvesting.
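The naming hierarchy described above can be pictured as a small tree. The sketch below models it with Python dataclasses for the MAD example: one project, one physical crystal, and three datasets collected at different wavelengths. The class names, project and crystal names, and wavelength values are all hypothetical illustrations, not CCP4 data structures.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    name: str          # one data collection, e.g. one MAD wavelength
    wavelength: float  # in Angstroms


@dataclass
class Crystal:
    name: str          # identifies the physical crystal
    datasets: list = field(default_factory=list)


@dataclass
class Project:
    name: str          # may, but need not, match the CCP4i project name
    crystals: list = field(default_factory=list)

    def triples(self):
        """One (project, crystal, dataset) name triple per dataset."""
        return [(self.name, c.name, d.name)
                for c in self.crystals for d in c.datasets]


# Hypothetical selenomethionine MAD experiment: three datasets, one crystal.
semet = Crystal("semet_xtal1", [Dataset("peak", 0.9792),
                                Dataset("inflection", 0.9794),
                                Dataset("remote", 0.9000)])
project = Project("mad_demo", [semet])
for triple in project.triples():
    print("/".join(triple))
```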

See program documentation: SCALEPACK2MTZ, dTREK2MTZ.

Find or Match Laue Group - Pointless

The Pointless program attempts to determine the possible Laue group of reflection data by analysing a test dataset, which must be unmerged.

Optionally, the data can be reindexed into the "best" space group and written to a second MTZ file.
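Some background on why a Laue group is the natural target: if anomalous differences are neglected, diffraction intensities obey Friedel's law, so the data show the crystal's point-group symmetry plus an inversion centre, i.e. its Laue class. For chiral crystals such as proteins, each Laue class contains exactly one allowed point group, so finding the Laue group also fixes the point group. The table below is standard crystallography rather than anything taken from Pointless, and it ignores screw axes and lattice centring, which are also needed to pin down the space group.

```python
# Crystallographic point groups possible for chiral crystals (such as
# proteins), each mapped to its Laue class: the same group with a centre
# of symmetry added.
LAUE_CLASS = {
    "1": "-1",
    "2": "2/m",
    "222": "mmm",
    "4": "4/m",
    "422": "4/mmm",
    "3": "-3",
    "32": "-3m",
    "6": "6/m",
    "622": "6/mmm",
    "23": "m-3",
    "432": "m-3m",
}

# Each Laue class contains exactly one of these chiral point groups, so the
# mapping can be inverted: a Laue group found from the data fixes the point
# group of a chiral crystal.
CHIRAL_POINT_GROUP = {laue: pg for pg, laue in LAUE_CLASS.items()}

print(CHIRAL_POINT_GROUP["4/mmm"])  # 422
```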

See program documentation: Pointless

Scale and Merge Intensities - Scala

The Scala interface is large, but many of its options are needed only if detailed optimisation of the scaling is required. The Scala program documentation gives numerous hints on this, some of which are summarised in the Task Window Layout section below.

By default, data that has been scaled and merged by Scala is then converted from intensities to structure factor amplitudes (see below) and usually passed through the Uniqueify process (see below). The data is then in standard MTZ format and suitable for input to molecular replacement or experimental phasing.

It is possible to produce output reflection files in Scalepack format, which are then suitable for input into (for example) the SHELX or SOLVE packages.

Scala is one of the Data Harvesting programs. See Data Harvesting in CCP4i.

Scala - Task Window Layout

Features to look out for in the Scala Task are:

Folder title	Feature	Comment
Protocol	Customise Scala Process	Select this to access the option for outputting reflections in Scalepack format (suitable for input into SHELX or SOLVE)
	Run Truncate	Request this to get structure factor amplitudes output in addition to the scaled intensities
	Output a single MTZ file	Only available when running Truncate. If there are several output datasets then by default there will be one output file for each dataset (see below). This option runs CAD to collect all the data into a single output file
	Ensure unique data & add Free R column	Runs the Uniqueify procedure - if there are multiple output files then the Free R assigned to the first file will be copied to the rest
Convert to SFs & Wilson Plot	Estimated number of residues in the asymmetric unit	program will not work without some description of cell contents
Convert to SFs & Wilson Plot	Use [...] as identifier to append to column labels	Only available when running Truncate. By default the output dataset name will be appended to the MTZ column labels output by Truncate. Alternatively the user can choose to set their own identifiers for each dataset
Define Output Datasets		Lists the dataset definitions which are passed to the output file. The exact contents of this folder depend on the dataset information contained in the input file, and the particular mode that Scala is being run in - see below
Define Output Datasets	Project name and dataset name	facilitate data harvesting at every stage
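The "Ensure unique data & add Free R column" row says that when there are several output files, the Free R flags assigned to the first file are copied to the rest. The sketch below illustrates that idea with plain dictionaries keyed by (h, k, l). It assumes the common convention that reflections receive a random integer flag and one flag value marks the test set, so 20 flag values put about 5% of reflections in it; the function names are hypothetical.

```python
import random


def assign_free_flags(hkls, n_sets=20, seed=0):
    """Give each unique reflection a random integer flag in 0..n_sets-1.

    Assumes the convention that one flag value marks the test set, so
    n_sets=20 places roughly 5% of reflections in it.
    """
    rng = random.Random(seed)
    return {hkl: rng.randrange(n_sets) for hkl in hkls}


def copy_free_flags(reference_flags, hkls):
    """Reuse the flags of the first file for another file's reflections.

    Reflections missing from the reference get None; a real tool has to
    decide how to flag those.
    """
    return {hkl: reference_flags.get(hkl) for hkl in hkls}


first = assign_free_flags([(1, 0, 0), (0, 1, 0), (0, 0, 1)])
second = copy_free_flags(first, [(0, 0, 1), (2, 0, 0)])
print(second)
```

Copying rather than regenerating the flags keeps the same reflections out of refinement in every dataset, so the free R value stays an unbiased check when the datasets are later used together.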

See program documentation: Scala

Scala - Datasets and Output Files

Scala handles multi-record (batch) MTZ files according to the dataset information contained in the input file. In the default mode, dataset information present in the input file is carried through automatically to the output file. If dataset information is absent, it must be defined before running the task.

If the Scala job is split into several runs ("multi-runs"), the output of each run can be (re)assigned to an output dataset. In this case several output datasets can be defined, and these need not correspond to the input datasets.
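A minimal sketch of the run-to-dataset assignment, assuming each run is defined by an inclusive range of batch numbers. The dictionaries, run numbers, batch ranges and dataset names here are hypothetical stand-ins for the run definitions entered in the Scala task, not Scala's own input format.

```python
def dataset_for_batch(batch, runs, assignment):
    """Return the output dataset that a batch number is assigned to.

    runs       : {run_id: (first_batch, last_batch)}, inclusive ranges
    assignment : {run_id: output_dataset_name}
    """
    for run_id, (first, last) in runs.items():
        if first <= batch <= last:
            return assignment[run_id]
    raise KeyError(f"batch {batch} is not in any run")


# Hypothetical two-wavelength experiment split into two runs.
runs = {1: (1, 180), 2: (1001, 1180)}
assignment = {1: "peak", 2: "remote"}
print(dataset_for_batch(1050, runs, assignment))  # remote
```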

Scala produces a separate output MTZ file for each output dataset; however, these can also be merged automatically into a single file after Truncate has been run on each to generate structure factor amplitudes. The table below summarises the output files from the Scala task according to the protocol used in running the task.

Number of Input Datasets	Multi-runs	Merge after Truncate	Output
None/One	No	N/A	Single MTZ file
None/One	Yes	No	One MTZ file for each output dataset
None/One	Yes	Yes	Single MTZ file containing all output datasets
Two or more	No	No	One MTZ file for each input dataset
Two or more	No	Yes	Single MTZ file containing all output datasets
Two or more	Yes	No	One MTZ file for each output dataset
Two or more	Yes	Yes	Single MTZ file containing all output datasets
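The table above can be read as a small decision rule. The function below encodes it directly, which may be handy when scripting around the task; the function name and argument names are hypothetical.

```python
def scala_outputs(n_input_datasets, multi_runs, merge_after_truncate):
    """Describe the MTZ output of the Scala task, following the table above."""
    if n_input_datasets <= 1 and not multi_runs:
        return "Single MTZ file"  # the merge option does not apply here
    if merge_after_truncate:
        return "Single MTZ file containing all output datasets"
    if multi_runs:
        return "One MTZ file for each output dataset"
    return "One MTZ file for each input dataset"


print(scala_outputs(2, False, False))  # One MTZ file for each input dataset
```

Merging after Truncate always yields a single file, so that branch is checked before the split between per-run and per-input-dataset output.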

Convert Intensities to Structure Factors - Truncate

The program Truncate obtains structure factor amplitudes from intensities. This conversion is performed by default when importing merged data or running Scala (Scale and Merge Intensities). A separate task interface to Truncate also gives access to some less commonly used options. By default, data processed through this task interface are also passed through the Uniqueify process (see below).
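Simply taking the square root of each intensity fails for weak reflections, whose measured intensity can be negative. The French & Wilson protocol (on by default in the Truncate task, per the Infrequently Used Parameters folder below) instead takes the expected amplitude given the measurement and a Wilson prior. The sketch below shows that idea for an acentric reflection only, with a simple numerical integral; it is not Truncate's implementation, centric reflections would need a different prior, and the shell mean intensity mean_i is assumed to be known.

```python
import math


def french_wilson_acentric(i_obs, sig_i, mean_i, n_steps=4000):
    """Posterior mean amplitude <F> = <sqrt(J)> for one acentric reflection.

    Prior (Wilson, acentric): p(J) = exp(-J / mean_i) / mean_i for J >= 0.
    Likelihood: i_obs ~ Normal(J, sig_i).
    mean_i is the expected intensity in the reflection's resolution shell.
    The integrals over J are done with a simple midpoint rule.
    """
    upper = max(i_obs, 0.0) + 10.0 * sig_i + 10.0 * mean_i
    step = upper / n_steps
    js = [(n + 0.5) * step for n in range(n_steps)]
    # Work with log-weights and subtract the maximum to avoid underflow
    # when i_obs lies many sigmas below zero.
    logw = [-j / mean_i - 0.5 * ((i_obs - j) / sig_i) ** 2 for j in js]
    top = max(logw)
    w = [math.exp(x - top) for x in logw]
    return sum(math.sqrt(j) * wj for j, wj in zip(js, w)) / sum(w)


print(round(french_wilson_acentric(10000.0, 10.0, 1000.0), 2))  # close to 100
print(french_wilson_acentric(-5.0, 10.0, 20.0) > 0)             # True
```

For strong reflections the result is essentially sqrt(I), while a negative measurement still gives a small positive amplitude instead of an undefined square root.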

Truncate is one of the Data Harvesting programs. See Data Harvesting in CCP4i.

See program documentation: Truncate, Ctruncate

Truncate - Task Window Layout

Features to look out for in the Truncate Task are:

Folder title	Feature	Comment
Required Parameters	Define unit cell contents	program will not work without some description of cell contents
Log File Output	Draw Wilson plot to file	Default through the interface is YES (compare the program keyword PLOT OFF)
Data Harvesting	Create harvest file and assign project and dataset names	Facilitates data harvesting at every stage; the default for harvesting may be reset through the Preferences button in the menu on the right-hand side of the Main Window
Infrequently Used Parameters	Use French & Wilson truncate protocol to generate SFs	Default YES, see description
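The Wilson plot drawn by Truncate relates mean intensity to resolution. The sketch below estimates an overall B factor from a simplified version of the plot: reflections are binned in s^2 = 1/(4 d^2), ln<I> is fitted by least squares against s^2, and, since <I> is taken as proportional to exp(-2 B s^2), B is minus half the slope. As a loudly stated simplification, it ignores the falloff of the atomic scattering factors (the term that needs the cell contents), so its B is not directly comparable to Truncate's value.

```python
import math


def wilson_b(d_spacings, intensities, n_bins=10):
    """Overall B (A^2) from a simplified Wilson plot of ln<I> against s^2.

    s^2 = (sin(theta)/lambda)^2 = 1 / (4 d^2). If <I> = K exp(-2 B s^2),
    the least-squares slope of ln<I> against s^2 is -2B.
    Simplification: the falloff of the atomic scattering factors is ignored.
    """
    s2 = [1.0 / (4.0 * d * d) for d in d_spacings]
    lo, hi = min(s2), max(s2)
    width = (hi - lo) / n_bins or 1.0
    bins = [[] for _ in range(n_bins)]
    for x, i in zip(s2, intensities):
        bins[min(int((x - lo) / width), n_bins - 1)].append((x, i))
    xs, ys = [], []
    for b in bins:
        if not b:
            continue
        mean_i = sum(i for _, i in b) / len(b)
        if mean_i > 0:  # ln<I> is undefined otherwise
            xs.append(sum(x for x, _ in b) / len(b))
            ys.append(math.log(mean_i))
    if len(xs) < 2:
        raise ValueError("need at least two populated bins")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope / 2.0
```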

See program documentation: Truncate

Treat Twinned Data - Detwin

The Detwin program either analyses the data to estimate the twinning fraction or generates detwinned data.
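The two operations can be illustrated for a hemihedral twin, where each observed intensity mixes two twin-related true intensities: J1 = (1-a) I1 + a I2 and J2 = a I1 + (1-a) I2. The sketch estimates the twin fraction a with Yeates' H test (for acentric Wilson statistics, <H> = 1/2 - a with H = |J1 - J2| / (J1 + J2)) and then inverts the mixing equations for one pair. Whether Detwin uses exactly this statistic is not stated here, and the sketch assumes the reflection pairs have already been matched by the twin operator.

```python
import random


def twin_fraction_h_test(pairs):
    """Estimate a hemihedral twin fraction with Yeates' H test.

    pairs: (J1, J2) observed intensities of twin-related acentric
    reflections. If the true intensities follow acentric Wilson
    statistics, <H> = 1/2 - alpha.
    """
    hs = [abs(j1 - j2) / (j1 + j2) for j1, j2 in pairs if j1 + j2 > 0]
    return 0.5 - sum(hs) / len(hs)


def detwin_pair(j1, j2, alpha):
    """Invert J1 = (1-a) I1 + a I2, J2 = a I1 + (1-a) I2 for (I1, I2)."""
    if not 0 <= alpha < 0.5:
        raise ValueError("alpha must be in [0, 0.5)")
    d = 1.0 - 2.0 * alpha  # errors are amplified as alpha approaches 0.5
    return ((1 - alpha) * j1 - alpha * j2) / d, ((1 - alpha) * j2 - alpha * j1) / d


# Simulated check: twin independent acentric intensities with alpha = 0.2.
rng = random.Random(1)
alpha = 0.2
pairs = []
for _ in range(20000):
    i1, i2 = rng.expovariate(1.0), rng.expovariate(1.0)
    pairs.append(((1 - alpha) * i1 + alpha * i2, alpha * i1 + (1 - alpha) * i2))
print(round(twin_fraction_h_test(pairs), 3))  # close to 0.2
print(detwin_pair(76.0, 44.0, 0.3))            # approximately (100.0, 20.0)
```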

See program documentation: Detwin

See also the teaching document on: Twinning.

Sort/Modify/Combine MTZ Files

There are some editing functions which you may need to apply to multi-record MTZ files before scaling; in particular, changing the space group or reindexing may be necessary if the initial reflection indexing is suspect. There is also an option to reset the batch numbers for sets of reflections. This should only be necessary if the batch numbers were recorded wrongly, or not at all, at the time of data collection, or if you suspect that some sets of reflections need special treatment by Scale and Merge Intensities and must therefore be identified as separate batches.
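Both edits are simple transformations of each reflection record. The sketch below applies a reindexing operator written in the familiar "k,h,-l" style as an integer matrix, and shifts batch numbers within a range by an offset. It only understands sums of plus or minus h, k and l, and it does not update M/ISYM or the space group, which real tools (REINDEX, Rebatch) also take care of; the function names are hypothetical.

```python
import re

AXES = {"h": 0, "k": 1, "l": 2}


def parse_reindex_op(text):
    """Turn an operator such as 'k,h,-l' into a 3x3 integer matrix.

    Only sums of +/-h, +/-k, +/-l are handled in this sketch.
    """
    rows = []
    for component in text.replace(" ", "").lower().split(","):
        row = [0, 0, 0]
        for sign, axis in re.findall(r"([+-]?)([hkl])", component):
            row[AXES[axis]] += -1 if sign == "-" else 1
        rows.append(row)
    if len(rows) != 3:
        raise ValueError(f"expected three components, got {text!r}")
    return rows


def reindex(hkl, matrix):
    """New indices: each row of the matrix dotted with the old (h, k, l)."""
    return tuple(sum(m * x for m, x in zip(row, hkl)) for row in matrix)


def rebatch(batch, first, last, offset):
    """Shift batch numbers in [first, last] by offset; others are unchanged."""
    return batch + offset if first <= batch <= last else batch


op = parse_reindex_op("k,h,-l")
print(reindex((1, 2, 3), op))   # (2, 1, -3)
print(rebatch(5, 1, 10, 1000))  # 1005
```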

This task will also combine several multi-record MTZ files into one multi-record MTZ file for input to the Scale and Merge Intensities task. The output file will be sorted as required.
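Combining files is essentially concatenation plus a sort, with one catch: batch numbers must not be shared between the files, which is one reason to renumber them first. The sketch below uses lists of dicts as hypothetical stand-ins for multi-record MTZ rows, refuses to combine runs whose batch numbers overlap, and sorts by indices and then batch; that sort order is an assumption made for illustration, not necessarily the exact order Sortmtz produces.

```python
def combine_runs(*runs):
    """Combine unmerged reflection records from several files and sort them.

    Each run is a list of dicts with 'hkl', 'batch' and 'I' keys, a
    hypothetical stand-in for multi-record MTZ rows. Batch numbers must
    not be shared between runs (renumber them first otherwise).
    """
    seen = set()
    for n, run in enumerate(runs):
        batches = {rec["batch"] for rec in run}
        clash = seen & batches
        if clash:
            raise ValueError(f"run {n} reuses batch numbers {sorted(clash)}")
        seen |= batches
    combined = [rec for run in runs for rec in run]
    # Assumed sort order for illustration: indices first, then batch.
    return sorted(combined, key=lambda rec: (rec["hkl"], rec["batch"]))


run1 = [{"hkl": (1, 0, 0), "batch": 2, "I": 5.0},
        {"hkl": (0, 0, 1), "batch": 1, "I": 3.0}]
run2 = [{"hkl": (0, 0, 1), "batch": 101, "I": 4.0}]
for rec in combine_runs(run1, run2):
    print(rec["hkl"], rec["batch"])
```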

Note that there is an option in the Reflection Data Utilities to reindex standard MTZ files.

See program documentation: Sortmtz, REINDEX, Rebatch.

See also the teaching document on: Reindexing.

Automated Data Processing with XIA2

xia2 is an automated data reduction system which works from raw diffraction data plus minimal information about the diffraction experiment. Given a set of images it will perform data processing and scaling with minimal user intervention.

See program documentation: Xia2

Check Data Quality with Ctruncate

After performing data reduction it is a good idea to check the quality of the data before proceeding with the next steps in the structure solution process. This interface provides easy access to the data analysis functions in the Ctruncate program.
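One widely used check of this kind is the second moment of the intensities, <I^2>/<I>^2, computed for acentric reflections in a resolution shell: textbook values are about 2.0 for untwinned data and 1.5 for a perfect hemihedral twin. The sketch below computes it for a single shell of simulated data. Real analyses, including Ctruncate's, work per resolution shell on normalised data, so treat this as an illustration of the statistic rather than a reproduction of the program's output.

```python
import random


def second_moment_ratio(intensities):
    """<I^2> / <I>^2 for the acentric intensities of one resolution shell."""
    n = len(intensities)
    mean = sum(intensities) / n
    return (sum(i * i for i in intensities) / n) / (mean * mean)


# Simulated untwinned acentric shell: Wilson statistics give exponentially
# distributed intensities, so the ratio should come out near 2.0.
rng = random.Random(0)
untwinned = [rng.expovariate(1.0) for _ in range(20000)]
# Perfect hemihedral twin: each observation averages two independent intensities,
# so the ratio should come out near 1.5.
twinned = [(rng.expovariate(1.0) + rng.expovariate(1.0)) / 2 for _ in range(20000)]
print(round(second_moment_ratio(untwinned), 2))
print(round(second_moment_ratio(twinned), 2))
```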

See program documentation: Ctruncate

Check Data Quality with Sfcheck

See program documentation: Sfcheck

See also: Convert to MTZ & Standardise and the teaching document FreerUnique.
