DETWIN (CCP4: Supported Program)

NAME

detwin - tests for merohedral twinning, and detwins data

SYNOPSIS

detwin HKLIN foo_in.mtz HKLOUT foo_out.mtz
[Keyworded input]

DESCRIPTION

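Twinned data are measured when two or more copies of the reciprocal lattice overlap, so that each measured intensity is a weighted sum of contributions from the twin domains. The two components must therefore be deconvoluted to obtain usable data. For two overlapping reflections h1 and h2 with true intensities Itrue(h1) and Itrue(h2), the measured (twinned) intensities are: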

     ITw(h1) = (1-tf)*Itrue(h1)  +     tf*Itrue(h2)
     ITw(h2) =     tf*Itrue(h1)  + (1-tf)*Itrue(h2)

     where tf is the twin fraction, often denoted as 'alpha'


     thus Itrue(h1) = ((1-tf)*ITw(h1) - tf*ITw(h2)) / (1-2*tf)
          Itrue(h2) = ((1-tf)*ITw(h2) - tf*ITw(h1)) / (1-2*tf)
     also
     var(h1) = ((1-tf)/(1-2*tf))**2 * sdTw(h1)**2 + (tf/(1-2*tf))**2 * sdTw(h2)**2
     var(h2) = ((1-tf)/(1-2*tf))**2 * sdTw(h2)**2 + (tf/(1-2*tf))**2 * sdTw(h1)**2

This deconvolution is only possible if tf is not exactly 0.5. As it approaches this value the variances become extremely large.
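As an illustration only, the detwinning equations above can be written as a short Python function. This is a sketch of the arithmetic, not the code used by the program; the function name and the tolerance used to reject tf = 0.5 are assumptions made for this example.

```python
def detwin_pair(i_tw1, i_tw2, sd_tw1, sd_tw2, tf):
    """Detwin one pair of twin-related reflections.

    Returns (Itrue(h1), Itrue(h2), var(h1), var(h2)) using the
    equations given in the text. All names are illustrative.
    """
    denom = 1.0 - 2.0 * tf
    if abs(denom) < 1e-12:  # assumed tolerance: tf = 0.5 is singular
        raise ValueError("cannot detwin when tf is 0.5")
    a = (1.0 - tf) / denom
    b = tf / denom
    i_true1 = a * i_tw1 - b * i_tw2
    i_true2 = a * i_tw2 - b * i_tw1
    var1 = a**2 * sd_tw1**2 + b**2 * sd_tw2**2
    var2 = a**2 * sd_tw2**2 + b**2 * sd_tw1**2
    return i_true1, i_true2, var1, var2
```

With unit standard deviations on both measurements, the variance of each detwinned intensity is about 3.6 at tf = 0.3 but about 313 at tf = 0.48, which shows how quickly the errors grow as tf approaches 0.5.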

The occurrence of twinning can often be recognised from the intensity statistics of the data set. It is important to check first whether the crystal has a pseudo-translation relating two or more molecules in the asymmetric unit, because such a translation makes some reflection classes very weak and so distorts the intensity statistics. This program carries out several tests for twinning as a function of both twinning fraction and resolution. It reads either intensities or amplitudes from an MTZ file. It can also detwin merohedrally twinned data for a given twinning fraction and write out the corrected intensities or amplitudes.

The twin operator is required (see TABLE OF POSSIBILITIES), as is the chosen twin fraction if an output MTZ file is wanted. The CCP4 GUI suggests possible twinning operators derived from the space group, or the operator can be entered by the user.

The twinning tests performed are:

The partial twin test (see reference [4]): <H> = <|Itw1 - Itw2|/(Itw1 + Itw2)>, plotted against theoretical expectations.

The estimate of the twinning factor as (0.5 -<H>), plotted against resolution.
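A minimal Python sketch of the <H> statistic and the resulting estimate (0.5 - <H>) is given here for illustration. It assumes that the twin-related pairs have already been matched up, and it skips pairs whose summed intensity is not positive; both the function name and that filter are assumptions of this example, not a description of what the program does.

```python
def twin_fraction_from_h(pairs):
    """Estimate the twinning factor from twin-related intensity pairs.

    pairs: iterable of (Itw1, Itw2). Returns (<H>, 0.5 - <H>).
    """
    h_values = [
        abs(i1 - i2) / (i1 + i2)
        for i1, i2 in pairs
        if i1 + i2 > 0  # assumed filter: avoid division by zero or negative sums
    ]
    if not h_values:
        raise ValueError("no usable reflection pairs")
    mean_h = sum(h_values) / len(h_values)
    return mean_h, 0.5 - mean_h
```

To plot the estimate against resolution, the same function could be applied separately to the pairs falling in each resolution bin.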

Those tabulated against the twinning factor, for values ranging from 0.00 to 0.48, are
(N.B. it is impossible to evaluate Itrue if the twinning-factor is exactly 0.5):

  1. Number of negative intensities after detwinning (Britton plot). This is plotted for all data, and for the subset where Itw is greater than 3 sigma.
    If the trial twinning fraction is smaller than the true value, no negative intensities should be generated (apart from those caused by measurement error); once it exceeds the true value, the number increases linearly. Fitting a straight line through the rising part of the plot and extrapolating it back to zero gives a good estimate of the twinning fraction.

  2. The second moment of Z (which is also the fourth moment of E) as a function of resolution and twinning fraction. Here, Z is defined as I' / <I'> where I' = I / epsilon, and epsilon is the symmetry factor. For untwinned acentric data this should equal 2, and for a perfect twin it should be 1.5.

  3. The correlation between I1 and I2 before and after detwinning. The overall correlation is dominated by the low resolution data and is therefore not very sensitive, and can be misleading. However for high resolution data it should drop to 0.00 UNLESS there is an NCS operator related to the twinning operator.
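The counting behind the Britton plot (test 1 above) can be sketched in Python as follows. This is an illustration under stated assumptions rather than the program's own code: the function name is hypothetical, the pairs are assumed to be matched already, and the step of 0.02 in the usage line is an arbitrary choice within the 0.00 to 0.48 range given in the text.

```python
def britton_counts(pairs, trial_fractions):
    """Count negative detwinned intensities for each trial twin fraction.

    pairs: list of (Itw1, Itw2); trial_fractions: values below 0.5.
    Returns a list of (trial fraction, number of negative intensities).
    """
    counts = []
    for tf in trial_fractions:
        denom = 1.0 - 2.0 * tf
        n_negative = 0
        for i1, i2 in pairs:
            true1 = ((1.0 - tf) * i1 - tf * i2) / denom
            true2 = ((1.0 - tf) * i2 - tf * i1) / denom
            n_negative += (true1 < 0) + (true2 < 0)
        counts.append((tf, n_negative))
    return counts


# Example grid of trial fractions from 0.00 to 0.48 (step is an assumption).
trial_fractions = [0.02 * i for i in range(25)]
```

Plotting the second element of each tuple against the first gives the Britton plot; the point where the count starts to rise marks the estimated twinning fraction.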

If the output MTZ file contains IMEAN, it can be run through TRUNCATE again, and the other moments and the N(z) test examined to see whether the intensity statistics now follow the expected distribution for untwinned data. See the cumulative distribution plot, which becomes sigmoidal for twinned data, and the moments of E (or Z), which differ between twinned and untwinned data.

The general formulae for expected moments <Z^k> for untwinned acentric data are (Z defined above):

  k-th moment of Z = Gamma(k+1)  = k!                    if k is an integer
  k-th moment of Z = sqrt(PI) * k * (k-1) * ... * 0.5    if k equals integer+0.5

  k-th moment of E = (k/2)-th moment of Z

See also the truncate documentation. Table of expected moments:

                        Acentric                     Centric
              Untwinned    Perfect twin     Untwinned    Perfect twin
  <E>           0.886         0.940           0.798         0.886
  <E^3>         1.329         1.175           1.596         1.329
  <Z^2>         2.0           1.5             3.0           2.0
  <Z^3>         6.0           3.0            15.0           6.0
  <Z^4>        24.0           7.5           105.0          24.0
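The tabulated values can be reproduced with a few lines of Python. The untwinned acentric case uses the Gamma(k+1) formula given above. The other three cases are not spelled out in this document; the sketch assumes the standard Wilson distributions (Z exponential for acentric data, chi-squared with one degree of freedom for centric data) and treats a perfect twin as the average of two independent Z values. The function names are illustrative.

```python
import math


def moment_z(k, centric=False, perfect_twin=False):
    """Expected k-th moment of Z under the assumptions stated above."""
    if centric and not perfect_twin:
        # Z follows a chi-squared distribution with 1 degree of freedom.
        return 2.0**k * math.gamma(k + 0.5) / math.sqrt(math.pi)
    if not centric and perfect_twin:
        # Mean of two independent exponential variables.
        return math.gamma(k + 2.0) / 2.0**k
    # Untwinned acentric, and perfectly twinned centric, are both exponential.
    return math.gamma(k + 1.0)


def moment_e(k, centric=False, perfect_twin=False):
    """k-th moment of E is the (k/2)-th moment of Z."""
    return moment_z(k / 2.0, centric=centric, perfect_twin=perfect_twin)
```

For example, moment_e(1) evaluates Gamma(1.5) and moment_z(2, perfect_twin=True) evaluates Gamma(4)/4, which correspond to the <E> and <Z^2> entries for acentric data.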

INPUT AND OUTPUT FILES

The following input and output files are used by the program:

Input Files:

HKLIN
Input MTZ file.
This would normally be the output of SCALA or SCALEPACK2MTZ containing the measured intensities, but structure factor amplitudes can also be used as input. See the LABIN keyword for details.

Output Files:

HKLOUT
Output MTZ file.
A file containing corrected Is or Fs for a single twin fraction, which must be specified with the TWIN_FRACTION keyword. If the file contains IMEAN, it can be run through TRUNCATE to test whether the data have been successfully detwinned. If no TWIN_FRACTION keyword is given, no output file is written. The column labels will be extended to include the string "_detw".

KEYWORDED INPUT

The various data control lines are identified by keywords. Only the first 4 letters of each keyword are necessary. The possible keywords are:
DEBUG, LABIN, OPERATOR, PLOT, RESOLUTION, SIGMACUT, TITLE, TWIN_FRACTION

TITLE <title>

[OPTIONAL INPUT]

Title to write to output reflection file.

Default is to keep the title on the input MTZ file. If there is no title on the input MTZ file, then the title is set to: "From Detwin on the <date>"

SIGMACUT <Nsig>

[OPTIONAL INPUT]

The various tests are carried out with all reflections, and repeated for those reflections whose intensity is greater than <Nsig>*sigma(I). Tests such as the Britton plot or those using <H> are unreliable for weak data.

RESOLUTION <Dmin> <Dmax>

[OPTIONAL INPUT]

Resolution limits - either 4(sin theta/lambda)**2 or d in Angstroms (either order). Reflections outside these limits will be excluded from all analysis and omitted on output. Defaults are taken from the range of data in the input file (i.e. all data included).
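The two ways of expressing a resolution limit are related through Bragg's law, which gives 4(sin theta/lambda)**2 = 1/d**2. The short Python sketch below only illustrates that conversion; the function names are hypothetical, and nothing is implied about how the program itself decides which form has been given.

```python
import math


def d_from_four_ssq(four_ssq):
    """Convert 4(sin theta/lambda)**2 to d in Angstroms."""
    return 1.0 / math.sqrt(four_ssq)


def four_ssq_from_d(d_spacing):
    """Convert d in Angstroms to 4(sin theta/lambda)**2."""
    return 1.0 / d_spacing**2
```

Under this relation, limits of 10 and 3.5 Angstroms correspond to 0.01 and roughly 0.0816 in units of 4(sin theta/lambda)**2.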

OPERATOR <string>

Twinning operator, given as the indices of the reflection that is related to the reflection (h,k,l) by the twin operator. See TABLE OF POSSIBILITIES for likely operators for each space group. If there is only one possibility for a space group, it will be used and there is no need to input an operator. Otherwise the keyword is COMPULSORY.

For example for P31 either OPERATOR -h,-k, l
or OPERATOR k, h,-l
or OPERATOR -k,-h,-l
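To make the meaning of the operator string concrete, the following Python sketch applies an operator written in this notation to a reflection (h,k,l) and returns the indices of its twin-related mate. It is an illustration only: the function name and the parsing rules (three comma-separated terms, each a signed sum of h, k and l) are assumptions of this example, not the program's parser.

```python
import re


def apply_operator(op, hkl):
    """Return the twin-related indices of hkl for an operator like 'k, h,-l'."""
    values = dict(zip("hkl", hkl))
    terms = op.replace(" ", "").lower().split(",")
    if len(terms) != 3:
        raise ValueError("operator needs three comma-separated terms")
    result = []
    for term in terms:
        # Each token is an optional sign followed by one of h, k, l.
        tokens = re.findall(r"([+-]?)([hkl])", term)
        if not tokens or "".join(s + n for s, n in tokens) != term:
            raise ValueError(f"cannot parse term {term!r}")
        result.append(sum((-1 if s == "-" else 1) * values[n] for s, n in tokens))
    return tuple(result)
```

For instance, apply_operator("k, h,-l", (1, 2, 3)) gives (2, 1, -3), and applying the same operator again returns the original indices, as expected for a two-fold twin operation.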

PLOT <string>

If <string> is set to MURR, a Murray-Rust plot of |Fo1|/|Fo2| is generated. No debug information is available for this option. (See reference [5].)

TWIN_FRACTION <alpha>

If this is given, an output MTZ file will be written to HKLOUT with Is or Fs corrected assuming the twin fraction <alpha>. It is difficult to estimate the twin fraction accurately unless the data are of very good quality, so look at all the information plotted. To carry out the exhaustive TRUNCATE tests you will need to run the program several times with different twin fractions and look at the resulting intensity statistics. The value which gives the best fit to the theoretical distribution for the acentric terms should be used. Alternatively, run the Uppsala program "dataman" (keyword GEminin), which will estimate the twin fraction.

Once a refined model has been obtained, it is possible to use the Fcalc values from this model to obtain a better estimate of the twin fraction (not yet implemented).

LABIN <program label>=<file label>...

Specify input column labels. [OPTIONAL INPUT]

Detwin normally takes output from SCALA or SCALEPACK2MTZ, which generate standard labels. This is the most common usage of the program, in which case LABIN records are not required.

The program labels defined are: F SIGF IMEAN SIGIMEAN F(+) SIGF(+) F(-) SIGF(-) I(+) SIGI(+) I(-) SIGI(-)

NB: Only F/SIGF or IMEAN/SIGIMEAN or F(+) F(-) SIGF(+) SIGF(-) or I(+) I(-) SIGI(+) SIGI(-) can be assigned. If F(+-) or I(+-) are assigned the anomalous pairs are detwinned.
F
A Structure Amplitude. The structure intensity used for detwinning is generated on input.
SIGF
Standard deviation of the above
IMEAN
Original average Structure Intensity
SIGIMEAN
Standard deviation of the above
I(+)
Structure Intensity of hkl
SIGI(+)
Standard deviation of the above
I(-)
Structure Intensity of -h -k -l
SIGI(-)
Standard deviation of the above

DEBUG <ndebug>

[OPTIONAL INPUT]

Debug output will be printed for <ndebug> reflections

SEE ALSO

truncate

REFERENCES

  1. Rees, D.C. (1980) Acta Cryst. A36, 578-581. The influence of Twinning by Merohedry on Intensity Statistics.
  2. Redinbo, M.R. and Yeates, T.O. (1993) Acta Cryst. D49, 375-380. Structure Determination of Plastocyanin from a Specimen with a Hemihedral Twinning Fraction of One-Half.
  3. Gomis-Ruth, F.X., Fita, I., Kiefersauer, R., Huber, R., Aviles, F.X. and Navaza, J. (1995) Acta Cryst. D51, 819-823. Determination of Hemihedral Twinning and Initial Structural Analysis of Crystals of the Procarboxypeptidase A Ternary Complex.
  4. Yeates, T.O. (1997) Methods in Enzymology 276, 344-358. Detecting and Overcoming Crystal Twinning.
  5. Murray-Rust, P. (1973) Acta Cryst. B29, 2559-2566.

AUTHOR

A.G.W. Leslie
andrew@mrc-lmb.cam.ac.uk
Modifications by E.J.Dodson (York)

EXAMPLES

#!/bin/csh -fv
detwin hklin /ss5/hotaylor/andrew/h1_scala.mtz  \
         hklout $scr0/detwin.mtz  << eof-detwin
title DETWIN WITH TWIN FRAC 0.4
SYM -k,-h,-l
twin 0.4
LABI IMEAN=I SIGIMEAN=SIGI
eof-detwin

# Run TRUNCATE to check all moments and N(z) statistics
trunc:
truncate hklin $scr0/detwin.mtz  hklout $scr0/trunc_detwin.mtz  <

#!/bin/csh -f
#
detwin HKLIN  s100_tetpt.mtz   \
HKLOUT $SCRATCH/s100_tetpt-detw.mtz   \
<< eof
TITLE Sp100 test R3
RESO 10 3.5
#SYMM k,h,-l         #  No need for a twinning operator: this is the only possibility for R3
TWIN 0.19            #  the output file will have Idetw1 and Idetw2
SIG  6               #  only analyse data where Ih1 and Ih2 are greater than 6Sigma
LABI F=F SIGF=SIGF
END
eof