
MARS


As described in the papers:

  1. Mars - robust automatic backbone assignment of proteins
  2. Backbone assignment of proteins with known structure using residual dipolar couplings

    Young-Sang Jung and Markus Zweckstetter, submitted.


Contact: mzwecks@gwdg.de



What is MARS ?
Download
Setup
Getting Started
Setting up input files
Setting up assignment parameters
How to run MARS
Output
Analysis of assignment / Testing
Directory structure
Important points to remember
For advanced users


What is MARS?

    MARS is a program for robust automatic backbone assignment of 13C/15N labeled proteins. MARS simultaneously optimizes the local and global quality of the assignment in order to minimize the propagation of initial assignment errors and to extract reliable assignments. It can be applied independently of the assignment complexity; it does not require tight thresholds for establishing sequential connectivity, or detailed adjustment of these thresholds; it works with a wide variety of NMR experiments; and it is robust against missing chemical shift information.





    Key features

    1. simultaneous optimization of the local and global quality of assignment
    2. exhaustive search for fragments comprising up to five pseudoresidues (PRs) during linking and mapping
    3. best-first elements for both linking and mapping
    4. combination of the secondary structure prediction program PSIPRED (McGuffin et al., 2000) with statistical chemical shift distributions, which were corrected for neighboring residue effects (Wang and Jardetzky, 2002), to improve identification of likely positions in the primary sequence
    5. assessment of the reliability of fragment mapping by performing multiple assignment runs with noise-disturbed chemical shifts



Download

    MARS

      MacOS X
      SGI Irix 6.2 version
      Linux


      The download provides a compressed tar archive with a MARS and PALES executable and example files. The archive can be unpacked with a command like the following:
        
        zcat MARS.linux.tar.Z | tar xvf -
        zcat pales.linux.tar.Z | tar xvf -
        		


      Users are encouraged to email the author to be informed about updates and related software.

    PALES (home)

    For assignment using residual dipolar couplings the software PALES has to be installed in addition. This is only required if a 3D structure is known.


Setup

    Open the .cshrc file in your home directory and add the three lines below to it.

    	setenv MARSHOME  directoryName
    	setenv PALESHOME  directoryName
    	alias runmars directoryName/runmars
    (directoryName is the name of the directory that contains the binary and script files.)
    
    An example:
    
    	setenv PALESHOME  /usr/users/yjung/bin/PALES
    	setenv MARSHOME   /usr/users/yjung/bin/MARS
    	alias runmars    '/usr/users/yjung/bin/MARS/runmars'
    	
    



Getting started

    Input


    MARS is a program for backbone assignment of 13C/15N labeled proteins. Accordingly, following input is required:


    How to run Mars

    1. Prepare your chemical shift table.
    2. Get your primary sequence in FASTA format.
    3. Get a secondary structure prediction using the Psipred web server.
    4. Adjust the parameter setup file (mars.inp).
    5. Type 'runmars mars.inp'

    Output


Setting up input files

    Obligatory


    1. A Mars run is controlled by the parameter setup file (mars.inp). This has to be adjusted to the available experimental data. Please see below for a detailed description of the parameters.

        Lines with a '#' sign as first character as well as empty lines are ignored.
        Do not change the variable names such as nIter.
        	
        mars.inp (MARSHOME/example/noStructure/1ubq/input)
        				
          fragSize:    5                  # Maximum length of pseudoresidue fragments
          cutoffCO:    0.25               # Connectivity cutoff (ppm) of CO [0.25]
          cutoffCA:    0.2                # Connectivity cutoff (ppm) of CA [0.5]
          cutoffCB:    0.5                # Connectivity cutoff (ppm) of CB [0.5]
          cutoffHA:    0.25               # Connectivity cutoff (ppm) of HA [0.25]
          fixConn:     fix_con.tab        # Table for fixing sequential connectivity
          fixAss:      fix_ass.tab        # Table for fixing residue type and(or) assignment
          pdb:         0                  # 3D structure available [0/1]
          resolution:  NO                 # Resolution of 3D structure [Angstrom]
          pdbName:     NO                 # Name of PDB file (protons required!)
          tensor:      NO                 # Method for obtaining alignment tensor [0/1/2/3/4]
          nIter:       NO                 # Number of iterations [2/3/4]
          dObsExh:     NO                 # Name of RDC table for exhaustive SVD (PALES format)
          dcTab:       NO                 # Name of RDC table (PALES format)
          deuterated:  0                  # Protonated proteins [0]; perdeuterated proteins [1]
          sequence:    1ubq_fasta.tab     # Primary sequence (FASTA format)
          secondary:   1ubq_psipred.tab   # Secondary structure (PSIPRED format)
          csTab:       1ubq_cs.tab        # Chemical shift table
    2. Chemical shift table

        The chemical shift table follows the SPARKY format. It consists of a header, pseudoresidues and chemical shifts. The header has to appear before the listing of chemical shift values starts and contains the variable names for the chemical shifts. Currently 10 different chemical shifts are supported; they are indicated by 'CA', 'CA-1', 'CB', 'CB-1', 'CO', 'CO-1', 'HA', 'HA-1', 'H' and 'N'. The variable names have to be in the same order as the columns of the corresponding chemical shifts. The first column has to be the pseudoresidue column; all other columns are chemical shift columns. A pseudoresidue is the name of a group of peaks that share the same N and HN chemical shifts (or similar ones, owing to experimental imperfection). Lines with a '#' sign as first character as well as empty lines are ignored. Missing chemical shift values have to be indicated by ' - '.

         
        1ubq_cs.tab(MARSHOME/example/noStructure/1ubq/1ubq_cs.tab)
        		
                     N         CO-1      H       CA-1     CA
          PR_2       123.220   170.540   8.900   54.450   55.080
          PR_3?      115.340   175.920   8.320   55.080   -
          PR_4?      118.110   172.450   8.610   59.570   55.210
          PR_5GLY    121.000   175.320   9.300   55.210   60.620
          PR_6GLY    127.520   -         8.820   60.620   54.520
          PR_7       115.400   177.140   8.730   54.520   60.470
          PR_8       121.330   176.910   9.100   60.470   57.580
          PR_9       105.590   178.800   7.630   57.580   61.400
          PR_10??    108.890   175.520   7.810   61.400   45.460
          :          :         :
        Pseudoresidue names consist of characters+number+characters, where the characters are optional. Pseudoresidues are distinguished only by the number, which can be any integer larger than 0. The following notations are therefore all allowed for a pseudoresidue:

          Correct format but equivalent to each other:
          
          	PR_37, g_37, 037, PR_037, 37GLY, PR_37???, ...
          

          Correct format for pseudoresidues:
          
          	37, g01, g135, K016, LLR001,g12?, 124D, ...
          

          Incorrect format for pseudoresidues:
          
          	34PR_37, g01_1, PR_34_1, g135GLY1, ...
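
        The naming rule above (optional characters, one number, optional characters) can be checked with a short script. The following is a minimal sketch, not part of MARS: the function name pseudoresidue_number is hypothetical, and the regular expression is only one way of reading the rule, assuming that "characters" means any non-digit, non-blank characters.

```python
import re

# Optional non-digit characters, exactly one run of digits, optional non-digit characters.
_PR_NAME = re.compile(r"[^\d\s]*(\d+)[^\d\s]*")

def pseudoresidue_number(name):
    """Return the integer that identifies a pseudoresidue, or None if the name is malformed."""
    match = _PR_NAME.fullmatch(name)
    if match is None:
        return None
    number = int(match.group(1))
    # The number has to be an integer larger than 0.
    return number if number > 0 else None

equivalent = ["PR_37", "g_37", "037", "PR_037", "37GLY", "PR_37???"]
incorrect = ["34PR_37", "g01_1", "PR_34_1", "g135GLY1"]

print({name: pseudoresidue_number(name) for name in equivalent})
print({name: pseudoresidue_number(name) for name in incorrect})
```

        All names in the first list map to pseudoresidue 37; the names in the second list contain more than one number and are rejected.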
          



    3. The primary sequence of the protein has to be in FASTA format.

        IMPORTANT: 'X' and 'Z' cannot be used as characters in the sequence.

        1ubq_fasta.tab (MARSHOME/example/noStructure/1ubq/1ubq_fasta.tab)
        				
          > ubq
          MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYN
          IQKESTLHLVLRLRGG
    4. Secondary structure prediction table

        This has to be in Psipred format. Use the Psipred web server to get the table.
        1ubq_psipred.tab (MARSHOME/example/noStructure/1ubq/1ubq_psipred.tab)
        	
          PSIPRED PREDICTION RESULTS

          Key

          Conf: Confidence (0=low, 9=high)
          Pred: Predicted secondary structure (H=helix, E=strand, C=coil)
            AA: Target sequence

          Conf: 968896699888999867863189999999997689875658887777738887136726
          Pred: CEEEEECCCCCEEEEEECCCCCHHHHHHHHHHHHCCCHHHEEEEECCEECCCCCCHHHHC
            AA: MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYN
                        10        20        30        40        50        60

          Conf: 8988889999950699
          Pred: CCCCCEEEEEEECCCC
            AA: IQKESTLHLVLRLRGG
                        70

      If a 3D structure and experimental RDCs are available:

    5. PDB file

        All standard PDB files can be used (including MOLMOL files).

        IMPORTANT: When using shape-prediction all atoms in the PDB file will be used including pseudo atoms (ANI).


    6. Dipolar coupling input

      Experimental dipolar couplings are supplied according to the PALES table format:

      • The protein sequence should be given as shown by one or more "DATA SEQUENCE" lines. Space characters in the sequence will be ignored.
      • The table must include columns for residue ID, three-character residue name and the atom name for both atoms that are involved in the dipolar coupling as well as the dipolar coupling itself, its error and a weighting factor. Segment ID and Chain ID are optional.
        IMPORTANT: The atom notation must match that of the PDB file.
      • The table must include a "VARS" line that labels the corresponding columns of the table.
      • The table must include a "FORMAT" line that defines the data type of the corresponding columns of the table.
      • Lines with a '#' sign as first character as well as empty lines are ignored.


        Example dipolar coupling table (excerpts):
        DATA SEQUENCE MQIFVKTLTG KTITLEVEPS DTIENVKAKI QDKEGIPPDQ QRLIFAGKQL
        DATA SEQUENCE EDGRTLSDYN IQKESTLHLV LRLRGG
        
        VARS   RESID_I RESNAME_I ATOMNAME_I RESID_J RESNAME_J ATOMNAME_J D      DD    W
        FORMAT %5d     %6s       %6s        %5d     %6s       %6s    %9.3f   %9.3f %.2f
        
            2    GLN      N      2    GLN     HN     -15.524     1.000 1.00
            3    ILE      N      3    ILE     HN      10.521     1.000 1.00
            4    PHE      N      4    PHE     HN       9.648     1.000 1.00
            5    VAL      N      5    VAL     HN       6.082     1.000 1.00
        
            1    MET      C      2    GLN     HN       3.993     0.333 3.00
            2    GLN      C      3    ILE     HN      -5.646     0.333 3.00
            3    ILE      C      4    PHE     HN       1.041     0.333 3.00
            4    PHE      C      5    VAL     HN       0.835     0.333 3.00
        
            1    MET      C      2    GLN      N       2.651     0.125 8.00
            2    GLN      C      3    ILE      N      -3.768     0.125 8.00
            3    ILE      C      4    PHE      N       1.463     0.125 8.00
            4    PHE      C      5    VAL      N      -1.726     0.125 8.00
        
            2    GLN      N      2    GLN     HN     -15.524     1.000 1.00
            3    ILE      N      3    ILE     HN      10.521     1.000 1.00
            4    PHE      N      4    PHE     HN       9.648     1.000 1.00
            5    VAL      N      5    VAL     HN       6.082     1.000 1.00
        
            1    MET     HA      1    MET     CA     -38.341     1.000 0.50
            2    GLN     HA      2    GLN     CA      11.662     1.000 0.50
            3    ILE     HA      3    ILE     CA      18.424     1.000 0.50
            4    PHE     HA      4    PHE     CA      26.733     1.000 0.50
        
        


    Optional

    1. When additional information such as amino-acid-type-specific labeling or initial manual assignments is available, the assignment of a pseudoresidue can be restricted to a single residue or to a set of residues. The first column has to be a pseudoresidue name, followed by the residue numbers or amino acid types to which the assignment should be restricted. Assignments can be fixed one by one by specifying the corresponding residue numbers, or restricted to a whole residue fragment by specifying the starting and ending residue numbers (inclusive) connected by '-' (without a blank between the numbers and the '-'). Amino acid types can be fixed by specifying the corresponding one-letter code. More than one amino acid type can be specified by concatenating the one-letter codes (i.e. attach additional one-letter codes without a blank in between).
      fix_ass.tab (MARSHOME/example/noStructure/1ubq/fix_ass.tab)
      	
        PR_3   3
        PR_10  10-15 23 34
        PR_12  12 34-36
        PR_13  13
        PR_14  14 16 HKT
        PR_15  LFR 66-69 13-16 9 71
        PR_16  EVA
    2. Sequential connectivities can also be fixed. This is especially useful when the assignment is done iteratively, alternating between Mars and manual work. The first and second columns are pseudoresidue names. The first column is the name of the pseudoresidue whose intra-residual chemical shifts can be connected to the inter-residual chemical shifts of the pseudoresidue in the second column.
      fix_con.tab (MARSHOME/example/noStructure/1ubq/fix_con.tab)
      	
        PR_2   PR_3
        PR_3   PR_4
        PR_4   PR_5
        PR_11  PR_12
        PR_12  PR_13
        PR_13  PR_14
        PR_25  PR_26
        PR_26  PR_27
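
    The notation used in fix_ass.tab (single residue numbers, inclusive ranges such as 10-15, and concatenated one-letter codes such as HKT) can be expanded with a few lines of Python. This is a minimal sketch under stated assumptions: the function name parse_fix_ass_line is hypothetical, and the sketch only parses a line. How MARS combines residue numbers and amino acid types given on the same line is not described here and is not modelled.

```python
def parse_fix_ass_line(line):
    """Split one fix_ass.tab line into (pseudoresidue, residue numbers, amino acid types)."""
    name, *tokens = line.split()
    residues, types = set(), set()
    for token in tokens:
        if token.isalpha():
            # Concatenated one-letter codes, e.g. HKT -> H, K, T
            types.update(token)
        elif "-" in token:
            # Inclusive residue range, e.g. 66-69 -> 66, 67, 68, 69
            start, end = (int(part) for part in token.split("-"))
            residues.update(range(start, end + 1))
        else:
            residues.add(int(token))
    return name, sorted(residues), sorted(types)

for line in ["PR_3 3", "PR_10 10-15 23 34", "PR_15 LFR 66-69 13-16 9 71"]:
    print(parse_fix_ass_line(line))
```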

Setting up assignment parameters

    fragSize:

      Sequential connectivity is established by matching inter- and intra-residual chemical shifts. Fragments comprising up to fragSize pseudoresidues are searched for exhaustively. The maximum segment length fragSize is a compromise between the desired total execution time of a MARS assignment run and the ability to reliably place PR segments onto the protein sequence.

      According to our tests a fragSize of 5 is large enough to get reliable assignments (pseudoresidue fragments with length five can in most cases be placed uniquely into the protein sequence when intra- and inter-residual Ca and Cb chemical shifts are available).

      For smaller proteins or if more computing power is available larger fragment sizes (six or seven) can be employed. This is expected to be useful if, for example, no Cb chemical shift information is available.

    cutoff


      Cutoff values should be determined according to the resolution of the spectra. If chemical shifts were obtained from standard HNCACB, CBCACONH and HNCO experiments reasonable values will be
        Ex.)
        	cutoffCO: 0.1
        	cutoffCA: 0.5
        	cutoffCB: 0.5
        	cutoffHA: 0.1
        	
      Note that too small error bounds will lead to a small number of reliable assignments.
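
      The effect of a cutoff can be illustrated with the Ca shifts of a few pseudoresidues from the example chemical shift table. The following is a minimal sketch, not the MARS algorithm itself: the function names can_follow and find_connectivities are hypothetical, and the handling of missing shifts (skipped, with at least one matching shift pair required) is an assumption.

```python
def can_follow(first, second, shifts, cutoffs):
    """True if the intra-residual shifts of `first` match the inter-residual shifts of `second`."""
    compared = 0
    for atom, cutoff in cutoffs.items():
        intra = shifts[first].get(atom)
        inter = shifts[second].get(atom + "-1")
        if intra is None or inter is None:
            continue                  # assumption: missing shifts are skipped
        if abs(intra - inter) > cutoff:
            return False
        compared += 1
    return compared > 0               # assumption: at least one shift pair has to match

def find_connectivities(shifts, cutoffs):
    """Map each pseudoresidue to the pseudoresidues that may follow it."""
    return {
        first: [s for s in shifts if s != first and can_follow(first, s, shifts, cutoffs)]
        for first in shifts
    }

# Ca shifts (ppm) of PR_2 to PR_7 from the example table; None marks a missing value.
shifts = {
    2: {"CA": 55.080, "CA-1": 54.450},
    3: {"CA": None,   "CA-1": 55.080},
    4: {"CA": 55.210, "CA-1": 59.570},
    5: {"CA": 60.620, "CA-1": 55.210},
    6: {"CA": 54.520, "CA-1": 60.620},
    7: {"CA": 60.470, "CA-1": 54.520},
}

print(find_connectivities(shifts, {"CA": 0.2}))   # cutoffCA of the example mars.inp
print(find_connectivities(shifts, {"CA": 0.1}))   # tighter cutoff, fewer candidates
```

      With cutoffCA: 0.2 the sketch finds, for example, PR_3 and PR_5 as possible successors of PR_2; with 0.1 only PR_3 remains. Restricted to these six pseudoresidues, the looser result agrees with the corresponding entries of the connectivity.out excerpt.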

    fixConn

      This is optional. If you want to fix sequential connectivities, prepare a table like fix_con.tab and specify the table name, otherwise set the fixConn parameter to NO.

      NOTE: In each iteration step MARS generates 60 assignment solutions and extracts reliable assignments from them. After the first iteration step MARS automatically fixes the reliable assignments and reliable sequential connectivities obtained in previous iteration steps, without user intervention. The iteration continues until the number of reliable assignments no longer increases. Fixed assignments and fixed sequential connectivities can therefore appear on the screen during a MARS run even though the user did not fix anything at the start.
        Ex.)
        fixConn: NO or fixConn: fix_con.tab

    deuterated

      If a protein is perdeuterated, set the deuterated parameter to 1. Otherwise put it to 0.
        Ex.)
        deuterated: 0 or deuterated: 1

    sequence

      Specify the name of the file that contains the primary sequence of your protein in FASTA format.
        Ex.)
        sequence: 1ubq_fasta.tab
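
      Before a run it can be useful to check the sequence file for the characters 'X' and 'Z', which are not allowed. The following is a minimal sketch assuming a single-entry FASTA file; the function names read_fasta and forbidden_positions are hypothetical and not part of MARS.

```python
def read_fasta(text):
    """Return (header, sequence) for a single-entry FASTA text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()
        else:
            chunks.append(line.replace(" ", ""))
    return header, "".join(chunks)

def forbidden_positions(sequence, forbidden="XZ"):
    """List (position, character) pairs for characters that may not be used."""
    return [(i, aa) for i, aa in enumerate(sequence.upper(), start=1) if aa in forbidden]

fasta = """> ubq
MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYN
IQKESTLHLVLRLRGG
"""

header, sequence = read_fasta(fasta)
print(header, len(sequence), forbidden_positions(sequence))
```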

    secondary

      Specify the name of the file that contains the secondary structure information of your protein in PSIPRED format.
        Ex.)
        secondary: 1ubq_psipred.tab
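
      The PSIPRED table stores confidence, prediction and sequence in blocks of up to 60 residues. The following is a minimal sketch that joins the blocks into one string per line type, which makes it easy to compare the AA line with the FASTA sequence. The function name parse_psipred is hypothetical, and the rule used to skip the legend lines (their text after the colon contains blanks) is an assumption based on the example file.

```python
def parse_psipred(text):
    """Join the Conf, Pred and AA blocks of a PSIPRED table into single strings."""
    parts = {"Conf": [], "Pred": [], "AA": []}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        value = value.strip()
        # Data lines carry one token after the colon. Legend lines such as
        # "Conf: Confidence (0=low, 9=high)" contain blanks and are skipped.
        if sep and key in parts and value and " " not in value:
            parts[key].append(value)
    return {key: "".join(chunks) for key, chunks in parts.items()}

psipred = """PSIPRED PREDICTION RESULTS

Key

Conf: Confidence (0=low, 9=high)
Pred: Predicted secondary structure (H=helix, E=strand, C=coil)
  AA: Target sequence

Conf: 968896699888999867863189999999997689875658887777738887136726
Pred: CEEEEECCCCCEEEEEECCCCCHHHHHHHHHHHHCCCHHHEEEEECCEECCCCCCHHHHC
  AA: MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYN
              10        20        30        40        50        60

Conf: 8988889999950699
Pred: CCCCCEEEEEEECCCC
  AA: IQKESTLHLVLRLRGG
              70
"""

result = parse_psipred(psipred)
print(result["AA"])
print(result["Pred"])
```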

    csTab

      Specify the name of the file that contains the experimental chemical shifts (SPARKY format).
        Ex.)
        csTab: 1ubq_cs.tab
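
      The chemical shift table format described under "Setting up input files" (header with column names, pseudoresidue in the first column, '-' for missing values) can be read and checked as follows. This is a minimal sketch, not the parser used by MARS; the function name parse_shift_table is hypothetical.

```python
KNOWN_COLUMNS = {"CA", "CA-1", "CB", "CB-1", "CO", "CO-1", "HA", "HA-1", "H", "N"}

def parse_shift_table(text):
    """Return (column names, {pseudoresidue: {column: shift or None}})."""
    columns, rows = None, {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue                                  # empty lines and comments are ignored
        tokens = line.split()
        if columns is None:
            unknown = [t for t in tokens if t not in KNOWN_COLUMNS]
            if unknown:
                raise ValueError(f"unsupported column names: {unknown}")
            columns = tokens                          # the header comes first
            continue
        name, values = tokens[0], tokens[1:]
        if len(values) != len(columns):
            raise ValueError(f"{name}: expected {len(columns)} values, got {len(values)}")
        rows[name] = {c: (None if v == "-" else float(v)) for c, v in zip(columns, values)}
    return columns, rows

table = """\
# excerpt of the ubiquitin example
         N        CO-1     H      CA-1    CA
PR_2     123.220  170.540  8.900  54.450  55.080
PR_3?    115.340  175.920  8.320  55.080  -
PR_6GLY  127.520  -        8.820  60.620  54.520
"""

columns, rows = parse_shift_table(table)
print(columns)
print(rows["PR_3?"])
```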



    If no 3D structure or RDCs are available,
    set the additional parameters as below:

      
      pdb: 		0   			
      resolution: 	NO			
      pdbName: 	NO			
      tensor: 	NO	   		
      nIter: 		NO  			
      
      dObsExh: 	NO			
      dcTab: 		NO			
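
    The rules for mars.inp (lines starting with '#' and empty lines are ignored, each remaining line holds a name, a value and an optional trailing comment) and the settings above can be checked with a short script. This is a minimal sketch under stated assumptions: parse_mars_inp and check_no_structure are hypothetical helpers, not part of MARS, and the treatment of trailing '#' comments follows the layout of the example file.

```python
def parse_mars_inp(text):
    """Return the parameters of a mars.inp file as a {name: value} dictionary."""
    params = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        body = line.split("#", 1)[0]          # drop a trailing comment
        name, _, value = body.partition(":")
        params[name.strip()] = value.strip()
    return params

STRUCTURE_PARAMS = ("resolution", "pdbName", "tensor", "nIter", "dObsExh", "dcTab")

def check_no_structure(params):
    """List structure/RDC parameters that are not NO although pdb is 0."""
    if params.get("pdb") != "0":
        return []
    return [name for name in STRUCTURE_PARAMS if params.get(name) != "NO"]

mars_inp = """\
# parameters for a run without 3D structure
fragSize:    5            # Maximum length of pseudoresidue fragments
cutoffCA:    0.2          # Connectivity cutoff (ppm) of CA [0.5]
pdb:         0            # 3D structure available [0/1]
resolution:  NO
pdbName:     NO
tensor:      NO
nIter:       NO
dObsExh:     NO
dcTab:       NO
csTab:       1ubq_cs.tab  # Chemical shift table
"""

params = parse_mars_inp(mars_inp)
print(params["fragSize"], params["csTab"], check_no_structure(params))
```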
      	


    If a 3D structure and experimental RDCs are available,
    the following parameters have to be set:

    pdb

      Set the pdb flag to 1 in order to use RDCs and the known 3D structure (otherwise set it to 0).
        Ex.)
        pdb: 1

    resolution

      Specify the resolution of your crystal structure. If you don't know the resolution of the structure because it is a homology model, set the resolution to ~ 4.0. In this case it will be useful to perform multiple assignment runs with decreasing values for the resolution parameter (suggested range is 2.0 < resolution < 6.0). The optimum value is the one used in the assignment run that yielded the maximum number of reliable assignments.
        Ex.)
        resolution: 1.8

    pdbName

      Name of file containing the coordinates of the 3D structure. All standard PDB files (including Molmol) can be used. IMPORTANT: Protons have to be present.
        Ex.)
        		pdbName: 1ubq.pdb
        	

    tensor

      Method for obtaining an initial estimate of the alignment tensor. Four different modes are available that can automatically be accessed by specifying 1, 2, 3 or 4. The standard mode is 3.

        If 1 is selected, MARS will use a gridSearch for estimating the orientation of the alignment tensor.
        If 2 is selected, MARS will use exhaustive back-calculation (exhSVD). (dObsExh parameter has to be setup!)
        If 3 is selected, MARS will use singular value decomposition (SVD).
        If 4 is selected, MARS will use shape-prediction (shapePred).

        Ex.)
        tensor: 3
        (It is recommended to use 1 or 3 for the tensor parameter. Modes 2 and 4 require additional knowledge or RDCs in nearly neutral alignment media.)

    nIter

      MARS refines the initial alignment tensor estimate (obtained by the tensor method specified above) several times using SVD based on the reliable assignments obtained in previous iteration steps. Here, the number of refinement steps of the alignment tensor, nIter, can be defined. According to our tests 2 refinement steps are enough.
        Ex.)
        nIter: 2

    dObsExh

      For exhaustive back-calculation (tensor mode 2) an RDC table is required that contains RDCs of a specific amino acid type. If the tensor mode is 1, 3 or 4, put the dObsExh parameter to NO.
        Ex.)
        dObsExh: NO or dObsExh: dObs_1ubq_GLY.tab

    dcTab

      Name of file that contains the experimental RDC values (in PALES format).
        Ex.)
        dcTab: dObs_1ubq.tab
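
      The PALES table layout described under "Dipolar coupling input" (DATA SEQUENCE, VARS and FORMAT lines followed by data rows) can be read as follows. This is a minimal sketch, not the PALES or MARS parser: the function name parse_pales_table is hypothetical, FORMAT lines are not interpreted, and the numeric conversion is tied to the column names of the example table.

```python
NUMERIC = {"RESID_I": int, "RESID_J": int, "D": float, "DD": float, "W": float}

def parse_pales_table(text):
    """Return (sequence, column names, list of row dictionaries)."""
    sequence, columns, rows = [], None, []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        tokens = stripped.split()
        if tokens[:2] == ["DATA", "SEQUENCE"]:
            sequence.extend(tokens[2:])       # blanks inside the sequence are ignored
        elif tokens[0] in ("DATA", "FORMAT"):
            continue                          # not interpreted in this sketch
        elif tokens[0] == "VARS":
            columns = tokens[1:]
        else:
            if columns is None:
                raise ValueError("data line found before the VARS line")
            rows.append({c: NUMERIC.get(c, str)(t) for c, t in zip(columns, tokens)})
    return "".join(sequence), columns, rows

table = """\
DATA SEQUENCE MQIFVKTLTG KTITLEVEPS DTIENVKAKI QDKEGIPPDQ QRLIFAGKQL
DATA SEQUENCE EDGRTLSDYN IQKESTLHLV LRLRGG

VARS   RESID_I RESNAME_I ATOMNAME_I RESID_J RESNAME_J ATOMNAME_J D      DD    W
FORMAT %5d     %6s       %6s        %5d     %6s       %6s    %9.3f   %9.3f %.2f

    2    GLN      N      2    GLN     HN     -15.524     1.000 1.00
    3    ILE      N      3    ILE     HN      10.521     1.000 1.00
    1    MET      C      2    GLN     HN       3.993     0.333 3.00
"""

sequence, columns, rows = parse_pales_table(table)
print(len(sequence), columns)
print(rows[0])
```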




Output

  1. assignment_AA.out lists the final assignment. The first column is the residue of the protein (type and number); the second column is the pseudoresidue assigned to that residue. The third column indicates the reliability of each assignment. Three levels of reliability are distinguished:

    H indicates high reliability as defined in the MARS paper. M (medium) and L (low) do not fulfill all the criteria required for H reliability; the specific criteria employed are adjusted automatically according to the completeness of the input data. Please see below for the robustness of assignments labelled M and L.
    
    assignment_AA.out
    		
      MET_1
      GLN_2   PR_2  (M)
      ILE_3   PR_3  (M)
      PHE_4   PR_4  (H)
      VAL_5   PR_5  (H)
      LYS_6
      THR_7
      LEU_8
      THR_9   PR_9  (M)
      GLY_10  PR_10 (H)
      LYS_11  PR_11 (H)
      THR_12  PR_12 (H)
      ILE_13  PR_13 (H)
      THR_14  PR_14 (H)
      LEU_15  PR_15 (H)
      GLU_16  PR_16 (M)
      VAL_17
      GLU_18
      PRO_19
      SER_20  PR_20 (L)
      ASP_21
      THR_22
      :       :     :
  2. assignment_AAs.out lists alternative assignments. The first column is the residue of the protein. Additional columns list the pseudoresidues that can be assigned to this residue. Numbers in parentheses are assignment probabilities (in percent). Only pseudoresidues with an assignment probability higher than 10% are shown. The assignments in assignment_AA.out are a subset of the assignments listed here.
    
    assignment_AAs.out
    		
      MET_1
      GLN_2   PR_2  (96)
      ILE_3   PR_3  (100)
      PHE_4   PR_4  (100)
      VAL_5   PR_5  (100)
      LYS_6   PR_6  (63)   PR_8 (30)
      THR_7   PR_7  (76)
      LEU_8   PR_8  (61)
      THR_9   PR_9  (100)
      GLY_10  PR_10 (100)
      LYS_11  PR_11 (100)
      THR_12  PR_12 (100)
      ILE_13  PR_13 (100)
      THR_14  PR_14 (100)
      LEU_15  PR_15 (100)
      GLU_16  PR_16 (100)
      VAL_17  PR_17 (73)
      GLU_18
      PRO_19
      SER_20  PR_20 (86)
      ASP_21  PR_21 (65)
      THR_22  PR_57 (33)
      :       :     :
  3. assignment_PR.out lists the most likely assignment for each pseudoresidue present in the input chemical shift table. The first column is the pseudoresidue and the second is the residue to which the pseudoresidue is most likely assigned. NOTE: 'The most likely assignment' does not mean a reliable assignment, and two pseudoresidues can be assigned to the same residue. The information in assignment_PR.out is useful if a pseudoresidue is not assigned to any residue in assignment_AAs.out and you want to know what it might be assigned to.
    
    assignment_PR.out
    		
      PR_2   GLN_2
      PR_3   ILE_3
      PR_4   PHE_4
      PR_5   VAL_5
      PR_6   LYS_6
      PR_7   THR_7
      PR_8   LEU_8
      PR_9   THR_9
      PR_10  GLY_10
      PR_11  LYS_11
      PR_12  THR_12
      PR_13  ILE_13
      PR_14  THR_14
      PR_15  LEU_15
      PR_16  GLU_16
      PR_17  VAL_17
      PR_18  GLU_18
      PR_20  GLN_40
      PR_21  GLN_41
      PR_22  SER_57
      :      :
  4. connectivity.out lists all possible sequential connectivities between pseudoresidues. All numbers are pseudoresidue numbers. The first column (closed by '-->') is the pseudoresidue for which connectivities are listed. If no additional entries are present, no connectivities could be found for that pseudoresidue. Otherwise, all pseudoresidues are listed whose inter-residual chemical shifts can be matched to the intra-residual chemical shifts of the pseudoresidue in the first column.
    
    connectivity.out
    		
      PR_2  --> PR_3 PR_5 PR_35 PR_43 PR_69 PR_74
      PR_3  --> PR_4 PR_23 PR_30 PR_56
      PR_4  --> PR_3 PR_5 PR_29 PR_35 PR_43
      PR_5  --> PR_6 PR_8 PR_71
      PR_6  --> PR_2 PR_7 PR_49 PR_55
      PR_7  --> PR_6 PR_8
      PR_8  --> PR_9 PR_21
      PR_9  --> PR_10
      PR_10 --> PR_11 PR_48 PR_76
      PR_11 --> PR_12 PR_42 PR_75
      PR_12 --> PR_13 PR_24 PR_62 PR_67
      PR_13 --> PR_14 PR_32
      PR_14 --> PR_15
      PR_15 --> PR_16 PR_44
      PR_16 --> PR_17 PR_69 PR_74
      PR_17 --> PR_18 PR_60
      PR_18 --> PR_16 PR_47
      PR_20 --> PR_9 PR_21
      PR_21 --> PR_22 PR_40 PR_41 PR_50 PR_73
      PR_22 --> PR_4 PR_23 PR_30 PR_56
  5. In connectivity_reduced.out the possible sequential connectivities between pseudoresidues are filtered according to the reliable assignments (i.e. it is a subset of connectivity.out).
    
    connectivity_reduced.out
    		
      PR_2  --> PR_3
      PR_3  --> PR_4
      PR_4  --> PR_5
      PR_5  --> PR_6 PR_8 PR_71
      PR_6  --> PR_2 PR_7 PR_49 PR_55
      PR_7  --> PR_6 PR_8
      PR_8  --> PR_9 PR_21
      PR_9  --> PR_10
      PR_10 --> PR_11
      PR_11 --> PR_12
      PR_12 --> PR_13
      PR_13 --> PR_14
      PR_14 --> PR_15
      PR_15 --> PR_16 PR_44
      PR_16 --> PR_17 PR_69 PR_74
      PR_17 --> PR_18 PR_60
      PR_18 --> PR_16
      PR_20 --> PR_9 PR_21
      PR_21 --> PR_22 PR_40 PR_41 PR_50 PR_73
      PR_22 --> PR_23 PR_56
  6. mars.log contains detailed information about the run: the assignment parameters, the percentage of expected intra- and inter-residual Ca, Cb, CO and Ha chemical shifts present in the input chemical shift table, the predicted chemical shifts, the number of reliable assignments, the number of constraints for each pseudoresidue, the matrices matching experimental and back-calculated chemical shifts and/or RDCs, and the pseudoenergy matrices at each iteration step.
    
    mars.log
      ------------------------------------------------------------------------------------------
      fragSize:    5                  # Maximum length of pseudoresidue fragments
      cutoffCO:    0.25               # Connectivity cutoff (ppm) of CO [0.25]
      cutoffCA:    0.2                # Connectivity cutoff (ppm) of CA [0.5]
      cutoffCB:    0.5                # Connectivity cutoff (ppm) of CB [0.5]
      cutoffHA:    0.25               # Connectivity cutoff (ppm) of HA [0.25]
      fixConn:     fix_con.tab        # Table for fixing sequential connectivity
      fixAss:      fix_ass.tab        # Table for fixing residue type and(or) assignment
      pdb:         0                  # 3D structure available [0/1]
      resolution:  NO                 # Resolution of 3D structure [Angstrom]
      pdbName:     NO                 # Name of PDB file (protons required!)
      tensor:      NO                 # Method for obtaining alignment tensor [0/1/2/3/4]
      nIter:       NO                 # Number of iterations [2/3/4]
      dObsExh:     NO                 # Name of RDC table for exhaustive SVD (PALES format)
      dcTab:       NO                 # Name of RDC table (PALES format)
      deuterated:  0                  # Protonated proteins [0]; perdeuterated proteins [1]
      sequence:    1ubq_fasta.tab     # Primary sequence (FASTA format)
      secondary:   1ubq_psipred.tab   # Secondary structure (PSIPRED format)
      csTab:       1ubq_cs.tab        # Chemical shift table
      ------------------------------------------------------------------------------------------
      # of AA: 76
      # of PRO: 3
      # of GLY: 6
      # of Assignable AA: 72
      # of PR: 72

      CA: 100.0   CB: 0.0   CO: 0.0     HA: 0.0
      Ca: 100.0   Cb: 0.0   Co: 100.0   Ha: 0.0

      AC   RC   RW
      --------------------
      53   11   0
      --------------------
      55   13   0
      54   13   0
      --------------------
      54   20   0
      53   20   0
      --------------------
      :    :    :
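
    The assignment in assignment_AA.out can be summarized per reliability level with a short script. This is a minimal sketch assuming the line layout shown in the excerpt for item 1 (residue, then optionally a pseudoresidue and a level in parentheses); the function name parse_assignment_aa is hypothetical.

```python
import re
from collections import Counter

# "GLN_2 PR_2 (M)" for an assigned residue, "MET_1" for an unassigned one.
_LINE = re.compile(r"^([A-Z]{3})_(\d+)(?:\s+(\S+)\s+\(([HML])\))?$")

def parse_assignment_aa(text):
    """Return a list of (residue type, residue number, pseudoresidue, level) tuples."""
    entries = []
    for line in text.splitlines():
        match = _LINE.match(line.strip())
        if match:
            res_type, number, pseudoresidue, level = match.groups()
            entries.append((res_type, int(number), pseudoresidue, level))
    return entries

excerpt = """\
MET_1
GLN_2   PR_2  (M)
ILE_3   PR_3  (M)
PHE_4   PR_4  (H)
VAL_5   PR_5  (H)
LYS_6
THR_7
LEU_8
THR_9   PR_9  (M)
GLY_10  PR_10 (H)
"""

entries = parse_assignment_aa(excerpt)
levels = Counter(level or "unassigned" for _, _, _, level in entries)
print(levels)
```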



Analysis of assignment

    1. Check connectivity.out to verify that your chemical shift table has been set up properly. If there are problems in your chemical shift table due to miscalibration of spectra and(or) many incorrectly grouped pseudoresidues, you will see many missing sequential connectivities in connectivity.out.

    2. MARS maintains the reliability of the assignment even for highly degenerate and/or incomplete data sets and labels assignments according to three levels of reliability: H (highly reliable assignment), M (medium reliable assignment) and L (low reliable assignment). The following tables show the robustness of Mars assignments for different proteins and different connectivity cutoffs, ranked according to H, M and L reliability. The tables show that even medium reliable assignments are essentially error-free, and that assignments labelled L contain a few errors only for the less stringent connectivity cutoffs.


        Table 1. (BMRB chemical shifts used for the test. / COcutoff: 0.25, CAcutoff: 0.5, CBcutoff: 0.5)
        Columns H, M and L give the number of correct/incorrect assignments.

        Tested protein                  Chemical shifts used   # of AA (assignable AA)   H       M      L
        Maltose binding protein         C', Ca, Cb             723 (654)                 639/0   9/0    0/0
        Maltose binding protein         Ca, Cb                 370 (335)                 303/0   18/1   0/0
        EIN of the phosphoenolpyruvate  Ca, Cb                 259 (248)                 232/0   6/0    0/0
        E-cadherin domains II           Ca, Cb                 227 (167)                 76/1    5/1    12/9
        Human prion protein             Ca, Cb                 210 (190)                 130/0   9/0    5/0
        Superoxide dismutase            C', Ca, Cb             192 (117)                 101/0   2/1    2/0
        Calmodulin/M13 complex          C', Ca                 148 (144)                 37/0    7/0    17/2
        E. coli EmrE                    C', Ca, Cb             110 (74)                  35/0    6/0    10/3
        Human ubiquitin                 Ca, Cb                 76 (72)                   72/0    0/0    0/0


        Table 2. (BMRB chemical shifts used for the test. / COcutoff: 0.15, CAcutoff: 0.2, CBcutoff: 0.4)
        Columns H, M and L give the number of correct/incorrect assignments.

        Tested protein                  Chemical shifts used   # of AA (assignable AA)   H       M      L
        Maltose binding protein         C', Ca, Cb             723 (654)                 639/0   9/0    0/0
        Maltose binding protein         Ca, Cb                 370 (335)                 324/0   7/0    2/0
        EIN of the phosphoenolpyruvate  Ca, Cb                 259 (248)                 246/0   0/0    0/0
        E-cadherin domains II           Ca, Cb                 227 (167)                 102/0   7/0    19/0
        Human prion protein             Ca, Cb                 210 (190)                 127/0   4/0    5/0
        Superoxide dismutase            C', Ca, Cb             192 (117)                 104/0   0/0    1/0
        Calmodulin/M13 complex          C', Ca                 148 (144)                 142/0   0/0    0/0
        E. coli EmrE                    C', Ca, Cb             110 (74)                  58/0    4/0    0/0
        Human ubiquitin                 Ca, Cb                 76 (72)                   72/0    0/0    0/0


        Table 3. (Raw peak lists used for the test. / CAcutoff: 0.3, CBcutoff: 0.5, HAcutoff: 0.05)
        Columns H, M and L give the number of correct/incorrect assignments.

        Tested protein                  Chemical shifts used   # of AA (assignable AA)   H       M      L
        Z domain protein                Ca, Cb, Ha             71 (67)                   65/0    0/0    0/0
        Z domain protein                Ca, Cb                 71 (67)                   34/0    6/0    2/0




    3. connectivity_reduced.out filters all detected connectivities reported in connectivity.out according to the highly reliable assignments. This can be useful for manual refinement of a Mars assignment.
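
    The correct/incorrect counts of Table 1 can be summed per reliability level to compare the three classes. The following is a minimal sketch; the numbers are copied row by row from Table 1, and the names TABLE_1 and totals are hypothetical.

```python
# (correct, incorrect) counts per reliability level, copied row by row from Table 1.
TABLE_1 = [
    {"H": (639, 0), "M": (9, 0),  "L": (0, 0)},    # Maltose binding protein, C', Ca, Cb
    {"H": (303, 0), "M": (18, 1), "L": (0, 0)},    # Maltose binding protein, Ca, Cb
    {"H": (232, 0), "M": (6, 0),  "L": (0, 0)},    # EIN, Ca, Cb
    {"H": (76, 1),  "M": (5, 1),  "L": (12, 9)},   # E-cadherin domains II, Ca, Cb
    {"H": (130, 0), "M": (9, 0),  "L": (5, 0)},    # Human prion protein, Ca, Cb
    {"H": (101, 0), "M": (2, 1),  "L": (2, 0)},    # Superoxide dismutase, C', Ca, Cb
    {"H": (37, 0),  "M": (7, 0),  "L": (17, 2)},   # Calmodulin/M13 complex, C', Ca
    {"H": (35, 0),  "M": (6, 0),  "L": (10, 3)},   # E. coli EmrE, C', Ca, Cb
    {"H": (72, 0),  "M": (0, 0),  "L": (0, 0)},    # Human ubiquitin, Ca, Cb
]

def totals(table):
    """Sum correct and incorrect assignments for each reliability level."""
    result = {}
    for level in ("H", "M", "L"):
        correct = sum(row[level][0] for row in table)
        incorrect = sum(row[level][1] for row in table)
        result[level] = (correct, incorrect)
    return result

for level, (correct, incorrect) in totals(TABLE_1).items():
    print(level, correct, incorrect)
```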






Directory structure

    MARSHOME:

    	
        $MARSHOME/bin
        $MARSHOME/example
        $MARSHOME/example/noStructure
        $MARSHOME/example/noStructure/1ubq
        $MARSHOME/example/noStructure/1zym
        $MARSHOME/example/noStructure/1dmb
        $MARSHOME/example/Structure
        $MARSHOME/example/Structure/1ubq
        $MARSHOME/example/Structure/1zym
        $MARSHOME/example/Structure/1dmb



Important points to remember

    1. Spectra calibration and proper peak grouping are the most important points.


    2. When grouping inter- and intra-residual chemical shifts, try to use the same spectrum for extracting the inter- and intra-residual chemical shifts of a given atom type.

      For example, when you want to get intra- and inter-residual chemical shifts of Ca, extract both chemical shifts from the HNCA spectrum. Only take the inter-residual Ca chemical shift from a one-way connectivity spectrum like HN(CO)CA if the inter-residual peak in the HNCA is too weak or overlapping.

      In that case, bigger connectivity cutoffs (cutoffCO, cutoffCA, cutoffCB, and cutoffHA) have to be used due to imperfections in spectrum calibration.

      Nevertheless, Mars does not care where you got the intra- and inter-chemical shifts from!


    3. Be careful of folded peaks!



For advanced users


    Tensor

      Grid search

        1116 alignment tensor orientations are systematically sampled and for each orientation the deviation D(i,j) between experimental and back-calculated RDCs is determined (equation 1). These 1116 orientations are obtained in a two-step procedure. First, the z-axis of the molecule uniformly samples 122 points on a unit sphere that were determined by a double cubic lattice method (Eisenhaber et al., 1995). Only one quarter of these points has to be taken into account due to the inversion symmetry of RDCs. In a second step, the molecule is rotated around the z-axis in steps of 10°, giving a highly uniform and efficient sampling of all possible tensor orientations. All sampled orientations are ranked according to their corresponding D(i,j) values (equation 1), and the lowest D(i,j) value indicates the best estimate for the experimental alignment tensor. Thus, both chemical shifts and RDCs are used, making the extraction of the tensor orientation more robust.
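
        The number of sampled orientations can be checked with simple arithmetic. The following is a minimal sketch: a full turn around the z-axis in steps of 10 degrees gives 36 rotations, and 1116 orientations then correspond to 31 z-axis directions, which is about one quarter of the 122 sphere points. The value 31 is inferred from these numbers and is not stated explicitly in the text.

```python
ROTATION_STEP_DEG = 10
TOTAL_ORIENTATIONS = 1116
SPHERE_POINTS = 122

# Rotations around the z-axis: 0, 10, ..., 350 degrees.
rotations = list(range(0, 360, ROTATION_STEP_DEG))

# Number of z-axis directions implied by the total (an inference, see above).
axis_directions = TOTAL_ORIENTATIONS // len(rotations)

print(len(rotations), axis_directions, SPHERE_POINTS / 4)
```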


      Exhaustive SVD

        In this method resonances belonging to a specific amino acid type are identified and all possible assignments are searched exhaustively for the best fit of back-calculated to experimental RDCs. The permutation that shows the best agreement identifies the experimental alignment tensor. exhSVD can be applied as long as at least one amino acid type occurs fewer than eleven times in the primary sequence of the protein.
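
        Whether a sequence contains a suitable amino acid type can be checked by counting residues. The following is a minimal sketch using the ubiquitin example sequence; the function name exhsvd_candidates is hypothetical. The factorial only gives an upper bound on the number of permutations, under the assumption that every residue of the chosen type has exactly one matching resonance.

```python
from collections import Counter
from math import factorial

def exhsvd_candidates(sequence, max_count=10):
    """Amino acid types that occur fewer than eleven times, with their counts."""
    counts = Counter(sequence)
    return {aa: n for aa, n in sorted(counts.items()) if n <= max_count}

sequence = (
    "MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYN"
    "IQKESTLHLVLRLRGG"
)

candidates = exhsvd_candidates(sequence)
print(candidates)
# Upper bound on the permutations to test for glycine (assumption: one resonance per residue).
print(factorial(candidates["G"]))
```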


      SVD

        When the assignment is known, an alignment tensor can be obtained by best-fitting experimental dipolar couplings to a 3D structure using singular value decomposition (SVD) (Losonczi et al., 1999). Thus, a two-stage strategy for structure-enhanced assignment can be devised. In the first stage, RDCs are not used and partial assignments are obtained using only chemical shift matching or a combination of chemical shift matching and sequential connectivity information, i.e. using a common backbone assignment strategy. This assignment is assumed to be correct and an alignment tensor is obtained from best-fitting experimental RDCs to the 3D structure. The accuracy of the alignment tensor will depend on the percentage of correct assignments that were obtained in the first assignment phase. Tests show that, even when the percentage of correct assignments is below 50%, the alignment tensor is very close to its correct orientation. This is due to the fact that wrong assignments are not randomly distributed over the primary sequence of the protein. As Ca/Cb chemical shifts depend very much on the type of secondary structure, exchange of assignments mainly takes place between residues located in the same type of regular secondary structure. In addition, β-strands are often part of sheet structures, i.e. different strands are quite collinear. Therefore, residues located in these strands have similar RDCs, and back-calculated alignment tensors are not severely affected by an assignment where pseudoresidues are interchanged between two β-strands (data not shown). The applicability of this approach depends mainly on the amount and type of chemical shift information available, i.e. on how good the assignment is without RDCs.


      Shape prediction

        Molecular alignment tensors of proteins dissolved in nearly neutral bicelles can be predicted from the three-dimensional shape of the protein (Zweckstetter and Bax, 2000). This is particularly attractive as the only information necessary is the 3D structure, i.e. RDCs can be used in the same way as chemical shifts for structure-enhanced assignment. A disadvantage is that this method requires measurement of dipolar couplings in a nearly neutral alignment medium, which restricts the general applicability of RDC-enhanced assignment. However, recent results indicate that prediction of alignment tensors from 3D structures can be extended to a wide variety of alignment media when electrostatic interactions are taken into account (Zweckstetter et al., 2003). Thus, this promises to be a valuable approach for obtaining approximate alignment tensors before the start of RDC-enhanced assignment.



