Showing tool doc from version 4.0.11.0 | The latest version is 4.0.11.0

ScatterIntervalsByNs (Picard)

Writes an interval list created by splitting a reference at Ns. A program for breaking up a reference into intervals of alternating regions of N and ACGT bases.


Used for creating a broken-up interval list that can be used for scattering a variant-calling pipeline in a way that will not cause problems at the edges of the intervals. By using large enough N blocks (so that the tools will not be able to anchor on both sides) we can be assured that the results of scattering and gathering the variants with the resulting interval list will be the same as calling with one large region.

Input

- A reference file to use for creating the intervals (needs to have an index and a dictionary next to it).
- Which type of intervals to emit in the output (Ns only, ACGT only, or both).
- An integer indicating the largest number of Ns in a contiguous block that will be "tolerated" and not converted into an N block.

Output

- An interval list (with a SAM header) where the names of the intervals are labeled (either N-block or ACGT-block) to indicate what type of block they define.
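
The splitting described above can be sketched in Python. This is an illustrative sketch only, not Picard's implementation: the function name `split_at_ns`, the 1-based inclusive coordinates, and the labels `N-block` and `ACGT-block` (borrowed from the wording above) are assumptions, as is the choice to tolerate a short N run wherever it occurs in the sequence.

```python
from itertools import groupby


def split_at_ns(sequence, max_to_merge=1):
    """Split a sequence into alternating N and ACGT blocks.

    Returns a list of (start, end, label) tuples with 1-based inclusive
    coordinates. N runs of at most `max_to_merge` bases are tolerated and
    absorbed into the surrounding ACGT block (hypothetical helper).
    """
    # Pass 1: run-length encode the sequence into N and non-N runs.
    runs = []
    pos = 1
    for is_n, group in groupby(sequence.upper(), key=lambda base: base == "N"):
        length = sum(1 for _ in group)
        runs.append((pos, pos + length - 1, is_n))
        pos += length

    # Pass 2: relabel tolerated N runs as ACGT and merge equal neighbours.
    blocks = []
    for start, end, is_n in runs:
        if is_n and (end - start + 1) <= max_to_merge:
            is_n = False
        label = "N-block" if is_n else "ACGT-block"
        if blocks and blocks[-1][2] == label:
            blocks[-1] = (blocks[-1][0], end, label)
        else:
            blocks.append((start, end, label))
    return blocks


# A single N (position 5) is tolerated; the run of five Ns becomes an N block.
print(split_at_ns("ACGTNACGTNNNNNACGT", max_to_merge=1))
```

With `max_to_merge=0` in this sketch, every N run, however short, would become its own N block.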

Usage example

Create an interval list of intervals that do not contain any N blocks, for use with HaplotypeCaller on short reads:

java -jar picard.jar ScatterIntervalsByNs \
      REFERENCE=reference_sequence.fasta \
      OUTPUT_TYPE=ACGT \
      OUTPUT=output.interval_list

Category: Reference


Overview

A tool for breaking up a reference into intervals of alternating regions of N and ACGT bases.

Input

  • A reference file to use for creating the intervals (needs to have an index and a dictionary next to it).
  • Which type of intervals to emit in the output (Ns only, ACGT only, or both).
  • An integer indicating the largest number of Ns in a contiguous block that will be "tolerated" and not converted into an N block.

Output

  • An interval list (with a SAM header) where the names of the intervals are labeled (either N-block or ACGT-block) to indicate what type of block they define.
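
To illustrate the output format, the sketch below reads an interval list and totals the bases per block type. It assumes the usual interval_list layout (SAM header lines starting with `@`, followed by tab-separated contig, start, end, strand, and name columns). The function name `read_interval_list` and the example records, including the interval names, are made up for illustration; the names the tool actually writes may differ.

```python
import io
from collections import Counter


def read_interval_list(handle):
    """Yield (contig, start, end, strand, name) tuples from an interval list."""
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        contig, start, end, strand, name = line.split("\t")
        yield contig, int(start), int(end), strand, name


# Hypothetical file content; a real file would come from the tool's OUTPUT.
example = io.StringIO(
    "@HD\tVN:1.5\tSO:coordinate\n"
    "@SQ\tSN:chr1\tLN:18\n"
    "chr1\t1\t9\t+\tACGT-block\n"
    "chr1\t10\t14\t+\tN-block\n"
    "chr1\t15\t18\t+\tACGT-block\n"
)

bases_per_type = Counter()
for contig, start, end, strand, name in read_interval_list(example):
    bases_per_type[name] += end - start + 1  # coordinates are 1-based, inclusive

print(dict(bases_per_type))  # {'ACGT-block': 13, 'N-block': 5}
```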

Usage example

Create an interval list of intervals that do not contain any N blocks, for use with HaplotypeCaller on short reads (same as the example above, using the short argument names):

 java -jar picard.jar ScatterIntervalsByNs \
       R=reference_sequence.fasta \
       OT=ACGT \
       O=output.interval_list
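
Once an ACGT-only interval list exists, scattering amounts to distributing whole intervals across jobs without cutting any interval. The sketch below shows one simple way to do that, a greedy largest-first assignment to the currently lightest shard. It is an assumption made for illustration, not the strategy of any particular GATK or Picard tool, and the function name `scatter` and the example intervals are hypothetical.

```python
import heapq


def scatter(intervals, shard_count):
    """Distribute (contig, start, end) intervals over `shard_count` shards.

    Intervals are never split, so every shard boundary coincides with an
    interval boundary. Largest intervals are placed first, each into the
    shard that currently holds the fewest bases.
    """
    def length(interval):
        return interval[2] - interval[1] + 1  # 1-based inclusive coordinates

    shards = [[] for _ in range(shard_count)]
    heap = [(0, index) for index in range(shard_count)]  # (bases, shard index)
    heapq.heapify(heap)
    for interval in sorted(intervals, key=length, reverse=True):
        load, index = heapq.heappop(heap)
        shards[index].append(interval)
        heapq.heappush(heap, (load + length(interval), index))
    # Restore coordinate order inside each shard.
    return [sorted(shard) for shard in shards]


# Hypothetical ACGT intervals of 100, 60, 50 and 30 bases.
acgt = [("chr1", 1, 100), ("chr1", 201, 260), ("chr2", 1, 50), ("chr2", 101, 130)]
for shard in scatter(acgt, 2):
    print(shard)
```

In this example the two shards end up with 130 and 110 bases respectively.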
     

    ScatterIntervalsByNs (Picard) specific arguments

    This table summarizes the command-line arguments available for this tool. For more details on each argument, see the Argument details list below the table.

    Argument name(s)                         Default value         Summary

    Required Arguments
    --OUTPUT, -O                             null                  Output file for interval list.
    --REFERENCE, -R                          null                  Reference sequence to use. Note: this tool requires that the reference fasta has both an associated index and a dictionary.

    Optional Tool Arguments
    --arguments_file                         []                    read one or more arguments files and add them to the command line
    --help, -h                               false                 display the help message
    --MAX_TO_MERGE, -N                       1                     Maximal number of contiguous N bases to tolerate, thereby continuing the current ACGT interval.
    --OUTPUT_TYPE, -OT                       BOTH                  Type of intervals to output.
    --version                                false                 display the version number for this tool

    Optional Common Arguments
    --COMPRESSION_LEVEL                      5                     Compression level for all compressed files created (e.g. BAM and VCF).
    --CREATE_INDEX                           false                 Whether to create a BAM index when writing a coordinate-sorted BAM file.
    --CREATE_MD5_FILE                        false                 Whether to create an MD5 digest for any BAM or FASTQ files created.
    --GA4GH_CLIENT_SECRETS                   client_secrets.json   Google Genomics API client_secrets.json file path.
    --MAX_RECORDS_IN_RAM                     500000                When writing files that need to be sorted, this will specify the number of records stored in RAM before spilling to disk. Increasing this number reduces the number of file handles needed to sort the file, and increases the amount of RAM needed.
    --QUIET                                  false                 Whether to suppress job-summary info on System.err.
    --TMP_DIR                                []                    One or more directories with space available to be used by this program for temporary storage of working files
    --USE_JDK_DEFLATER, -use_jdk_deflater    false                 Use the JDK Deflater instead of the Intel Deflater for writing compressed output
    --USE_JDK_INFLATER, -use_jdk_inflater    false                 Use the JDK Inflater instead of the Intel Inflater for reading compressed input
    --VALIDATION_STRINGENCY                  STRICT                Validation stringency for all SAM files read by this program. Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded.
    --VERBOSITY                              INFO                  Control verbosity of logging.

    Advanced Arguments
    --showHidden                             false                 display hidden arguments

    Argument details

    The list below describes each argument from the table above, in alphabetical order. Keep in mind that some of these arguments are shared with other tools; see Optional Common Arguments in the table above.


    --arguments_file / NA

    read one or more arguments files and add them to the command line

    List[File]  []


    --COMPRESSION_LEVEL / NA

    Compression level for all compressed files created (e.g. BAM and VCF).

    int  5  [ [ -∞  ∞ ] ]


    --CREATE_INDEX / NA

    Whether to create a BAM index when writing a coordinate-sorted BAM file.

    Boolean  false


    --CREATE_MD5_FILE / NA

    Whether to create an MD5 digest for any BAM or FASTQ files created.

    boolean  false


    --GA4GH_CLIENT_SECRETS / NA

    Google Genomics API client_secrets.json file path.

    String  client_secrets.json


    --help / -h

    display the help message

    boolean  false


    --MAX_RECORDS_IN_RAM / NA

    When writing files that need to be sorted, this will specify the number of records stored in RAM before spilling to disk. Increasing this number reduces the number of file handles needed to sort the file, and increases the amount of RAM needed.

    Integer  500000  [ [ -∞  ∞ ] ]


    --MAX_TO_MERGE / -N

    Maximal number of contiguous N bases to tolerate, thereby continuing the current ACGT interval.

    int  1  [ [ -∞  ∞ ] ]


    --OUTPUT / -O

    Output file for interval list.

    File  null  (required)


    --OUTPUT_TYPE / -OT

    Type of intervals to output.

    The --OUTPUT_TYPE argument is an enumerated type (OutputType), which can have one of the following values:

    N
    ACGT
    BOTH

    OutputType  BOTH
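
As a rough illustration of how the three values relate, the sketch below filters a list of labeled blocks by output type. The `OutputType` members mirror the values listed above; the function name `select_blocks`, the tuple layout, and the `N-block` and `ACGT-block` labels are assumptions made for this example rather than part of the tool.

```python
from enum import Enum


class OutputType(Enum):
    N = "N"
    ACGT = "ACGT"
    BOTH = "BOTH"


def select_blocks(blocks, output_type=OutputType.BOTH):
    """Keep only the (start, end, label) blocks matching `output_type`."""
    wanted = {
        OutputType.N: {"N-block"},
        OutputType.ACGT: {"ACGT-block"},
        OutputType.BOTH: {"N-block", "ACGT-block"},
    }[output_type]
    return [block for block in blocks if block[2] in wanted]


# Hypothetical blocks for an 18-base contig.
blocks = [(1, 9, "ACGT-block"), (10, 14, "N-block"), (15, 18, "ACGT-block")]
print(select_blocks(blocks, OutputType.ACGT))
print(select_blocks(blocks, OutputType.N))
```

Selecting `ACGT` in this sketch leaves gaps where the N blocks were, which is the property that makes the result usable for scattering.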


    --QUIET / NA

    Whether to suppress job-summary info on System.err.

    Boolean  false


    --REFERENCE / -R

    Reference sequence to use. Note: this tool requires that the reference fasta has both an associated index and a dictionary.

    File  null  (required)
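
Before running the tool, it can help to confirm that the companion files are in place. The sketch below assumes the common naming convention in which the index is the reference file name plus `.fai` and the dictionary replaces the reference's extension with `.dict`; both the convention as applied here and the function name `missing_companions` are assumptions for illustration, and this check does not replace the tool's own validation.

```python
from pathlib import Path


def missing_companions(reference):
    """Return the expected index and dictionary paths that do not exist."""
    ref = Path(reference)
    index = ref.with_name(ref.name + ".fai")   # e.g. reference_sequence.fasta.fai
    dictionary = ref.with_suffix(".dict")      # e.g. reference_sequence.dict
    return [str(path) for path in (index, dictionary) if not path.exists()]


missing = missing_companions("reference_sequence.fasta")
if missing:
    print("Missing companion files:", ", ".join(missing))
else:
    print("Index and dictionary found.")
```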


    --showHidden / -showHidden

    display hidden arguments

    boolean  false


    --TMP_DIR / NA

    One or more directories with space available to be used by this program for temporary storage of working files

    List[File]  []


    --USE_JDK_DEFLATER / -use_jdk_deflater

    Use the JDK Deflater instead of the Intel Deflater for writing compressed output

    Boolean  false


    --USE_JDK_INFLATER / -use_jdk_inflater

    Use the JDK Inflater instead of the Intel Inflater for reading compressed input

    Boolean  false


    --VALIDATION_STRINGENCY / NA

    Validation stringency for all SAM files read by this program. Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded.

    The --VALIDATION_STRINGENCY argument is an enumerated type (ValidationStringency), which can have one of the following values:

    STRICT
    LENIENT
    SILENT

    ValidationStringency  STRICT


    --VERBOSITY / NA

    Control verbosity of logging.

    The --VERBOSITY argument is an enumerated type (LogLevel), which can have one of the following values:

    ERROR
    WARNING
    INFO
    DEBUG

    LogLevel  INFO


    --version / NA

    display the version number for this tool

    boolean  false


    GATK version 4.0.11.0 built at 23-11-2018 02:11:49.