Checks the sample identity of the sequence/genotype data in the provided file (SAM/BAM or VCF) against a set of known genotypes in the supplied genotype file (in VCF format).
java -jar picard.jar CheckFingerprint \
     INPUT=sample.bam \
     GENOTYPES=sample_genotypes.vcf \
     HAPLOTYPE_MAP=fingerprinting_haplotype_database.txt \
     OUTPUT=sample_fingerprinting
This tool calculates a single number: the LOD score for the identity check between INPUT and GENOTYPES. A positive value indicates that the data appear to have come from the same individual; in other words, the identity checks out. The scale is logarithmic (base 10), so a LOD of 6 indicates that it is 1,000,000 times more likely that the data match the genotypes than not. A negative value indicates that the data do not match. A score near zero is inconclusive and can result from low coverage or non-informative genotypes.
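The base-10 scale described above can be made concrete with a small helper. This is a minimal sketch, not part of Picard: the function names `lod_to_ratio` and `interpret_lod` are hypothetical, and the margin of 3 used to separate "inconclusive" from a clear call is an illustrative assumption, since the tool itself does not define such a cutoff here.

```shell
#!/bin/sh
# lod_to_ratio LOD
# Print the likelihood ratio 10^LOD implied by a base-10 LOD score.
lod_to_ratio() {
    awk -v lod="$1" 'BEGIN { printf "%.0f\n", 10 ^ lod }'
}

# interpret_lod LOD [MARGIN]
# Classify a LOD score. MARGIN (default 3) is an assumed, illustrative
# cutoff: scores within +/- MARGIN of zero are reported as inconclusive.
interpret_lod() {
    awk -v lod="$1" -v margin="${2:-3}" 'BEGIN {
        if (lod >= margin)       print "match"
        else if (lod <= -margin) print "mismatch"
        else                     print "inconclusive"
    }'
}
```

For example, `lod_to_ratio 6` prints the ratio corresponding to a LOD of 6, and `interpret_lod 0.5` reports a near-zero score as inconclusive under the assumed margin.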
The identity check uses the haplotype blocks defined in the HAPLOTYPE_MAP file. Aggregating data from several SNPs in a haplotype block gives the check higher statistical power for detecting a match or a sample swap, which makes it possible to check the identity of samples with very low coverage (e.g. ~1x mean coverage).
When provided a VCF, the identity check looks at the PL, GL and GT fields (in that order) and uses the first one that it finds.
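The field precedence can be illustrated with a short helper that, given the FORMAT column of a VCF record, reports which field would be chosen under the stated PL, GL, GT order. This is a sketch of the ordering only, not Picard's implementation; the function name `pick_fingerprint_field` is hypothetical.

```shell
#!/bin/sh
# pick_fingerprint_field FORMAT
# FORMAT is a colon-separated VCF FORMAT string such as "GT:AD:DP:GQ:PL".
# Prints the first of PL, GL, GT that is present and returns 0;
# returns 1 if none of the three fields is present.
pick_fingerprint_field() {
    format="$1"
    for field in PL GL GT; do
        case ":$format:" in
            *":$field:"*)
                echo "$field"
                return 0
                ;;
        esac
    done
    return 1
}
```

With a FORMAT string of `GT:AD:DP:GQ:PL` the helper prints `PL`, because PL takes precedence over GT even though GT appears first in the record.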
This table summarizes the command-line arguments that are specific to this tool. For more details on each argument, see the list further down below the table or click on an argument name to jump directly to that entry in the list.
Argument name(s) | Default value | Summary
---|---|---
Required Arguments | |
--DETAIL_OUTPUT -D | null | The text file to which to write detail metrics.
--GENOTYPES -G | null | File of genotypes (VCF) to be used in comparison. May contain any number of genotypes; CheckFingerprint will use only those that are usable for fingerprinting.
--HAPLOTYPE_MAP -H | null | The file lists a set of SNPs, optionally arranged in high-LD blocks, to be used for fingerprinting. See https://software.broadinstitute.org/gatk/documentation/article?id=9526 for details.
--INPUT -I | null | Input file (SAM/BAM or VCF). If a VCF is used, it must have at least one sample. If the VCF contains more than one sample, the parameter OBSERVED_SAMPLE_ALIAS must be provided to indicate which sample's data to use. If there are no samples in the VCF, an exception will be thrown.
--OUTPUT -O | null | The base prefix of output files to write. The summary metrics will have the file extension 'fingerprinting_summary_metrics' and the detail metrics will have the extension 'fingerprinting_detail_metrics'.
--SUMMARY_OUTPUT -S | null | The text file to which to write summary metrics.
Optional Tool Arguments | |
--arguments_file | [] | read one or more arguments files and add them to the command line
--EXPECTED_SAMPLE_ALIAS -SAMPLE_ALIAS | null | This parameter can be used to specify which sample's genotypes to use from the expected VCF file (the GENOTYPES file). If it is not supplied, the sample name from the input (VCF or BAM read group header) will be used.
--GENOTYPE_LOD_THRESHOLD -LOD | 5.0 | When counting haplotypes checked and matching, count only haplotypes where the most likely haplotype achieves at least this LOD.
--help -h | false | display the help message
--IGNORE_READ_GROUPS -IGNORE_RG | false | If the input is a SAM/BAM, and this parameter is true, treat the entire input BAM as one single read group in the calculation, ignoring RG annotations, and producing a single fingerprint metric for the entire BAM.
--OBSERVED_SAMPLE_ALIAS | null | If the input is a VCF, this parameter is used to select which sample's data in the VCF to use.
--version | false | display the version number for this tool
Optional Common Arguments | |
--COMPRESSION_LEVEL | 5 | Compression level for all compressed files created (e.g. BAM and VCF).
--CREATE_INDEX | false | Whether to create a BAM index when writing a coordinate-sorted BAM file.
--CREATE_MD5_FILE | false | Whether to create an MD5 digest for any BAM or FASTQ files created.
--GA4GH_CLIENT_SECRETS | client_secrets.json | Google Genomics API client_secrets.json file path.
--MAX_RECORDS_IN_RAM | 500000 | When writing files that need to be sorted, this will specify the number of records stored in RAM before spilling to disk. Increasing this number reduces the number of file handles needed to sort the file, and increases the amount of RAM needed.
--QUIET | false | Whether to suppress job-summary info on System.err.
--REFERENCE_SEQUENCE -R | null | Reference sequence file.
--TMP_DIR | [] | One or more directories with space available to be used by this program for temporary storage of working files
--USE_JDK_DEFLATER -use_jdk_deflater | false | Use the JDK Deflater instead of the Intel Deflater for writing compressed output
--USE_JDK_INFLATER -use_jdk_inflater | false | Use the JDK Inflater instead of the Intel Inflater for reading compressed input
--VALIDATION_STRINGENCY | STRICT | Validation stringency for all SAM files read by this program. Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded.
--VERBOSITY | INFO | Control verbosity of logging.
Advanced Arguments | |
--showHidden | false | display hidden arguments
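The output arguments listed above are mutually exclusive: OUTPUT cannot be combined with SUMMARY_OUTPUT or DETAIL_OUTPUT. The following sketch shows one way a wrapper script might check this before launching the tool. The function name `check_output_args` is hypothetical, and the check assumes the `KEY=VALUE` argument style used in the usage example; it is not how Picard itself validates arguments.

```shell
#!/bin/sh
# check_output_args ARG...
# Return 0 if the KEY=VALUE arguments respect the documented exclusion
# between OUTPUT (O) and SUMMARY_OUTPUT (S) / DETAIL_OUTPUT (D);
# otherwise print a message to stderr and return 1.
check_output_args() {
    has_prefix=0      # set when OUTPUT or O is given
    has_explicit=0    # set when SUMMARY_OUTPUT, S, DETAIL_OUTPUT or D is given
    for arg in "$@"; do
        case "$arg" in
            OUTPUT=*|O=*)
                has_prefix=1 ;;
            SUMMARY_OUTPUT=*|S=*|DETAIL_OUTPUT=*|D=*)
                has_explicit=1 ;;
        esac
    done
    if [ "$has_prefix" -eq 1 ] && [ "$has_explicit" -eq 1 ]; then
        echo "OUTPUT cannot be combined with SUMMARY_OUTPUT or DETAIL_OUTPUT" >&2
        return 1
    fi
    return 0
}
```

A wrapper could call `check_output_args "$@"` first and only then pass the same arguments on to `java -jar picard.jar CheckFingerprint`.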
Arguments in this list are specific to this tool. Keep in mind that other arguments are available that are shared with other tools (e.g. command-line GATK arguments); see Inherited arguments above.
--arguments_file
read one or more arguments files and add them to the command line
List[File] []

--COMPRESSION_LEVEL
Compression level for all compressed files created (e.g. BAM and VCF).
int 5 [ [ -∞ ∞ ] ]

--CREATE_INDEX
Whether to create a BAM index when writing a coordinate-sorted BAM file.
Boolean false

--CREATE_MD5_FILE
Whether to create an MD5 digest for any BAM or FASTQ files created.
boolean false

--DETAIL_OUTPUT -D
The text file to which to write detail metrics.
Exclusion: This argument cannot be used at the same time as OUTPUT.
R File null

--EXPECTED_SAMPLE_ALIAS -SAMPLE_ALIAS
This parameter can be used to specify which sample's genotypes to use from the expected VCF file (the GENOTYPES file). If it is not supplied, the sample name from the input (VCF or BAM read group header) will be used.
String null

--GA4GH_CLIENT_SECRETS
Google Genomics API client_secrets.json file path.
String client_secrets.json

--GENOTYPE_LOD_THRESHOLD -LOD
When counting haplotypes checked and matching, count only haplotypes where the most likely haplotype achieves at least this LOD.
double 5.0 [ [ -∞ ∞ ] ]

--GENOTYPES -G
File of genotypes (VCF) to be used in comparison. May contain any number of genotypes; CheckFingerprint will use only those that are usable for fingerprinting.
R String null

--HAPLOTYPE_MAP -H
The file lists a set of SNPs, optionally arranged in high-LD blocks, to be used for fingerprinting. See https://software.broadinstitute.org/gatk/documentation/article?id=9526 for details.
R File null

--help -h
display the help message
boolean false

--IGNORE_READ_GROUPS -IGNORE_RG
If the input is a SAM/BAM, and this parameter is true, treat the entire input BAM as one single read group in the calculation, ignoring RG annotations, and producing a single fingerprint metric for the entire BAM.
boolean false

--INPUT -I
Input file (SAM/BAM or VCF). If a VCF is used, it must have at least one sample. If the VCF contains more than one sample, the parameter OBSERVED_SAMPLE_ALIAS must be provided to indicate which sample's data to use. If there are no samples in the VCF, an exception will be thrown.
R String null

--MAX_RECORDS_IN_RAM
When writing files that need to be sorted, this will specify the number of records stored in RAM before spilling to disk. Increasing this number reduces the number of file handles needed to sort the file, and increases the amount of RAM needed.
Integer 500000 [ [ -∞ ∞ ] ]

--OBSERVED_SAMPLE_ALIAS
If the input is a VCF, this parameter is used to select which sample's data in the VCF to use.
String null

--OUTPUT -O
The base prefix of output files to write. The summary metrics will have the file extension 'fingerprinting_summary_metrics' and the detail metrics will have the extension 'fingerprinting_detail_metrics'.
Exclusion: This argument cannot be used at the same time as SUMMARY_OUTPUT, DETAIL_OUTPUT, S, D.
R String null

--QUIET
Whether to suppress job-summary info on System.err.
Boolean false

--REFERENCE_SEQUENCE -R
Reference sequence file.
File null

--showHidden
display hidden arguments
boolean false

--SUMMARY_OUTPUT -S
The text file to which to write summary metrics.
Exclusion: This argument cannot be used at the same time as OUTPUT.
R File null

--TMP_DIR
One or more directories with space available to be used by this program for temporary storage of working files
List[File] []

--USE_JDK_DEFLATER -use_jdk_deflater
Use the JDK Deflater instead of the Intel Deflater for writing compressed output
Boolean false

--USE_JDK_INFLATER -use_jdk_inflater
Use the JDK Inflater instead of the Intel Inflater for reading compressed input
Boolean false

--VALIDATION_STRINGENCY
Validation stringency for all SAM files read by this program. Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded.
The --VALIDATION_STRINGENCY argument is an enumerated type (ValidationStringency), which can have one of the following values: STRICT, LENIENT, SILENT.
ValidationStringency STRICT

--VERBOSITY
Control verbosity of logging.
The --VERBOSITY argument is an enumerated type (LogLevel), which can have one of the following values: ERROR, WARNING, INFO, DEBUG.
LogLevel INFO

--version
display the version number for this tool
boolean false
GATK version 4.0.11.0 built at 23-11-2018 02:11:49.