5.2 call
Calling genotypes
call generates a VCF file, a format used to encode genetic variant sites and genotypes. See VCFCompare for more information on the VCF file format. call can use different callers to produce the VCF file. Some are maximum likelihood callers, such as the “MLE” caller, and others are Bayesian callers, such as “Bayesian” and “allelePresence”. For the Bayesian callers, the prior is defined by theta and the allele frequencies (see below for more details). The parameters of the prior can either be estimated in a first step and then treated as known when making the genotype call, or they can be fixed to a given value.
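The difference between the two caller families can be illustrated with a minimal Python sketch. It is not the implementation used by call: the genotype likelihoods and the prior below are made-up numbers, the function names are hypothetical, and the prior is a plain lookup table rather than one derived from theta and the allele frequencies.

```python
def call_mle(likelihoods):
    """Maximum likelihood call: pick the genotype with the highest likelihood."""
    return max(likelihoods, key=likelihoods.get)


def call_bayesian(likelihoods, prior):
    """Bayesian call: weight each likelihood by its prior, normalize,
    and pick the genotype with the highest posterior probability."""
    unnormalized = {g: likelihoods[g] * prior[g] for g in likelihoods}
    total = sum(unnormalized.values())
    posteriors = {g: value / total for g, value in unnormalized.items()}
    best = max(posteriors, key=posteriors.get)
    return best, posteriors


# Hypothetical genotype likelihoods at one site (illustrative values only).
likelihoods = {"AA": 0.20, "AC": 0.25, "CC": 0.01}
# Hypothetical prior that strongly favors the homozygous reference genotype.
prior = {"AA": 0.90, "AC": 0.09, "CC": 0.01}

mle_call = call_mle(likelihoods)
bayes_call, posteriors = call_bayesian(likelihoods, prior)
print(mle_call, bayes_call)
```

With these numbers the maximum likelihood call is the heterozygote, while the prior shifts the Bayesian call to the homozygous reference genotype. This is why the choice of prior parameters matters most at sites with little data.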
5.2.1 Input
Required inputs :
--bam Input_bam_file.bam |
Input bam file |
--fasta Input_reference_genome_file.fasta |
reference genome FASTA file |
Optional inputs :
--pmd "library_type:model_for_5'_and_or_3'_read" |
Library type followed by the model(s) to use. For a single-stranded library, give the library type and one model; for a double-stranded library, give the library type followed by the models for the 5-prime and the 3-prime read ends. Each model can be either “Exponential” or “Empiric”. All arguments must be provided as a single string, separated by colons (:), e.g. --pmdModels "doubleStrand:Exponential:Exponential" |
--recal recalibration_parameters_file.txt |
A .txt file with the suffix “_recalibrationEM.txt” specifying the recalibration parameters for all covariates and read groups. Default = the default rho is used (can be changed using --rho) |
--alleles alleles_of_interest.txt |
A file that specifies the sites for which variants should be called and the known alleles at those sites. Positions are 1-based. The file must be tab-separated with the following columns: chr position ref_allele alt_allele |
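As an illustration of the alleles file layout described above, the following Python sketch reads such a file and checks that each line has four tab-separated columns and a 1-based position. The function name and the example sites are hypothetical, and the sketch assumes the file has no header line, which the description above does not specify.

```python
import csv
import io


def read_alleles(handle):
    """Parse tab-separated lines of: chr, position, ref_allele, alt_allele.

    Returns a dict mapping (chr, position) to (ref_allele, alt_allele).
    """
    sites = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        if len(row) != 4:
            raise ValueError(f"expected 4 tab-separated columns, got {len(row)}: {row}")
        chrom, pos, ref, alt = row
        position = int(pos)
        if position < 1:
            raise ValueError(f"positions are 1-based, got {position}")
        sites[(chrom, position)] = (ref, alt)
    return sites


# Hypothetical file content with two known sites.
example = io.StringIO("chr1\t1042\tA\tG\nchr2\t77\tC\tT\n")
sites = read_alleles(example)
print(sites)
```

A check like this can be run on an alleles file before passing it to --alleles, so that formatting problems such as spaces instead of tabs are caught early.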
Specific Parameters :
--method caller_of_choice_for_variant_calling |
Caller type for variant calling. The following options are available: MLE, Bayesian, allelePresence, randomBase, majorityBase |
--priors prior_type |
Prior for the callers allelePresence and Bayesian. The following options are available for the prior type: --fixedTheta integer_value: provide a fixed theta value instead of estimating it for every window. --equalBaseFreqs: assume all base frequencies to be 0.25 instead of estimating them for every window. defaultTheta integer_value: provide a fixed theta value to be used for all windows for which theta cannot be estimated due to lack of data (the algorithm does not converge). Default = theta and base frequencies are estimated individually for each window. |
--infoFields DP |
redundant?? |
--formatFields format_fields |
To print the specified VCF format fields. Available options are: GT: genotype string, DP: sequencing depth, GQ: genotype quality, AD: allelic depths for all alleles in the call, in the order listed, AP: phred-scaled allelic posterior probabilities for the four alleles A, C, G and T, GL: normalized genotype likelihoods, PL: phred-scaled normalized genotype likelihoods, GP: genotype posterior probabilities (phred-scaled), AB: allelic imbalance, AI: binomial probability of allelic imbalance if the site is heterozygous. All arguments must be provided as a single string, separated by commas (,), e.g. --formatFields GT,DP,AD,GQ,PL. Default = only GT,DP,AD,GQ,PL are printed. |
--printAll |
To print all sites, also invariant ones. Default = Will print only sites with data. |
--noAltIfHomoRef |
To specify to not print alternative alleles if call is homozygous reference, but still use them to calculate call quality. Default = Will print the most likely alternative allele even if the call is homozygous reference |
--noTriallelic |
To only allow one alternative allele. Default = Will allow for genotypes with two alternative alleles. |
--noCallsViolatingBest |
To not call genotypes from known alleles that conflict with best call across all genotypes. Default = Will call genotypes from known alleles even if they differ from best call across all genotypes. |
--sampleName name |
To define a sample name for the header of the VCF file. Default = the prefix specified with the out parameter. |
--alleles |
To limit calls to sites with known alleles. Default = Will call without prior knowledge on alleles. |
- See Filter parameters to apply specific filters for bases, reads and parsing window setting.
Engine parameters that are common to all tasks can be found here.
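To make the PL entry listed under --formatFields more concrete, the Python sketch below shows how phred-scaled normalized genotype likelihoods are conventionally computed in VCF files: each likelihood is divided by the largest one and converted with -10 log10. It also shows how a comma-separated field list maps to the colon-separated FORMAT column of a VCF record. The function name and the input likelihoods are hypothetical, and the rounding to integers follows common VCF practice rather than anything stated above.

```python
import math


def phred_scaled_likelihoods(likelihoods):
    """Normalize likelihoods by the largest value and phred-scale them.

    The most likely genotype gets PL = 0; less likely genotypes get
    larger values. All likelihoods must be strictly positive.
    """
    if any(value <= 0 for value in likelihoods):
        raise ValueError("all likelihoods must be > 0")
    best = max(likelihoods)
    return [round(-10 * math.log10(value / best)) for value in likelihoods]


# Hypothetical likelihoods for three genotypes at one site.
pl = phred_scaled_likelihoods([0.2, 0.02, 0.0002])

# The comma-separated field list becomes the colon-separated
# FORMAT column of each VCF record.
format_column = ":".join("GT,DP,AD,GQ,PL".split(","))
print(pl, format_column)
```

Normalizing by the largest likelihood keeps the values small and comparable across sites, since only the ratios between genotype likelihoods carry information for the call.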