5.9 PSMC
Generating a PSMC input file probabilistically
The PSMC task generates an input file for PSMC (pairwise sequentially Markovian coalescent). It takes a BAM file as input and writes a PSMC input file in which the possible letters are T = homozygous, K = heterozygous, and N = unknown. Each letter represents a window of 100 bp of the genome: a window with zero heterozygous sites is defined as a “T”, and a window with at least one heterozygous site as a “K”. For ATLAS to produce such a file, two further things need to be defined: a prior on theta, which is the heterozygosity you expect to see a priori, and a confidence threshold. For each window, ATLAS calculates the posterior probability of it being a “K” or a “T”. If neither of these probabilities is higher than the confidence threshold, the window is defined as an “N”.
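The decision rule described above can be illustrated with a minimal Python sketch. This is not ATLAS code: the function names, the two per-window likelihood arguments, and the prior model (each site independently heterozygous with probability theta) are assumptions made purely for illustration. How ATLAS actually derives the window likelihoods from the reads is not covered here.

```python
def prior_heterozygous(theta: float, window: int) -> float:
    """Prior probability that a window holds at least one heterozygous site.

    Assumption (not from the ATLAS documentation): sites are independent and
    each is heterozygous with probability theta.
    """
    return 1.0 - (1.0 - theta) ** window


def call_window(lik_hom: float, lik_het: float,
                theta: float = 0.001, window: int = 100,
                confidence: float = 0.99) -> str:
    """Return 'T', 'K' or 'N' for one window.

    lik_hom and lik_het are hypothetical inputs: the likelihood of the
    window's reads given no heterozygous site, and given at least one.
    """
    prior_k = prior_heterozygous(theta, window)
    prior_t = 1.0 - prior_k

    # Bayes' rule: posterior is proportional to likelihood times prior.
    weight_k = lik_het * prior_k
    weight_t = lik_hom * prior_t
    total = weight_k + weight_t
    if total == 0.0:
        return "N"  # no usable data in this window

    post_k = weight_k / total
    post_t = weight_t / total

    # A letter is assigned only if its posterior exceeds the threshold.
    if post_k > confidence:
        return "K"
    if post_t > confidence:
        return "T"
    return "N"


if __name__ == "__main__":
    # Three toy windows: uninformative, clearly homozygous, clearly heterozygous.
    calls = [call_window(1.0, 1.0), call_window(1.0, 1e-6), call_window(1e-6, 1.0)]
    print("".join(calls))
```

With uninformative data the posterior simply equals the prior, so neither letter reaches the default threshold of 0.99 and the window becomes an “N”. Raising the confidence threshold therefore trades more “N” windows for fewer miscalled “T” or “K” windows.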
5.9.1 Input
Required inputs :
--bam Input_bam_file.bam |
Input bam file. |
Optional inputs :
--pmd Input_PMD.txt |
Post-mortem damage parameters (see PMD for generating such a file). |
--recal recal.txt |
Quality score recalibration file (see recal for further information). |
--filterBaseQual [integer_1,integer_2] |
To filter out bases with quality outside the range [integer_1,integer_2]. Default = [1,93]. |
--ignoreContexts |
To filter out bases based on context. Default = keep bases regardless of base context. |
Specific Parameters :
--theta float_value |
Prior for heterozygosity. Default = 0.001. |
--confidence float_value |
Confidence threshold for assigning a window with a “T” or a “K”. Default = 0.99. |
--window float/integer_value |
Window size. Windows must be at least as large as the maximum read length. Default = 100000bp. |
--minMAF numeric_value |
To keep only sites whose minor allele frequency is at least the indicated value. Default = all sites are kept regardless of minor allele frequency. |
Engine parameters that are common to all tasks can be found here.