5.9 PSMC

Generating a PSMC input file probabilistically

The PSMC task generates an input file for PSMC (pairwise sequentially Markovian coalescent). It takes a BAM file as input and writes a sequence of letters, where T = homozygous, K = heterozygous and N = unknown. Each letter represents a 100 bp window of the genome: a window with zero heterozygous sites is defined as a “T”, and a window with at least one heterozygous site as a “K”. For ATLAS to produce such a file, two further things need to be defined: a prior on theta, which is the heterozygosity you expect to see a priori, and a confidence threshold. For each window, ATLAS calculates the posterior probability of it being a “K” or a “T”. If neither of these probabilities is higher than the confidence threshold, the window is defined as an “N”.
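The decision rule described above can be sketched in a few lines of shell. This is only an illustration of the thresholding step, not ATLAS's implementation: the function name `call_window` is hypothetical, and the posterior probability of the window being a “K” is assumed to be already known (how ATLAS computes it from the reads and the theta prior is not shown here).

```shell
# Hypothetical helper, not part of ATLAS.
# $1 = posterior probability that the window is a "K" (assumed given)
# $2 = confidence threshold (e.g. 0.99, the default of --confidence)
call_window() {
    awk -v p="$1" -v c="$2" 'BEGIN {
        if (p > c)          print "K";  # confidently heterozygous
        else if (1 - p > c) print "T";  # confidently homozygous
        else                print "N";  # neither posterior exceeds the threshold
    }'
}
```

With a threshold of 0.99, `call_window 0.995 0.99` prints `K`, `call_window 0.001 0.99` prints `T`, and `call_window 0.5 0.99` prints `N`.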

5.9.1 Input

Required inputs :

--bam Input_bam_file.bam Input bam file.

Optional inputs :

--pmd Input_PMD.txt Post-mortem damage parameters (see PMD for generating such a file).
--recal recal.txt Quality score recalibration file (see recal for further information).
--filterBaseQual [integer_1,integer_2] To filter out bases with quality outside the range [integer_1,integer_2]. Default = [1,93].
--ignoreContexts To filter out bases based on context. Default = keep bases regardless of base context.

Specific Parameters :

--theta float_value Prior for heterozygosity. Default = 0.001.
--confidence float_value Confidence threshold for assigning a window with a “T” or a “K”. Default = 0.99.
--window float/integer_value Window size in bp. Windows must be at least as large as the maximum read length. Default = 100000bp.
--minMAF numeric_value To keep only sites for which the minor allele frequency is at least the indicated value. Default = all sites are kept regardless of minor allele frequency.

Engine parameters that are common to all tasks can be found here.

5.9.2 Output

*.psmcfa PSMC input file.

5.9.3 Usage Example

#! /bin/bash

# `--fixedSeed = N` is needed to have reproducible results in the regression test

. $(dirname $0)/find_atlas
. $(dirname $0)/simulate --fixedSeed 2

out="PSMC"
$atlas --task PSMC --bam simulate.bam --window 4567 \
       --fixedSeed 1 --out $out --logFile $out.out 2> $out.eout
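The script above is a regression test that relies on the helper scripts `find_atlas` and `simulate`. For a standalone run, a call that sets the prior and the confidence threshold explicitly might look like the following sketch. The file names are placeholders, the values shown are simply the documented defaults, and it is assumed that the executable is available as `atlas` on the PATH.

```shell
#!/bin/sh
# Placeholder names, not taken from the ATLAS documentation
bam="sample.bam"     # hypothetical input BAM file
out="sample_PSMC"    # hypothetical output prefix

# Flags as documented in 5.9.1; values are the documented defaults
cmd="atlas --task PSMC --bam $bam --theta 0.001 --confidence 0.99 --out $out"

# Print the command for inspection; run it only if atlas is installed
echo "$cmd"
if command -v atlas >/dev/null 2>&1; then
    $cmd
fi
```

Lowering `--confidence` should leave fewer windows called as “N”, at the cost of less certain “T” and “K” calls.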