5.6 HKY85

Estimating HKY85 genotype Distribution

HKY85 infers the genotype Distribution using the HKY85 substitution model (Hasegawa et al. 1985).

5.6.1 Input

Required inputs :

--bam Input_bam_file.bam Input bam file.
--fasta Input_refrence_genome_file.fasta Reference genome.

Optional inputs :

--pmd "library_type:model_for_5'_read:model_for_3'_read" or --RGInfo Library type followed by the model to be used for the 5-prime read-end and the 3-prime read-end which can be either “Exponential” or “Empiric”. All three arguments must be provided as a string, divided by colons (:). e.g. : –pmdModels “doubleStrand:Exponential:Exponential”. Used to specify Post-mortem damage parameters. Can also be provided as a .txt file (see PMDfor generating such a file). Default = Will assume there is no PMD in the data.
--recal recal_model or --RGInfo A common recal model for all read groups. Used to specify Quality score recalibration parameters. Can also be provided as a .txt file (see recal for further information). Default = ‘-’, no recalibration is performed.

Specific Parameters :

  • See Filter parameters to apply specific filters for bases, reads and parsing window setting.

Engine parameters that are common to all tasks can be found here.

5.6.2 Output

5.6.3 Usage Example

#! /bin/bash

. $(dirname $0)/find_atlas

recal="intercept[0.1];quality:polynomial[0.9,0.01];position:polynomial[0.01]"
pmd="CT5:0.1*exp(-0.2*p)+0.001;GA3:0.1*exp(-0.2*p)+0.001"

. $(dirname $0)/simulate --chrLength 121234,121098 --ploidy 2,1 --depth 11,9 \
    --type "HKY85" --mu 0.55 --thetaG 0.00033 --thetaR 0.015 --fixedSeed 111 \
    --recal $recal --pmd $pmd --baseQuality "categorical(12,22,27,32,37,41)" \

echo "chr1 0 5" > diplo.bed
echo "chr2 5 10" > haplo.bed
tstep=$(date +%s)
for i in {1..120000}; do
    echo "chr1 ${i}0 ${i}5" >> diplo.bed
    echo "chr2 ${i}5 $((i+1))0" >> haplo.bed
done
echo "Bed creation: $(($(date +%s) - $tstep)) seconds"

probs="0.2"
out=windows
$atlas --task HKY85 --prob "1.0,$probs"  --minDeltaLL 1 \
       --bam simulate.bam --fasta simulate.fasta --chr chr1 \
       --recal $recal --pmd $pmd --window 65432 --doPMD \
       --fixedSeed 222 --out $out --logFile $out.out 2> $out.eout

probs="0.5,0.2,0.1"
out=diploRaw
$atlas --task HKY85 --genomeWide 3 --prob "$probs" --minDeltaLL 0.1 \
       --bam simulate.bam --fasta simulate.fasta \
       --regions diplo.bed --ploidy 2 --sample bases \
       --fixedSeed 333 --out $out --logFile $out.out 2> $out.eout

out=diploEE
$atlas --task HKY85 --minDeltaLL 0.1 --genomeWide 2 \
       --prob "$probs" --sample bases \
       --bam simulate.bam --fasta simulate.fasta \
       --regions diplo.bed --ploidy 2 --recal $recal --pmd $pmd \
       --fixedSeed 444 --out $out --logFile $out.out 2> $out.eout

# no downsampling
out=haplo
$atlas --task HKY85 --genomeWide --minDeltaLL 0.1 \
       --bam simulate.bam --fasta simulate.fasta \
       --regions haplo.bed --ploidy 1 --recal $recal --pmd $pmd --doPMD \
       --fixedSeed 555 --out $out --logFile $out.out 2> $out.eout