5.12 thetaRatio

Estimate the ratio in heterozygosity (theta) between genomic regions

thetaRatio uses an MCMC (Metropolis-Hastings Algorithm) to estimate the ratio of the expected heterozygosity from sites in two different regions.

Φ=log(θ1θ2)

5.12.1 Input

Required inputs :

--bam Input_bam_file.bam Input bam file
--region1 region_1 --region2 region_2 Two files, region_1 and region_2 that define the regions of the two thetas.

Optional inputs :

--pmd "library_type:model_for_5'_read:model_for_3'_read" or --RGInfo Library type followed by the model to be used for the 5-prime read-end and the 3-prime read-end which can be either “Exponential” or “Empiric”. All three arguments must be provided as a string, divided by colons (:). e.g. : –pmdModels “doubleStrand:Exponential:Exponential”. Used to specify Post-mortem damage parameters. Can also be provided as a .txt file (see PMDfor generating such a file). Default = Will assume there is no PMD in the data.
--recal recal_model or --RGInfo A common recal model for all read groups.Used to specify Quality score recalibration parameters. Can also be provided as a .txt file (see recal for further information). Default = ‘-’, no recalibration is performed.

Specific Parameters :

  • See Filter parameters to apply specific filters for bases, reads and parsing window setting.

Engine parameters that are common to all tasks can be found here.

5.12.2 Output

*_RGInfo.json File containing read group info.
*_filterSummary.txt File containing filter summary of general filter counts.
*_thetaRatioMCMC.txt.gz File containing MCMC chain.

5.12.3 Usage Example

#! /bin/bash

# Set atlas path
atlas=$(dirname "$0")/../build/atlas

# Simulate a BAM File with specific theta
$atlas simulate --chrLength "10000,20000" --theta "0.001,0.002" --logFile simulate.out 

# Create region files
echo "chr1 1 10000" > region1.bed
echo "chr2 1 10000" > region2.bed

# 
$atlas thetaRatio --bam ATLAS_simulations.bam --region1 region1.bed --region2 region2.bed --logFile theta.out

5.12.4 Additional Information

Method

The likelihood function is:

P(d1,d2|θ1,θ2,π1,π2)=i=1Igj=1niP(d1ij|gi=g)P(gi=g|θ1,π1)i=1Igj=1niP(d2ij|gi=g)P(gi=g|θ2,π2) An MCMC is used to infer the posterior distribution for all parameters. Updates are performed for all: (π) and (log(θ1)) and (log(θ2)).

U[0,1] prior is used for (log(θ1)) and (log(θ1)) and N(0,1). is used for (π).