5.6 summaryStats

Summary statistics per window/genomewide: Felsenstein, HKY85, Pi

summaryStats computes summary statistics per window or genome-wide, and can estimate the genotype distribution using both the Felsenstein and HKY85 (Hasegawa et al. 1985) substitution models as well as the Pi-distribution.

5.6.1 Input

Required inputs :

--bam Input_bam_file.bam Input bam file.
--fasta Input_reference_genome_file.fasta Reference genome.

Optional inputs :

--pmd "library_type:model_for_5'_read:model_for_3'_read" or --RGInfo Used to specify post-mortem damage (PMD) parameters: the library type, followed by the model to be used for the 5-prime read end and the model to be used for the 3-prime read end, each of which can be either “Exponential” or “Empiric”. All three arguments must be provided as a single string, separated by colons (:), e.g. --pmdModels "doubleStrand:Exponential:Exponential". Can also be provided as a .txt file (see PMD for generating such a file). Default = no PMD is assumed in the data.
--recal recal_model or --RGInfo Used to specify quality score recalibration parameters, given as a common recal model for all read groups. Can also be provided as a .txt file (see recal for further information). Default = ‘-’, no recalibration is performed.
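
The colon-separated PMD string described above is easy to mistype, so it can help to check it before starting a long run. The following is a minimal sketch, not part of ATLAS: the function name validate_pmd_string is hypothetical, and it only checks what this section states (exactly three colon-separated fields, and a model that is either Exponential or Empiric for each read end). It does not check the library type, since the accepted values are not listed here.

```shell
#!/bin/bash

# Hypothetical helper (not part of ATLAS): check that a PMD string has the
# form library_type:model_for_5'_read:model_for_3'_read.
validate_pmd_string() {
    local pmd
    local rest
    local model5
    local model3
    local m
    pmd=$1

    case "$pmd" in
        *:*:*:*)   # three or more colons: too many fields
            echo "too many fields in PMD string: $pmd" >&2
            return 1 ;;
        ?*:?*:?*)  # exactly three non-empty fields
            ;;
        *)
            echo "expected three colon-separated fields: $pmd" >&2
            return 1 ;;
    esac

    rest=${pmd#*:}       # drop the library type
    model5=${rest%%:*}   # model for the 5-prime read end
    model3=${rest#*:}    # model for the 3-prime read end

    # Only the two model names given in this section are accepted.
    for m in "$model5" "$model3"; do
        case "$m" in
            Exponential|Empiric) ;;
            *)
                echo "unknown PMD model: $m" >&2
                return 1 ;;
        esac
    done
    return 0
}
```

In a wrapper script, the check could guard the call to atlas, for example by running validate_pmd_string "$pmd" || exit 1 before passing "$pmd" to the PMD option.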

Specific Parameters :

  • See Filter parameters to apply specific filters to bases and reads and to set the window parsing settings.

Engine parameters that are common to all tasks can be found here.

5.6.2 Output

5.6.3 Usage Example

#! /bin/bash

# Set atlas path
atlas=$(dirname "$0")/../build/atlas

# Simulate a BAM file with a specific HKY85 genotype distribution
$atlas simulate --type "HKY85" --mu 0.71 --thetaG 0.0002 --thetaR 0.003 --logFile simulate.out

# Create window-by-window summary stats, including the estimated HKY85 model
$atlas summaryStats --bam ATLAS_simulations.bam --fasta ATLAS_simulations.fasta --logFile summaryStats.out