6.1 alleleCounts

Estimating population allele counts

This task estimates the most likely allele count from a multi-sample VCF containing bi-allelic sites. Such a VCF file can be created with the ATLAS task majorMinor. The method is based on Nielsen et al. (2012) PLoS One.
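To give a feel for what an allele count likelihood is, the following is a minimal sketch of the dynamic-programming recursion usually associated with Nielsen et al. (2012) for diploid samples. It is an illustration only and not ATLAS's implementation: the function name `allele_count_likelihoods` and the input layout (one sample per line, three genotype likelihoods for 0, 1 and 2 copies of the minor allele) are assumptions made for this example.

```shell
# Illustrative sketch, NOT the ATLAS code.
# Input (stdin): one diploid sample per line with three genotype likelihoods
#   L(0 copies) L(1 copy) L(2 copies) of the minor allele.
# Output: "k <TAB> likelihood" for every allele count k = 0..2n.
allele_count_likelihoods() {
  awk '
  BEGIN { h[0] = 1; n = 0 }
  NF < 3 { next }                      # skip blank or incomplete lines
  {
    n++
    # Add one sample: it contributes 0, 1 or 2 minor alleles.
    for (k = 0; k <= 2 * n; k++) {
      v = h[k] * $1
      if (k >= 1) v += h[k - 1] * 2 * $2
      if (k >= 2) v += h[k - 2] * $3
      nxt[k] = v
    }
    for (k = 0; k <= 2 * n; k++) h[k] = nxt[k]
  }
  END {
    # Normalise by the binomial coefficient C(2n, k), updated incrementally.
    c = 1
    for (k = 0; k <= 2 * n; k++) {
      printf "%d\t%.6g\n", k, h[k] / c
      c = c * (2 * n - k) / (k + 1)
    }
  }'
}

# Example: two samples, one confidently homozygous for the major allele,
# one confidently homozygous for the minor allele.
printf '1 0 0\n0 0 1\n' | allele_count_likelihoods
```

Under these assumptions, the allele count with the highest likelihood in the output would be the maximum likelihood estimate for that site.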

Input

Required inputs :

--vcf example_majorMinor.vcf.gz Input VCF file generated by the ATLAS majorMinor task.

Optional inputs :

--countsFile alleleCounts_alleleCounts.txt.gz A zipped text file containing the MLE allele counts for all positions and populations. Used with --transform parameter.
--samples samplesPopulations.txt A user-generated .txt file containing the samples to be used and their population affiliation. Different allele counts will be estimated for different populations.

Example text file:

sample1 1
sample2 1
sample5 2
sample8 2
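A samples file like the one above can be written and sanity-checked from the shell. This is a minimal sketch: the sample names and population labels are the placeholders from the example, and the single-space separator is an assumption based on how the example is printed.

```shell
# Write a two-column samples file: sample name, then population label.
# Names and labels are placeholders; replace them with your own.
printf '%s\n' \
  'sample1 1' \
  'sample2 1' \
  'sample5 2' \
  'sample8 2' > samplesPopulations.txt

# Sanity check: report lines that do not have exactly two fields,
# then print the number of samples per population.
awk 'NF != 2 { printf "line %d: expected 2 fields, got %d\n", NR, NF; bad = 1 }
     { n[$2]++ }
     END { if (bad) exit 1; for (p in n) print p, n[p] }' samplesPopulations.txt | sort
```

The resulting file can then be passed to the task with --samples samplesPopulations.txt.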

Specific Parameters :

--dosaf To write the sample allele count likelihoods to the *_alleleCountsLKs.txt.gz file. Default = This file is not written.
--transform
--limitLines integer_value To limit the number of lines read from the VCF file. Default = Will parse entire VCF.
--regions *.bed To limit analysis to the regions defined in a BED file. Default = Will parse entire VCF.
--filterDepth integer1,integer2 To keep only the samples whose depth lies within the range integer1 to integer2 (inclusive). Default = Will keep sites regardless of depth.
--minMAF numeric_value To only store sites where the initial estimate of the allele frequency is greater than or equal to minMAF. Default = 0.0 (will keep sites regardless of their minor allele frequency).
--minVarQual numeric_value To only store sites whose variant quality is at least the indicated value. Default = Will keep sites regardless of their variant quality.
--reportFreq integer_value Specify after how many lines the reading progress is printed to the terminal. Default = 10000.
--chr or --limitChr To keep only specified chromosomes. Default = Will keep all chromosomes.
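Several of the filters above can be combined in a single call. The sketch below only assembles and prints such a command so it can be inspected before running it. The thresholds (0.05, 30, 2,50), the output prefix and the input file names are illustrative assumptions, and `atlas` is assumed to be the name of the executable when the `atlas` variable is not set.

```shell
# Illustrative values only; choose thresholds that suit your data.
filter_args="--minMAF 0.05 --minVarQual 30 --filterDepth 2,50"

# Assemble the full command line as a string (dry run).
cmd="${atlas:-atlas} --task alleleCounts --vcf example_majorMinor.vcf.gz --samples samplesPopulations.txt $filter_args --out filtered"

# Print the command for inspection; run it with: eval "$cmd"
echo "$cmd"
```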

Engine parameters that are common to all tasks can be found here.

Output

*_alleleCounts.txt.gz A zipped text file containing the MLE allele counts for all positions and populations.
*_alleleCountsLKs.txt.gz A zipped text file containing the log likelihoods of MLE allele counts for all positions and populations. Only written when --dosaf is set.
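Because both outputs are gzip-compressed text, they can be inspected without unpacking them to disk. The helper below is a small sketch; `peek_gz` is a hypothetical name, and no assumption is made about the column layout of the files.

```shell
# Print the first lines of a gzipped text file.
# $1 = file name, $2 = number of lines to show (defaults to 5).
peek_gz() {
  gzip -dc "$1" | head -n "${2:-5}"
}

# Example, using the output prefix from the usage example in this section:
# peek_gz alleleCounts_alleleCounts.txt.gz 10
```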

Usage Example

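#!/bin/bash

# Helper scripts located next to this script (presumably they set $atlas
# and create ATLAS_simulations.vcf.gz).
. "$(dirname "$0")/find_atlas"
. "$(dirname "$0")/simulate_vcf" --sampleSize 2

# Estimate the MLE allele counts.
$atlas --task alleleCounts --vcf ATLAS_simulations.vcf.gz --fixedSeed 0 --out alleleCounts --logFile alleleCounts.out
# Same, but also write the allele count likelihoods (--dosaf).
$atlas --task alleleCounts --dosaf --vcf ATLAS_simulations.vcf.gz --fixedSeed 0 --out alleleCountsSAF --logFile alleleCountsSAF.out
# Run --transform on the counts file written by the first command.
$atlas --task alleleCounts --transform alleleCounts_alleleCounts.txt.gz --fixedSeed 0 --out transform --logFile transform.out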