6.1 alleleCounts
Estimating population allele counts
This task estimates the maximum-likelihood (MLE) allele count at each bi-allelic site of a multi-sample VCF. Such a VCF file can be created with the ATLAS task majorMinor. The method is based on Nielsen et al. (2012) PLoS One.
Input
Required inputs :
--vcf example_majorMinor.vcf.gz |
Input VCF file generated by the ATLAS majorMinor task. |
Optional inputs :
--countsFile alleleCounts_alleleCounts.txt.gz |
A gzipped text file containing the MLE allele counts for all positions and populations. Used with the --transform parameter. |
--samples samplesPopulations.txt |
A user-provided text file listing the samples to use and the population each one belongs to, one sample per line. Allele counts are estimated separately for each population. |
Example text file:
sample1 1
sample2 1
sample5 2
sample8 2
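As a minimal sketch, a samples file like the one above can be written and sanity-checked from the shell before passing it to --samples. The snippet assumes the file is whitespace-delimited with exactly two columns (sample name, population label) and that sample names have to match those in the VCF header; both are assumptions based on the example, and the sample names are placeholders.

```shell
# Write a two-column samples file: sample name, then population label.
# The names below are the placeholders from the example above.
cat > samplesPopulations.txt <<'EOF'
sample1 1
sample2 1
sample5 2
sample8 2
EOF

# Sanity check (assumed format): every line has exactly two fields.
awk 'NF != 2 { printf "line %d: expected 2 fields, got %d\n", NR, NF; bad = 1 }
     END { exit bad }' samplesPopulations.txt

# Count how many samples were assigned to each population.
awk '{ n[$2]++ } END { for (p in n) print p, n[p] }' samplesPopulations.txt | sort
```

For the example file, the last command reports two samples in population 1 and two in population 2.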
Specific Parameters :
--dosaf |
Write the sample allele count likelihoods to the *_alleleCountsLKs.txt.gz file. Default = this file is not written. |
--transform |
|
--limitLines integer_value |
Limit the number of lines read from the VCF file. Default = will parse the entire VCF. |
--regions *.bed |
Limit the analysis to the regions defined in a BED file. Default = will parse the entire VCF. |
--filterDepth integer1,integer2 |
Keep only the samples whose depth lies between integer1 and integer2 (inclusive). Default = no depth filter is applied. |
--minMAF numeric_value |
Only store sites where the initial estimate of the allele frequency is greater than or equal to minMAF. Default = 0.0 (keeps sites regardless of their minor allele frequency). |
--minVarQual numeric_value |
Only store sites whose variant quality is at least the given value. Default = keeps sites regardless of their variant quality. |
--reportFreq integer_value |
Specify after how many lines the reading progress is printed to the terminal. Default = 10000. |
--chr or --limitChr |
Keep only the specified chromosomes. Default = keeps all chromosomes. |
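As an illustration of the --regions input, the following sketch writes a small BED file and checks it. It assumes the standard BED convention (tab-separated chromosome, 0-based start, exclusive end); the file name regions.bed and the chromosome names chr1 and chr2 are placeholders and would have to match the names used in the VCF.

```shell
# Write a small BED file (hypothetical regions; columns are tab-separated).
printf 'chr1\t0\t100000\n'      >  regions.bed
printf 'chr1\t250000\t300000\n' >> regions.bed
printf 'chr2\t0\t50000\n'       >> regions.bed

# Check that each line has at least three fields and that start < end.
awk -F '\t' 'NF < 3 || $2 + 0 >= $3 + 0 { printf "bad BED line %d\n", NR; bad = 1 }
             END { exit bad }' regions.bed

# Total number of bases covered (the regions above do not overlap).
awk -F '\t' '{ total += $3 - $2 } END { print total }' regions.bed
```

The resulting file would then be passed as --regions regions.bed.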
Engine parameters that are common to all tasks can be found here.
Output
*_alleleCounts.txt.gz | A gzipped text file containing the MLE allele counts for all positions and populations. |
*_alleleCountsLKs.txt.gz | A gzipped text file containing the log likelihoods of MLE allele counts for all positions and populations. |
Usage Example
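#!/bin/bash
# Source the helper scripts: find_atlas is expected to set $atlas,
# simulate_vcf to produce ATLAS_simulations.vcf.gz.
. "$(dirname "$0")"/find_atlas
. "$(dirname "$0")"/simulate_vcf --sampleSize 2
# Estimate the MLE allele counts.
$atlas --task alleleCounts --vcf ATLAS_simulations.vcf.gz --fixedSeed 0 --out alleleCounts --logFile alleleCounts.out
# Same run, additionally writing the allele count likelihoods (--dosaf).
$atlas --task alleleCounts --dosaf --vcf ATLAS_simulations.vcf.gz --fixedSeed 0 --out alleleCountsSAF --logFile alleleCountsSAF.out
# Run --transform on the allele counts file written by the first call.
$atlas --task alleleCounts --transform alleleCounts_alleleCounts.txt.gz --fixedSeed 0 --out transform --logFile transform.out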
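The script above relies on helper scripts from the ATLAS test setup. For a run on real data, the documented options can be combined along the following lines. This is only a sketch: it assumes the binary is called atlas and is on the PATH, and the file names and the filter values (0.05 and 2,50) are illustrative, not recommendations.

```shell
#!/bin/sh
# Assumption: the ATLAS binary is called "atlas" and is on PATH.
# File names and filter values are illustrative only.
vcf="example_majorMinor.vcf.gz"

# Build the command line as positional parameters so it can be printed and run.
set -- atlas --task alleleCounts \
    --vcf "$vcf" \
    --samples samplesPopulations.txt \
    --minMAF 0.05 \
    --filterDepth 2,50 \
    --out alleleCounts

# Show the command that would be executed.
echo "$*"

# Run only if both the binary and the input file are available.
if command -v atlas >/dev/null 2>&1 && [ -f "$vcf" ]; then
    "$@"
else
    echo "atlas or $vcf not found; command was only printed" >&2
fi
```

Printing the command before running it makes it easy to keep a record of the exact options used for each output prefix.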