6.2 alleleFreq

Estimating population allele frequencies

alleleFreq estimates the population allele frequencies from a multi-sample VCF file containing bi-allelic sites. Such a VCF file can be created with the ATLAS task majorMinor.

6.2.1 Input

Required inputs:

--vcf example_majorMinor.vcf.gz Input VCF file generated by the ATLAS majorMinor task.

Optional inputs:

--samples alleleCounts_alleleCounts.txt.gz A zipped text file containing the MLE allele counts for all positions and populations. Used with the --compare parameter.
--samples samplesPopulations.txt A user-generated .txt file listing the samples to be used and their population affiliation. Allele frequencies are estimated separately for each population.

Example text file:

sample1 1
sample2 1
sample5 2
sample8 2
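A samples file like the example above could be written and sanity-checked from the shell before it is passed to --samples. This is only a sketch: it assumes the two columns are separated by whitespace, as in the example, and the checks (two fields per line, no duplicate sample names) are illustrative rather than requirements stated by ATLAS.

```shell
# Write a two-column samples file: sample name, then population label.
# Assumption: columns are whitespace-separated, as in the example above.
cat > samplesPopulations.txt <<'EOF'
sample1 1
sample2 1
sample5 2
sample8 2
EOF

# Illustrative sanity checks: every line has exactly two fields
# and no sample name is listed twice. Exits non-zero on a problem.
awk 'NF != 2    { printf "line %d: expected 2 fields, got %d\n", NR, NF; bad = 1 }
     seen[$1]++ { printf "line %d: duplicate sample %s\n", NR, $1; bad = 1 }
     END        { exit bad }' samplesPopulations.txt

# Count how many samples were assigned to each population.
awk '{ n[$2]++ } END { for (p in n) print p, n[p] }' samplesPopulations.txt | sort
```

The per-population counts make it easy to spot a mislabeled sample before starting a long run.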

Specific parameters:

--likelihoods Write the sample allele frequency likelihoods to the alleleFreqLKs file. Default = This file is not written.
--compare
--limitLines integer_value Limit the number of lines read from the VCF file. Default = Will parse entire VCF.
--regions *.bed Limit the analysis to the regions defined in a BED file. Default = Will parse entire VCF.
--filterDepth integer1,integer2 Keep only the samples whose sample depth lies within the indicated range (inclusive). Default = Will keep sites regardless of depth.
--maxMissing numeric_value Filter out sites with more than the indicated fraction of missing data. numeric_value must be between 0 and 1 (inclusive). Default = Will keep sites regardless of missingness.
--minMAF numeric_value Only store sites where the initial estimate of the allele frequency is larger than or equal to minMAF. Default = 0.0 (will keep sites regardless of their minor allele frequency).
--minVarQual numeric_value Only store sites whose variant quality is at least the indicated value. Default = Will keep sites regardless of their variant quality.
--reportFreq integer_value Specify after how many lines the reading progress is printed to the terminal. Default = 10000.
--epsF numeric_value Epsilon for the EM algorithm used to estimate allele frequencies. Default = 0.0001.
--chr or --limitChr Keep only the specified chromosomes. Default = Will keep all chromosomes.
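Several of the parameters above might be combined in a single call, as sketched below. Only the flag names are taken from this page; every value (0.05, 0.2, 2,50, the output prefix "filtered") is hypothetical, and the snippet assumes the ATLAS binary is reachable either through a variable $atlas or as "atlas" on the PATH. Whether all of these filters can be used together in one run has not been verified here.

```shell
# Assumption: $atlas holds the path to the ATLAS binary; otherwise fall back to "atlas" on the PATH.
atlas_bin="${atlas:-atlas}"

# Collect the arguments once so they can be printed and reused unchanged.
# All values are hypothetical; only the flag names come from the parameter list.
set -- --task alleleFreq \
       --vcf example_majorMinor.vcf.gz \
       --samples samplesPopulations.txt \
       --minMAF 0.05 \
       --maxMissing 0.2 \
       --filterDepth 2,50 \
       --out filtered

# Print the full command line for review.
cmd_line="$atlas_bin $*"
echo "$cmd_line"

# Run only if the binary and the input VCF are actually available.
if command -v "$atlas_bin" >/dev/null 2>&1 && [ -f example_majorMinor.vcf.gz ]; then
    "$atlas_bin" "$@"
fi
```

Printing the command first gives a record of the exact filter settings used for a given output prefix.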

Engine parameters that are common to all tasks can be found here.

6.2.2 Output

*_alleleFreq.txt.gz A zipped text file containing the allele frequencies for all positions and populations.
*_alleleFreqLikelihoods.txt.gz A zipped text file containing the allele frequency log likelihoods for all positions and populations.
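Because both outputs are gzipped text, they can be inspected without unpacking them on disk. The helpers below are a small sketch with hypothetical names (peek_gz, count_gz_lines); the file name assumes a run with "--out alleleFreq", which by the naming pattern above would presumably yield alleleFreq_alleleFreq.txt.gz. The column layout of the file is not described on this page, so the sketch only prints raw lines.

```shell
# Print the first lines of a gzipped text file without unpacking it on disk.
peek_gz() {
    # $1: path to a .gz file; $2: number of lines to show (default 5)
    gzip -dc "$1" | head -n "${2:-5}"
}

# Count the lines of a gzipped text file (this includes a header line, if any).
count_gz_lines() {
    gzip -dc "$1" | wc -l | tr -d ' '
}

# Assumption: the run used "--out alleleFreq".
f="alleleFreq_alleleFreq.txt.gz"
if [ -f "$f" ]; then
    peek_gz "$f" 5
    count_gz_lines "$f"
fi
```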

6.2.3 Usage Example

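#! /bin/bash

# Locate the ATLAS binary (expected to set $atlas) and simulate a small test VCF.
. "$(dirname "$0")"/find_atlas
. "$(dirname "$0")"/simulate_vcf --sampleSize 2 --chrLength 1111 --fixedSeed 19

# Estimate allele frequencies from the simulated VCF.
out="alleleFreq"
$atlas --task alleleFreq --vcf simulate.vcf.gz \
       --fixedSeed 18 --out "$out" --logFile "$out.out" 2> "$out.eout"

# Same estimation, additionally writing the allele frequency likelihoods.
out="alleleFreqLKs"
$atlas --task alleleFreq --likelihoods --vcf simulate.vcf.gz \
       --fixedSeed 17 --out "$out" --logFile "$out.out" 2> "$out.eout"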