6.2 alleleFreq
Estimating population allele frequencies
alleleFreq estimates the population allele frequencies from a multi-sample VCF file containing bi-allelic sites. Such a VCF file can be created with the ATLAS task majorMinor.
6.2.1 Input
Required inputs:
--vcf example_majorMinor.vcf.gz | Input VCF file generated by the ATLAS majorMinor task. |
Optional inputs:
--samples alleleCounts_alleleCounts.txt.gz | A gzip-compressed text file containing the MLE allele counts for all positions and populations. Used with the --compare parameter. |
--samples samplesPopulations.txt | A user-generated text file listing the samples to be used and their population affiliation. Allele frequencies are estimated separately for each population. |
Example text file:
sample1 1
sample2 1
sample5 2
sample8 2
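A minimal sketch of preparing this file from the shell, assuming whitespace-separated columns exactly as in the example above. The sample names are placeholders and must match the sample names in the VCF header. The ATLAS call only runs if $atlas is set (for instance by the find_atlas helper used in the usage example), and the output prefix alleleFreqPops is hypothetical.

```shell
#!/bin/bash
# Hypothetical sample-to-population assignment mirroring the example above.
# Sample names must match those in the VCF header.
cat > samplesPopulations.txt <<'EOF'
sample1 1
sample2 1
sample5 2
sample8 2
EOF

# Sanity check: count how many samples are assigned to each population.
awk '{ n[$2]++ } END { for (p in n) print "population " p ": " n[p] " samples" }' \
  samplesPopulations.txt | sort

# Estimate allele frequencies per population (only if $atlas is available).
if [ -n "${atlas:-}" ]; then
  $atlas --task alleleFreq --vcf example_majorMinor.vcf.gz \
    --samples samplesPopulations.txt --out alleleFreqPops
fi
```

The sanity check prints one line per population, here two samples each, which makes typos in the population column easy to spot before a long run.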
Specific parameters:
--likelihoods | Write the allele frequency likelihoods to the *_alleleFreqLikelihoods.txt.gz output file. Default = this file is not written. |
--compare |
|
--limitLines integer_value | Limit the number of lines read from the VCF file. Default = the entire VCF is parsed. |
--regions *.bed | Limit the analysis to the regions defined in a BED file. Default = the entire VCF is parsed. |
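A sketch of building a BED file for --regions. It assumes ATLAS follows the standard BED convention (tab-separated columns, 0-based start, exclusive end) and that the chromosome name, here the hypothetical chr1, matches the CHROM column of the VCF. The output prefix alleleFreqRegions is also hypothetical.

```shell
#!/bin/bash
# Two hypothetical regions on chr1. BED is 0-based and half-open, so
# "chr1 0 1000" covers VCF positions 1 to 1000.
printf 'chr1\t0\t1000\n'    >  regions.bed
printf 'chr1\t5000\t6000\n' >> regions.bed

# Sanity check: each line needs at least 3 tab-separated columns and start < end.
awk -F'\t' 'NF < 3 || $2 >= $3 { bad++ } END { exit (bad > 0) }' regions.bed \
  && echo "regions.bed looks valid"

# Restrict the analysis to these regions (only if $atlas is available).
if [ -n "${atlas:-}" ]; then
  $atlas --task alleleFreq --vcf example_majorMinor.vcf.gz \
    --regions regions.bed --out alleleFreqRegions
fi
```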
--filterDepth integer1,integer2 | Keep only samples with a sample depth between integer1 and integer2 (inclusive). Default = no depth filter is applied. |
--maxMissing numeric_value | Filter out sites with more than the indicated fraction of missing data. numeric_value must be between 0 and 1 (inclusive). Default = sites are kept regardless of missingness. |
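To choose a --maxMissing threshold, it can help to preview per-site missingness before running ATLAS. The awk sketch below is not the ATLAS implementation: it assumes a site's missing fraction is the share of samples whose GT field (taken to be the first FORMAT field) is ./., .|. or ., whereas ATLAS may count other cases as missing too. The toy VCF and the function name site_missingness are hypothetical.

```shell
#!/bin/bash
# site_missingness MAX VCF: for each site of an uncompressed VCF, print the
# fraction of samples with a missing GT and whether a "more than MAX missing"
# filter would remove the site.
site_missingness() {
  awk -F'\t' -v max="$1" '
    /^#/ { next }                       # skip header lines
    {
      miss = 0
      for (i = 10; i <= NF; i++) {      # sample columns start at column 10
        split($i, fmt, ":")             # fmt[1] is assumed to be GT
        if (fmt[1] == "./." || fmt[1] == ".|." || fmt[1] == ".") miss++
      }
      frac = miss / (NF - 9)
      printf "%s:%s missing=%.2f %s\n", $1, $2, frac, (frac > max ? "removed" : "kept")
    }' "$2"
}

# Toy VCF with hypothetical data: 2 sites x 4 samples.
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts1\ts2\ts3\ts4\n'
  printf 'chr1\t100\t.\tA\tG\t50\t.\t.\tGT\t0/1\t./.\t0/0\t1/1\n'
  printf 'chr1\t200\t.\tC\tT\t50\t.\t.\tGT\t./.\t./.\t0/1\t0/0\n'
} > toy.vcf

site_missingness 0.25 toy.vcf
```

With a threshold of 0.25, the first site (1 of 4 samples missing) is kept and the second (2 of 4 missing) is removed, because only sites with more than the indicated fraction missing are filtered out.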
--minMAF numeric_value | Only keep sites whose initial allele frequency estimate is greater than or equal to minMAF. Default = 0.0 (sites are kept regardless of their minor allele frequency). |
--minVarQual numeric_value | Only keep sites whose variant quality is greater than or equal to the indicated value. Default = sites are kept regardless of their variant quality. |
--reportFreq integer_value | Number of lines after which the reading progress is printed to the terminal. Default = 10000. |
--epsF numeric_value | Epsilon for the EM algorithm used to estimate allele frequencies. Default = 0.0001. |
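For intuition about --epsF, here is a minimal awk sketch of a standard EM estimator of the allele frequency f at one site, for one population, from per-sample genotype likelihoods. It assumes a Hardy-Weinberg genotype prior and assumes that epsilon acts as a convergence threshold on the change in f between iterations. It is not taken from the ATLAS source; the input format (one line per sample with the likelihoods for 0, 1 and 2 copies of the allele, not all zero) and the function name em_allele_freq are hypothetical.

```shell
#!/bin/bash
# em_allele_freq EPS [FILE]: EM estimate of the allele frequency f at one site.
# Input: one line per sample with P(data | g) for g = 0, 1, 2 allele copies.
em_allele_freq() {
  awk -v eps="$1" '
    { l0[NR] = $1; l1[NR] = $2; l2[NR] = $3 }
    END {
      n = NR; f = 0.5                       # starting value
      for (iter = 0; iter < 1000; iter++) {
        sum = 0
        for (i = 1; i <= n; i++) {
          # E-step: genotype posteriors under a Hardy-Weinberg prior
          p0 = l0[i] * (1 - f)^2
          p1 = l1[i] * 2 * f * (1 - f)
          p2 = l2[i] * f^2
          sum += (p1 + 2 * p2) / (p0 + p1 + p2)   # expected allele copies
        }
        fnew = sum / (2 * n)                # M-step: update f
        d = fnew - f; if (d < 0) d = -d
        f = fnew
        if (d < eps) break                  # converged
      }
      printf "%.4f\n", f
    }' "${2:--}"
}
```

For example, printf '1 0 0\n0 1 0\n' | em_allele_freq 0.0001 prints 0.2500: one sample certainly carries 0 copies and the other certainly carries 1, giving 1 allele copy out of 4. A smaller epsilon allows more iterations and a more precise estimate when the likelihoods are uncertain.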
--chr or --limitChr | Keep only the specified chromosomes. Default = all chromosomes are kept. |
Engine parameters that are common to all tasks can be found here.
6.2.2 Output
*_alleleFreq.txt.gz | A gzip-compressed text file containing the allele frequencies for all positions and populations. |
*_alleleFreqLikelihoods.txt.gz | A gzip-compressed text file containing the allele frequency log likelihoods for all positions and populations (written only with --likelihoods). |
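A small helper for inspecting these gzip-compressed outputs. The file names in the loop assume that the * prefix is the value passed to --out (as in the usage example below, which uses alleleFreq and alleleFreqLKs). The column layout of these files is not described here, so the helper only prints the first lines. The function name peek_output is hypothetical.

```shell
#!/bin/bash
# peek_output FILE [N]: print the first N lines (default 5) of a gzipped file.
peek_output() {
  if [ ! -f "$1" ]; then
    echo "missing: $1" >&2
    return 1
  fi
  gzip -dc "$1" | head -n "${2:-5}"
}

# Inspect the outputs of the usage example, if they exist.
for f in alleleFreq_alleleFreq.txt.gz alleleFreqLKs_alleleFreqLikelihoods.txt.gz; do
  peek_output "$f" 3 || true
done
```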
6.2.3 Usage Example
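#! /bin/bash
# Locate the ATLAS executable (sets $atlas)
. "$(dirname "$0")"/find_atlas
# Simulate a small test VCF (simulate.vcf.gz) with a fixed seed
. "$(dirname "$0")"/simulate_vcf --sampleSize 2 --chrLength 1111 --fixedSeed 19

# Estimate population allele frequencies; log to $out.out, stderr to $out.eout
out="alleleFreq"
$atlas --task alleleFreq --vcf simulate.vcf.gz \
--fixedSeed 18 --out "$out" --logFile "$out.out" 2> "$out.eout"

# Same, but also write the allele frequency likelihoods (--likelihoods)
out="alleleFreqLKs"
$atlas --task alleleFreq --likelihoods --vcf simulate.vcf.gz \
--fixedSeed 17 --out "$out" --logFile "$out.out" 2> "$out.eout"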