6.7 majorMinor
Estimating major and minor alleles
majorMinor infers the major and minor alleles from a population sample and outputs the genotype likelihoods in a VCF file. This task requires the sample-specific genotype likelihoods in GLF format, which can be created with the ATLAS task GLF. The resulting VCF file can be used as input to ANGSD.
The major and minor alleles can be estimated either with the method described by Skotte et al. (2012) or with a maximum likelihood (MLE) method. The MLE method estimates the genotype frequencies jointly with the two alleles present at a site. The variant quality is the likelihood ratio between a model in which the site is variable and a model in which it is not.
6.7.1 Input
Required inputs:
--glf glf_file1.glf.gz,glf_file2.glf. or --glf glf_file.txt |
Input GLF files, one for every sample of the population. They can be provided on the command line as a comma-separated list or in a text file (one file name per line). |
Example text file:
glf_file_1.glf.gz
glf_file_2.glf.gz
glf_file_3.glf.gz
glf_file_4.glf.gz
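A list file like the one above can be generated from the shell. The following is a minimal sketch, assuming all GLF files sit in one directory; the function name `write_glf_list` and the file name `glf_files.txt` are illustrative choices and not part of ATLAS.

```shell
#!/bin/bash
# Write the names of all *.glf.gz files in a directory to a list file,
# one file name per line, sorted so that the sample order is reproducible.
# write_glf_list and glf_files.txt are illustrative names, not part of ATLAS.
write_glf_list() {
    local dir="$1" out="$2"
    find "$dir" -maxdepth 1 -name '*.glf.gz' | sort > "$out"
}

write_glf_list . glf_files.txt
# The list file can then be passed in place of a comma-separated list:
#   atlas majorMinor --glf glf_files.txt
```

Sorting the names keeps the sample order stable between runs, which matters when alternative sample names are supplied in a fixed order.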
Optional inputs:
--sampleNames sample_name1,sample_name2 |
Provide alternative sample names. The number of sample names must match the number of GLF files. Default = sample names are deduced from the GLF file names. |
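Because the number of names given to --sampleNames has to equal the number of GLF files, it can help to compare the two lists before starting a run. The sketch below is only a pre-flight check in plain bash; `count_csv` is a hypothetical helper and the file and sample names are placeholders.

```shell
#!/bin/bash
# Count the entries of a comma-separated list (an empty string counts as 0).
# count_csv is an illustrative helper, not part of ATLAS.
count_csv() {
    if [ -z "$1" ]; then
        echo 0
    else
        awk -F',' '{ print NF }' <<< "$1"
    fi
}

glfs="glf_file_1.glf.gz,glf_file_2.glf.gz"   # placeholder GLF file names
names="sample_name1,sample_name2"            # placeholder sample names

if [ "$(count_csv "$glfs")" -ne "$(count_csv "$names")" ]; then
    echo "number of sample names does not match number of GLF files" >&2
else
    echo "ok: $(count_csv "$names") sample names for $(count_csv "$glfs") GLF files"
fi
```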
Specific parameters:
--method method_name |
Estimates major/minor alleles using the indicated method. Two options are available: Skotte and MLE. Default = MLE. |
--maxF numeric_value |
Maximum value of the likelihood function with respect to the parameter theta. Default = 1e-07. |
--phredLik |
Transform the likelihoods to the Phred quality score scale. This saves space but leads to a loss of precision and thus power. Default = raw likelihoods without any adjustment or transformation. |
--minSamplesWithData integer_value |
Keep only sites for which at least the indicated number of samples have data. Default = 1. |
--minMAF numeric_value |
Keep only sites for which the minor allele frequency is at least the indicated value. Default = all sites are kept regardless of minor allele frequency. |
--limitSites integer_value |
To write likelihoods only up to the indicated input position. Default = disabled. |
Engine parameters that are common to all tasks can be found here.
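The loss of precision mentioned for --phredLik can be illustrated with the standard Phred definition, Q = -10 log10(L), rounded to an integer. This is a sketch of the general transformation only: how ATLAS normalizes and rounds the likelihoods is not specified in this section, and `phred` is a hypothetical helper.

```shell
#!/bin/bash
# Convert a likelihood in (0, 1] to a rounded Phred-scaled integer:
#   Q = round(-10 * log10(L))
# phred is an illustrative helper, not an ATLAS command; the exact
# normalization and rounding used by ATLAS are assumptions here.
phred() {
    awk -v l="$1" 'BEGIN { printf "%d\n", int(-10 * log(l) / log(10) + 0.5) }'
}

# Nearby likelihoods such as 0.5 and 0.45 collapse onto the same integer score.
for l in 0.5 0.45 0.001; do
    echo "L=$l -> Q=$(phred "$l")"
done
```

Integer scores are shorter to store than floating-point likelihoods, but distinct likelihoods that round to the same score can no longer be told apart downstream.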
6.7.2 Output
*_majorMinor.vcf.gz | One multi-sample VCF file containing the likelihoods of the genotypes formed by the major and minor alleles. |
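A quick sanity check of the output is to list the samples and count the sites it contains. The sketch below relies only on the general VCF layout (header lines start with `#`, sample names are columns 10 onward of the `#CHROM` line) and on standard tools; the helper names `vcf_samples` and `vcf_site_count` are hypothetical.

```shell
#!/bin/bash
# List the sample names of a gzip-compressed VCF: in the VCF format they are
# columns 10 onward of the "#CHROM" header line.
vcf_samples() {
    gzip -dc "$1" | awk -F'\t' '/^#CHROM/ { for (i = 10; i <= NF; i++) print $i; exit }'
}

# Count sites, i.e. all lines that are not header lines.
vcf_site_count() {
    gzip -dc "$1" | grep -vc '^#'
}

# Apply both helpers to every matching output file in the current directory.
for vcf in *_majorMinor.vcf.gz; do
    [ -e "$vcf" ] || continue
    echo "$vcf: $(vcf_site_count "$vcf") sites"
    vcf_samples "$vcf"
done
```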
6.7.3 Usage Example
#!/bin/bash
# Set atlas path
atlas=$(dirname "$0")/../build/atlas
# Simulate 5 BAM files in Hardy–Weinberg equilibrium
"$atlas" simulate --type HW --sampleSize 5 --logFile simulate.out
# Create one GLF file per BAM file
for f in *.bam; do
    "$atlas" GLF --bam "$f"
done
# Join all GLF file names into a comma-separated list
samples=$(ls -1 *.glf.gz | paste -s -d ',' -)
# Estimate major and minor alleles
"$atlas" majorMinor --glf "$samples" --fasta ATLAS_simulations.fasta --logFile majorMinor.out
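The filtering parameters from section 6.7.1 can be combined in a single call. The following sketch assembles such a command and only runs it if the binary and inputs are present; the threshold values, the list file `glf_files.txt`, and the log file name are placeholders chosen for illustration, and the --fasta argument simply mirrors the example above.

```shell
#!/bin/bash
# Assemble a majorMinor call that combines several documented options.
# All values below are placeholders, not recommended settings.
atlas=${atlas:-atlas}
cmd=("$atlas" majorMinor
     --glf glf_files.txt
     --fasta ATLAS_simulations.fasta
     --method Skotte
     --minSamplesWithData 3
     --minMAF 0.05
     --logFile majorMinor_filtered.out)

# Run the command if the binary and inputs exist; otherwise just print it.
if command -v "$atlas" >/dev/null 2>&1 && [ -f glf_files.txt ] && [ -f ATLAS_simulations.fasta ]; then
    "${cmd[@]}"
else
    printf '%s\n' "${cmd[*]}"
fi
```

Keeping the arguments in an array avoids quoting problems when file names contain spaces and makes it easy to print the exact command for a log.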