9.1 Beagle
The Beagle format was originally used for the program Beagle, and is required as an input file for various tasks in ANGSD [1].
The first three columns specify the position, the reference allele and the alternative allele. They are followed by three columns per individual that contain the genotype likelihoods of the three genotypes (homozygous reference, heterozygous, homozygous alternative). Here is an example for four loci and three individuals:
marker | allele1 | allele2 | Ind0 | Ind0 | Ind0 | Ind1 | Ind1 | Ind1 | Ind2 | Ind2 | Ind2 |
chr1_1 | A | C | 0.941177 | 0.058822 | 0.000001 | 0.799685 | 0.199918 | 0.000397 | 0.666316 | 0.333155 | |
chr1_2 | G | T | 0.709983 | 0.177493 | 0.112525 | 0.941178 | 0.058822 | 0.000000 | 0.665554 | 0.332774 | 0.001672 |
chr1_3 | C | A | 0.855993 | 0.106996 | 0.037010 | 0.333333 | 0.333333 | 0.333333 | 0.799971 | 0.333155 | |
chr1_5 | T | A | 0.835380 | 0.104420 | 0.060201 | 0.799685 | 0.199918 | 0.000397 | 0.333333 | 0.333333 | |
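To make the column layout concrete, the following is a minimal Python sketch that splits one Beagle line into its marker, its two alleles, and one likelihood triple per individual. It assumes the file on disk is whitespace-delimited and that the pipes in the table above are only a rendering of the columns. The names `BeagleRecord` and `parse_beagle_line` are hypothetical and not part of ATLAS or ANGSD.

```python
from typing import List, NamedTuple, Tuple


class BeagleRecord(NamedTuple):
    """One locus of a Beagle file (hypothetical helper type)."""
    marker: str
    allele1: str
    allele2: str
    likelihoods: List[Tuple[float, ...]]  # one triple per individual


def parse_beagle_line(line: str) -> BeagleRecord:
    """Parse a single data line, assuming whitespace-delimited columns."""
    fields = line.split()
    # Three leading columns, then exactly three values per individual.
    if len(fields) < 6 or (len(fields) - 3) % 3 != 0:
        raise ValueError(f"unexpected number of columns: {len(fields)}")
    values = [float(x) for x in fields[3:]]
    triples = [tuple(values[i:i + 3]) for i in range(0, len(values), 3)]
    return BeagleRecord(fields[0], fields[1], fields[2], triples)


# The second locus of the example above, written with whitespace delimiters.
record = parse_beagle_line(
    "chr1_2 G T 0.709983 0.177493 0.112525 "
    "0.941178 0.058822 0.000000 0.665554 0.332774 0.001672"
)
print(record.marker, record.allele1, record.allele2, len(record.likelihoods))
```

Grouping the values into triples keeps each individual's three genotype likelihoods together, which makes per-individual checks (for instance, that they sum to one) straightforward.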
Because ANGSD requires the genotype likelihoods of each individual to sum to one, ATLAS normalizes them accordingly. For haploid genotypes, ATLAS still writes three columns per individual, but sets the third genotype likelihood (for the homozygous alternative genotype) to zero. This is required by certain downstream tools, e.g. FastNGSAdmix. For other applications this representation might not make sense, so Beagle files for haploid chromosomes should be handled with care.
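The normalization described here can be sketched in a few lines of Python. This is an illustration only, not the ATLAS implementation: the function names are hypothetical, and the haploid case assumes that the two haploid likelihoods are placed in the first two columns, which the text above does not state explicitly.

```python
from typing import Sequence, Tuple


def normalize_diploid(likelihoods: Sequence[float]) -> Tuple[float, float, float]:
    """Scale three genotype likelihoods so that they sum to one."""
    if len(likelihoods) != 3:
        raise ValueError("expected exactly three genotype likelihoods")
    total = sum(likelihoods)
    if total <= 0:
        raise ValueError("likelihoods must have a positive sum")
    first, second, third = (x / total for x in likelihoods)
    return (first, second, third)


def normalize_haploid(ref: float, alt: float) -> Tuple[float, float, float]:
    """Return a three-column entry for a haploid genotype.

    Assumption: the two haploid likelihoods fill the first two columns;
    the third column (homozygous alternative) is set to zero.
    """
    total = ref + alt
    if total <= 0:
        raise ValueError("likelihoods must have a positive sum")
    return (ref / total, alt / total, 0.0)


print(normalize_diploid([2.0, 1.0, 1.0]))
print(normalize_haploid(3.0, 1.0))
```

Under this assumption, a tool that reads the haploid entry as diploid would interpret the second column as a heterozygous genotype, which is one reason to treat such files with care.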