9.2 geno
The geno format is used by the LEA R package. It represents the genotypes as a matrix with loci as rows and individuals as columns (see example below). Each genotype is encoded as the number of copies of the reference allele: 0, 1 and 2 denote homozygous alternative, heterozygous and homozygous reference, respectively. Note that in a VCF file the genotype is represented as the number of copies of the alternative allele, so the codes 0 and 2 are swapped relative to the VCF. Missing data is encoded by the number 9. This task ignores any genotype likelihoods in the VCF file and reads only the called genotype. Since the chromosome and position of each SNP are lost in the geno file, an additional file ending in “.geno.kept_loci” is created that lists the names of all loci contained in the geno file, in case some sites were filtered out while reading the VCF.
This is an example of a file in geno format for 4 loci and 15 individuals:
112112120000121
010120902122900
091112900192000
121111290002122
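
The encoding described above can be illustrated with a short Python sketch. It is not the tool's own implementation: the function names gt_to_geno and parse_geno are hypothetical, and treating multi-allelic or non-diploid calls as missing is an assumption made here for simplicity, since the text does not say how such calls are handled.

```python
def gt_to_geno(gt: str) -> str:
    """Map a biallelic, diploid VCF GT string (e.g. '0/1', '1|1', './.') to a geno code."""
    alleles = gt.replace("|", "/").split("/")
    # Assumption: anything other than two alleles drawn from {0, 1} is coded as missing.
    if len(alleles) != 2 or any(a not in ("0", "1") for a in alleles):
        return "9"
    # VCF counts alternative alleles; geno counts reference alleles.
    return str(alleles.count("0"))


def parse_geno(lines):
    """Return one list per locus (row), holding one integer code per individual."""
    return [[int(ch) for ch in line.strip()] for line in lines if line.strip()]


# The four example rows shown above.
example = [
    "112112120000121",
    "010120902122900",
    "091112900192000",
    "121111290002122",
]
matrix = parse_geno(example)
print(len(matrix), len(matrix[0]))  # 4 15
print("".join(gt_to_geno(gt) for gt in ["0/0", "0/1", "1|1", "./."]))  # 2109
```

Indexing the result as matrix[locus][individual] follows the row and column layout of the file, so matrix[1][6] is the (missing) genotype of the seventh individual at the second locus.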