9.3 LFMM

The LFMM format is used by the LFMM and LFMM2 R packages. Compared to the VCF format, it is transposed: individuals are represented as rows and loci as columns. Because an LFMM file carries no information about the chromosome and position of each SNP, an additional file ending in “.lfmm.kept_loci” is created that lists the names of all loci contained in the LFMM file. This is useful because some sites may be filtered out while the VCF is read.

The command line parameter --genotypes specifies the genotype format to be used: call or posterior. For call, the called genotypes are used, defined as the number of copies of the alternative allele. Missing data is encoded as 9. Here is an example for three individuals (rows) and four loci (columns):

1 0 0 1
1 1 9 2
2 0 1 1
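The call encoding and the transposition relative to VCF can be illustrated with a minimal Python sketch. It is not the converter's actual implementation: the function names `encode_call` and `to_lfmm_rows` are hypothetical, and the sketch assumes biallelic sites and treats a genotype with any missing allele as missing.

```python
MISSING = 9  # LFMM code for a missing genotype


def encode_call(gt: str) -> int:
    """Count alternative alleles in a VCF GT string such as '0/1' or '1|1'.

    Assumption: sites are biallelic, so every non-'0' allele counts as
    one copy of the alternative allele.
    """
    alleles = gt.replace("|", "/").split("/")
    if any(a == "." for a in alleles):
        return MISSING
    return sum(1 for a in alleles if a != "0")


def to_lfmm_rows(gt_by_locus):
    """Transpose loci-by-individuals GT strings into LFMM rows.

    Input:  one list per locus (a VCF row), one GT string per individual.
    Output: one space-separated string per individual, one value per locus.
    """
    encoded = [[encode_call(gt) for gt in locus] for locus in gt_by_locus]
    return [" ".join(str(v) for v in row) for row in zip(*encoded)]


# Four loci (VCF rows) and three individuals (VCF sample columns).
vcf_genotypes = [
    ["0/1", "0/1", "1/1"],
    ["0/0", "0/1", "0/0"],
    ["0/0", "./.", "0/1"],
    ["0/1", "1/1", "0/1"],
]

for line in to_lfmm_rows(vcf_genotypes):
    print(line)
```

With these made-up input genotypes, the printed rows reproduce the call example above.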

For posterior, the mean posterior genotypes are used. Assuming a uniform prior on \(P(g)\), the mean posterior genotype is the sum of the genotypes weighted by their normalized genotype likelihoods: \[E[g|D] = \frac{P(D|g=1) + 2\,P(D|g=2)}{\sum_g P(D|g)}\] Missing data is not allowed. Therefore, either the VCF file should be imputed before conversion, or the option minSamplesWithData should be set to the total number of samples so that loci with missing data are removed. Here is an example for three individuals and four loci:

0.80057 0.0234 0.16409 1.07945
0.920549 0.947835 1.15216 1.9848
1.8277 0.020549 0.994783 1.07945
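The mean posterior genotype can be computed from the three genotype likelihoods as in the following Python sketch. It only illustrates the formula under a uniform prior and is not the converter's actual code. The function names are hypothetical, and the helper that converts phred-scaled values assumes the likelihoods come from a VCF PL field, which may differ from what the converter reads.

```python
def mean_posterior_genotype(likelihoods):
    """Return the mean posterior genotype under a uniform prior on g.

    likelihoods: (P(D|g=0), P(D|g=1), P(D|g=2)) on a linear scale.
    They need not sum to one; the denominator normalizes them.
    """
    l0, l1, l2 = likelihoods
    total = l0 + l1 + l2
    if total <= 0:
        raise ValueError("genotype likelihoods must sum to a positive value")
    return (l1 + 2.0 * l2) / total


def phred_to_likelihoods(pl):
    """Convert phred-scaled likelihoods to linear scale.

    Assumption: the input follows the VCF PL convention,
    PL = -10 * log10(likelihood).
    """
    return tuple(10.0 ** (-p / 10.0) for p in pl)


# Equal support for the two homozygotes gives a mean genotype of 1.
print(mean_posterior_genotype((0.25, 0.5, 0.25)))

# A phred-scaled triple that favours the reference homozygote.
print(mean_posterior_genotype(phred_to_likelihoods((0, 10, 20))))
```

The result always lies between 0 and 2, and it is undefined when all three likelihoods are missing, which is why loci with missing data have to be imputed or removed beforehand.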