2.1 Low-depth sequencing
There are multiple reasons to perform so-called low-depth sequencing (with an average sequencing depth often far below 10X). Sometimes it is due to sample accessibility (not enough material available, see also Ancient DNA), but low-depth sequencing can also be highly beneficial when questions need to be answered at the population level rather than the individual level. For these research questions, sequencing more samples at a lower depth instead of fewer samples at a high depth makes it possible to learn more about allele frequencies within and between populations.
BAM file generation
It is important that all your BAM files are aligned and filtered in the same way to avoid batch effects and reference biases. We further highly recommend performing a local indel realignment (GATK 3.5) and merging the mates in paired-end read groups so that overlapping read ends are not counted twice in your analysis. For all steps from sequencing data to population genetic analysis, we provide our in-house [Pipeline].
Genotype likelihoods
At lower sequencing depth, sources of error such as sequencing errors and wrongly mapped reads decrease the accuracy of genotype calls and lead to biases in downstream analyses (e.g. underestimating genetic diversity). We therefore advise inferring genotype likelihoods instead. Genotype likelihoods are assessed for all possible true genotypes and thereby account for genotyping uncertainty. (For more information on how we assess genotype likelihoods, see GLF.)
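To make the idea concrete, here is a minimal sketch of genotype likelihoods at a single diploid site. It assumes a common textbook error model (each read samples one of the two alleles with probability 0.5, and an erroneous base is equally likely to be any of the other three); this is not necessarily the model ATLAS uses, and the function names are hypothetical.

```python
import math
from itertools import combinations_with_replacement

BASES = "ACGT"

def phred_to_error(q):
    """Convert a Phred-scaled base quality into an error probability."""
    return 10 ** (-q / 10)

def genotype_log_likelihoods(reads):
    """Return log10 P(reads | genotype) for all 10 diploid genotypes.

    reads: list of (base, phred_quality) tuples observed at one site.
    Assumed model (illustrative only): independent reads, uniform errors.
    """
    result = {}
    for a1, a2 in combinations_with_replacement(BASES, 2):
        loglik = 0.0
        for base, qual in reads:
            e = phred_to_error(qual)
            p1 = 1 - e if base == a1 else e / 3  # read drawn from allele 1
            p2 = 1 - e if base == a2 else e / 3  # read drawn from allele 2
            loglik += math.log10(0.5 * p1 + 0.5 * p2)
        result[a1 + a2] = loglik
    return result

# Three reads at depth 3: two high-quality A, one lower-quality G.
reads = [("A", 30), ("A", 30), ("G", 20)]
gl = genotype_log_likelihoods(reads)
best = max(gl, key=gl.get)
print(best, round(gl[best], 3))
```

With only three reads, the heterozygous genotype is the most likely one here, but the homozygous genotype keeps a non-negligible likelihood. Carrying all likelihoods forward, rather than committing to a single call, is what preserves this uncertainty in downstream analyses.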
Base quality recalibration
However, genotype likelihoods rely heavily on the assumption that error rates are correct. The sequencing machine reports these error rates as base qualities. Most of the time, the reported base qualities are inaccurate and must be recalibrated. While there are existing methods for base quality recalibration, they rely either on the knowledge of variant sites in the genome or on sequencing depth. We thus implemented a way to learn sequencing errors without prior knowledge of variant sites, based instead on monomorphic or pseudohaploid regions in the genome (estimateErrors). Recalibrating base quality scores is highly recommended in general, and especially when using genotype likelihoods. For more information on how we learn and apply base quality recalibration, see estimateErrors.
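The relationship between base qualities and error rates can be sketched as follows. The example compares a reported quality with an empirical one estimated from mismatch counts at sites assumed to be monomorphic. This is a deliberately simplified illustration with made-up counts and hypothetical function names; it is not the model implemented in estimateErrors.

```python
import math

def phred_to_error(q):
    """Phred quality Q corresponds to error probability 10^(-Q/10)."""
    return 10 ** (-q / 10)

def error_to_phred(e):
    """Inverse transformation: error probability to Phred quality."""
    return -10 * math.log10(e)

def empirical_quality(mismatches, total):
    """Estimate a quality from mismatches at assumed-monomorphic sites.

    A pseudo-count (+1 / +2) keeps the estimate finite when no
    mismatch was observed.
    """
    return error_to_phred((mismatches + 1) / (total + 2))

# Hypothetical counts for all bases the machine reported as Q40.
reported_q = 40
observed_q = empirical_quality(mismatches=99, total=99998)

print(f"reported error rate: {phred_to_error(reported_q):.0e}")
print(f"empirical quality:   about Q{observed_q:.0f}")
```

In this invented example, bases reported as Q40 behave like roughly Q30 bases, i.e. the true error rate is about ten times higher than reported. Such a discrepancy would directly distort any genotype likelihood computed from the reported qualities.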
Downstream analysis
On a single-sample basis, you can now infer heterozygosity (\(\theta\)) or, if further downstream analyses depend on it, you can also perform base calling with ATLAS.
To assess genetic diversity within and between populations, the major and minor alleles can further be inferred from GLF files. The task majorMinor writes the genotype likelihoods for the major and minor alleles into a single VCF file, which can be used for several downstream population genetic analyses such as Inbreeding. The majorMinor VCF can also be converted to Beagle format to perform a PCA with ANGSD.
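The conversion to Beagle format can be sketched as below. The sketch assumes that each sample in the VCF carries three Phred-scaled likelihoods (a PL-style field for the major/major, major/minor and minor/minor genotypes); whether the majorMinor output uses exactly this field is an assumption to check against your own file. The helper names and the example values are hypothetical.

```python
def pl_to_beagle_probs(pl):
    """Turn three Phred-scaled likelihoods into normalized likelihoods.

    Beagle genotype likelihood files hold three values per individual
    (major/major, major/minor, minor/minor), here scaled to sum to 1.
    """
    liks = [10 ** (-p / 10) for p in pl]
    total = sum(liks)
    return [lik / total for lik in liks]

def beagle_line(chrom, pos, major, minor, sample_pls):
    """Build one tab-separated Beagle line: marker, alleles, likelihoods."""
    fields = [f"{chrom}_{pos}", major, minor]
    for pl in sample_pls:
        fields.extend(f"{p:.6f}" for p in pl_to_beagle_probs(pl))
    return "\t".join(fields)

# Hypothetical site with two samples (assumed PL triplets).
line = beagle_line("chr1", 12345, "A", "G", [(0, 3, 30), (10, 0, 10)])
print(line)
```

Each output line holds a marker name, the two alleles, and three likelihood columns per individual. A full conversion would also need the Beagle header line and the allele coding expected by the downstream tool.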