6.4 calculateF2
Calculate F2 between different samples, and within and between populations
calculateF2 estimates F2 based on the number of different sites and the total number of compared sites for each pairwise comparison from a multi-sample VCF. It can calculate F2 between different samples, and within and between populations. A VCF file used as input for calculateF2 can be created with the ATLAS task majorMinor or call.
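The pairwise counting described above can be illustrated with a minimal Python sketch. It is not the ATLAS implementation: the function names, the in-memory genotype lists, and the rule that a site is "compared" only when both samples have a call and "different" when the two genotype strings are unequal are all assumptions made for illustration.

```python
from itertools import combinations

MISSING = None  # hypothetical marker for a missing genotype call


def pairwise_counts(genotypes):
    """Count different and compared sites for every pair of samples.

    genotypes: dict mapping sample name -> list of genotype calls,
    one entry per site, in the same site order for all samples.
    Returns a dict mapping (sampleA, sampleB) -> (different, compared).
    """
    counts = {}
    for a, b in combinations(sorted(genotypes), 2):
        different = compared = 0
        for ga, gb in zip(genotypes[a], genotypes[b]):
            # Assumption: a site is skipped if either call is missing.
            if ga is MISSING or gb is MISSING:
                continue
            compared += 1
            if ga != gb:
                different += 1
        counts[(a, b)] = (different, compared)
    return counts


def f2(different, compared):
    """#different sites / #compared sites, NaN if nothing was compared."""
    return different / compared if compared else float("nan")


# Toy data, invented for this example only.
genotypes = {
    "sample1": ["0/0", "0/1", "1/1", None],
    "sample2": ["0/0", "1/1", "1/1", "0/1"],
    "sample5": ["0/1", "0/1", None, "0/1"],
}
counts = pairwise_counts(genotypes)
for pair, (different, compared) in counts.items():
    print(pair, different, compared, f2(different, compared))
```

In this toy data, sample1 and sample2 share three called sites and differ at one of them, so their ratio is 1/3.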
6.4.1 Input
Required inputs :
--vcf Input_VCF.vcf |
Input VCF file (see majorMinor or call for generating such a file). |
Optional inputs :
--samples samples_Populations.txt |
Text file containing the samples to be used and their population affiliation. Different values will be estimated for different populations. If no populations are provided, all samples are considered to come from the same population. |
Example samples_Populations.txt file:
sample1 1
sample2 1
sample5 2
sample8 2
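To make the file layout concrete, the following Python sketch reads lines in the format of the example above into a sample-to-population mapping. It is only an illustration: the whitespace delimiter, the function names, and the fallback label "all" for samples without a population column are assumptions, not documented ATLAS behaviour.

```python
def read_sample_populations(lines, default_population="all"):
    """Map each sample name to its population label.

    lines: iterable of text lines, e.g. an open samples file.
    default_population: hypothetical label used when a line has no
    population column, so all such samples share one population.
    """
    populations = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        sample = fields[0]
        population = fields[1] if len(fields) > 1 else default_population
        populations[sample] = population
    return populations


def group_by_population(populations):
    """Invert the mapping: population label -> list of its samples."""
    groups = {}
    for sample, population in populations.items():
        groups.setdefault(population, []).append(sample)
    return groups


example = """sample1 1
sample2 1
sample5 2
sample8 2
""".splitlines()

populations = read_sample_populations(example)
groups = group_by_population(populations)
print(groups)
```

With the example file, samples 1 and 2 end up in population 1 and samples 5 and 8 in population 2.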
Specific Parameters :
--limitLines integer_value |
To limit the number of lines read from the VCF file. Default = Will parse entire VCF. |
--regions \*.bed |
To limit analysis to regions defined in BED file. Default = Will parse entire VCF. |
--filterDepth integer_value,integer_value |
To keep only the samples with a sample depth within the indicated range (inclusive). Default = Will keep all sites regardless of depth. |
--maxMissing numeric_value |
To filter out sites that have more than the indicated fraction of data missing. numeric_value must be between 0 and 1 (inclusive). Default = Will keep sites regardless of missingness. |
--minMAF numeric_value |
To keep only sites for which the minor allele frequency is at least the indicated value. Default = Will keep all sites regardless of minor allele frequency. |
--minVarQual numeric_value |
To keep only sites whose variant quality is at least the indicated value. Default = Will keep sites regardless of their variant quality. |
--chr or --limitChr |
To keep only specified chromosomes. Default = Will keep all chromosomes. |
Engine parameters that are common to all tasks can be found here.
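As a rough illustration of how the missingness and minor allele frequency filters act on a single site, consider the Python sketch below. It is an assumption-laden simplification: the function names are hypothetical, genotypes are given as tuples of allele indices at a biallelic site, and the minor allele frequency is computed from called genotypes only, which may differ from how ATLAS derives it.

```python
def missing_fraction(calls):
    """Fraction of samples without a genotype call at this site."""
    return sum(call is None for call in calls) / len(calls)


def minor_allele_frequency(calls):
    """Minor allele frequency from called genotypes (assumption:
    biallelic site, alleles coded 0 = reference, 1 = alternative)."""
    alleles = [allele for call in calls if call is not None for allele in call]
    if not alleles:
        return 0.0
    alt_frequency = sum(alleles) / len(alleles)
    return min(alt_frequency, 1.0 - alt_frequency)


def keep_site(calls, max_missing=1.0, min_maf=0.0):
    """Mimic --maxMissing and --minMAF: the defaults keep every site."""
    if missing_fraction(calls) > max_missing:
        return False
    return minor_allele_frequency(calls) >= min_maf


# Toy sites, invented for this example only.
site_a = [(0, 0), (0, 1), None, (1, 1)]    # 25% missing, MAF 0.5
site_b = [(0, 0), (0, 0), (0, 0), (0, 1)]  # nothing missing, MAF 0.125

print(keep_site(site_a, max_missing=0.2))
print(keep_site(site_a, max_missing=0.5, min_maf=0.05))
print(keep_site(site_b, min_maf=0.2))
```

A site is dropped only when it exceeds the missingness threshold or falls below the frequency threshold, so leaving both arguments at their defaults keeps all sites, in line with the defaults listed above.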
6.4.2 Output
*_counts.txt | An n*n matrix (n = number of samples) containing the counts of different sites in the upper triangle and the total number of compared sites in the lower triangle for all possible pairs of samples. |
*_sampleF2.txt | An n*n matrix containing the pairwise sample F2 (#diff Sites/#compared Sites) for all possible pairs of samples. |
*_popF2.txt | A p*p matrix (p = number of populations) containing the average F2 within and between populations for all possible pairs. |
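The relationship between the three matrices can be sketched in Python as follows. This is a hedged illustration rather than the ATLAS code: the matrix is held as a nested list, the diagonal is filled with a placeholder of 0.0, and the population value is assumed to be the unweighted mean of the pairwise sample F2 values, which the description above does not specify in detail.

```python
def sample_f2_matrix(counts):
    """Derive pairwise F2 from a counts matrix.

    counts[i][j] with i < j holds the number of different sites
    (upper triangle); counts[j][i] holds the number of compared
    sites (lower triangle).
    """
    n = len(counts)
    f2 = [[0.0] * n for _ in range(n)]  # diagonal placeholder (assumption)
    for i in range(n):
        for j in range(i + 1, n):
            different = counts[i][j]
            compared = counts[j][i]
            value = different / compared if compared else float("nan")
            f2[i][j] = f2[j][i] = value
    return f2


def population_f2(f2, populations):
    """Average F2 within and between populations.

    populations: list of labels in the same order as the matrix rows.
    Assumption: a plain mean over all sample pairs in each category.
    """
    sums = {}
    n = len(populations)
    for i in range(n):
        for j in range(i + 1, n):
            key = tuple(sorted((populations[i], populations[j])))
            total, pairs = sums.get(key, (0.0, 0))
            sums[key] = (total + f2[i][j], pairs + 1)
    return {key: total / pairs for key, (total, pairs) in sums.items()}


# Toy counts for four samples, invented for this example only.
counts = [
    [0, 10, 30, 40],
    [100, 0, 20, 30],
    [100, 100, 0, 5],
    [100, 100, 100, 0],
]
populations = ["1", "1", "2", "2"]

f2 = sample_f2_matrix(counts)
pop_f2 = population_f2(f2, populations)
print(pop_f2)
```

Here the key ("1", "1") holds the within-population average for population 1, and ("1", "2") the between-population average over the four cross-population pairs.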