6.6 inbreeding

Estimating the inbreeding coefficient

inbreeding infers the global inbreeding coefficient 𝐹 of a population sample using Reversible-Jump Markov Chain Monte Carlo (RJ-MCMC). It infers the following model parameters:

  • 𝐹: the global inbreeding coefficient.
  • withInbreeding: whether there is inbreeding.
  • 𝑝: the allele frequencies (per locus).
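  • isPolymorph: for each locus 𝑙, whether it is polymorphic. The allele frequency of locus 𝑙 then follows

\[
\begin{cases}
p_l \sim \operatorname{Beta}(\gamma, \gamma) & \text{if locus } l \text{ is polymorphic,} \\
p_l = 0 & \text{otherwise.}
\end{cases}
\]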

  • log(𝛾): the logarithm of the parameter of the symmetric Beta distribution on polymorphic loci.
  • πœ‹: the fraction of polymorphic loci.

6.6.1 Input

Required inputs:

--vcf *_majorMinor.vcf.gz A multi-sample VCF file (e.g. created by majorMinor) that must contain GL or PL fields.
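According to the VCF specification, GL stores log10-scaled genotype likelihoods. PL stores phred-scaled ones (PL = -10·log10 L, rounded to an integer and normalised so that the most likely genotype has PL = 0). The helper below illustrates how a PL string maps back to relative likelihoods, L = 10^(-PL/10). The function name pl_to_lik is hypothetical and not part of ATLAS, and the sketch makes no claim about how ATLAS reads these fields internally.

```shell
#!/bin/bash
# Hypothetical helper (not part of ATLAS): convert a comma-separated PL string,
# e.g. "0,30,300", into relative genotype likelihoods, one value per genotype.
pl_to_lik() {
  echo "$1" | awk -F',' '{
    for (i = 1; i <= NF; i++) {
      # PL is phred-scaled: PL = -10 * log10(L), hence L = 10^(-PL/10)
      printf "%s%g", (i > 1 ? " " : ""), 10 ^ (-$i / 10)
    }
    printf "\n"
  }'
}
```

For example, `pl_to_lik "0,30,300"` prints `1 0.001 1e-30`. The first genotype is 1000 times more likely than the second.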

Optional inputs:

--samples samples.txt A .txt file listing a subset of the samples in the VCF file (one sample per line). Note: if your VCF file contains multiple populations and you want to estimate the inbreeding coefficient per population, create one sample file per population (containing only that population's samples) and run inbreeding separately with each of them.
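The per-population workflow described above can be scripted. The sketch below assumes a hypothetical tab-separated population map (one "sample<TAB>population" pair per line) that you prepare yourself; ATLAS does not produce it. The function names split_by_population and run_per_population are illustrative only. The only inbreeding options used are --vcf, --samples and --out, as documented in this section.

```shell
#!/bin/bash
# Hypothetical input: a population map with one "sample<TAB>population" line per sample.

# Write one samples file per population, <outdir>/<population>.txt,
# each containing only that population's sample names (one per line).
split_by_population() {
  local popmap=$1 outdir=$2
  mkdir -p "$outdir"
  awk -F'\t' -v dir="$outdir" 'NF >= 2 { print $1 > (dir "/" $2 ".txt") }' "$popmap"
}

# Run inbreeding once per samples file. The executable is taken from $atlas
# (as in the usage example, defaulting to "atlas"); DRY_RUN=1 only prints the commands.
run_per_population() {
  local vcf=$1 outdir=$2 samples pop cmd
  for samples in "$outdir"/*.txt; do
    pop=$(basename "$samples" .txt)
    cmd=("${atlas:-atlas}" inbreeding --vcf "$vcf" --samples "$samples" --out "inbreeding_$pop")
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "${cmd[*]}"
    else
      "${cmd[@]}"
    fi
  done
}
```

For example, `split_by_population popmap.tsv pops && run_per_population simulate.vcf.gz pops` runs one analysis per population. Each run gets its own output prefix, so the results do not overwrite each other.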

Specific parameters:

--limitLines integer_value Limits the number of lines read from the VCF file. Default = the entire VCF is parsed.
--regions *.bed Limits the analysis to the regions defined in the BED file. Default = the entire VCF is parsed.
--filterDepth integer_value,integer_value Keeps only the samples with a sample depth within the indicated range (inclusive). Default = all sites are kept regardless of depth.
--maxMissing numeric_value Filters out sites with more than the indicated fraction of missing data. numeric_value must be between 0 and 1 (inclusive). Default = sites are kept regardless of missingness.
--minMAF numeric_value Keeps only sites whose minor allele frequency is at least the indicated value. Default = all sites are kept regardless of minor allele frequency.
--minVarQual numeric_value Keeps only sites with a variant quality of at least the indicated value. Default = all sites are kept regardless of variant quality.
--chr or --limitChr Keeps only the specified chromosomes. Default = all chromosomes are kept.
--iterations integer_value Number of MCMC iterations. Default = 10000.
--burnin integer_value Number of MCMC iterations in each burn-in round. Default = 1000.
--numBurnin integer_value Number of burn-in rounds. Default = 10.
--thinning integer_value Writes every nth iteration (excluding burn-in) to the result file. Default = 10.
--stateFile file_name Writes the state of the MCMC to a file. Default = disabled.
--initVals file_name_from_previous_stateFile Initializes values and jump sizes from the state of a previous MCMC run, as written by --stateFile. Can be used to restart an MCMC. Default = disabled.
--writeBurnin Also writes the burn-in iterations to the result file. Default = disabled.
--probMovingToModelNoF numeric_value Probability of proposing a move to the model without F. numeric_value must be between 0 and 1 (inclusive). Default = 0.1.
--probMovingToModelP0 numeric_value Probability of proposing a move to the monomorphic model. numeric_value must be between 0 and 1 (inclusive). Default = 0.1.
--lambdaF numeric_value Sets the parameter λ of the exponential distribution from which a new F is proposed when jumping to the model with F. Default = 100.
--lambdaP numeric_value Sets the parameter λ of the exponential distribution from which a new 𝑝 is proposed when jumping to the polymorphic model. Default = 100.
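The jump options above work together in the reversible jump between the model without F (M_0, where F = 0) and the model with F (M_1). The following is a sketch only. It assumes a generic reversible-jump move in which the new value F' is drawn directly from the exponential proposal, so the Jacobian is 1. D denotes the data. ρ_{0→1} and ρ_{1→0} are the probabilities of proposing each jump, with ρ_{1→0} presumably set by --probMovingToModelNoF. The priors P(F' | M_1), P(M_0) and P(M_1) are placeholders, because this section does not state the priors ATLAS actually uses.

\[
q(F') = \lambda_F \, e^{-\lambda_F F'}, \qquad \mathbb{E}[F'] = \frac{1}{\lambda_F}
\]

\[
\alpha(M_0 \to M_1) = \min\left\{1,\;
\frac{P(D \mid p, F')\, P(F' \mid M_1)\, P(M_1)}{P(D \mid p, F = 0)\, P(M_0)}
\cdot
\frac{\rho_{1 \to 0}}{\rho_{0 \to 1}\, q(F')}\right\}
\]

The reverse jump M_1 → M_0 uses the reciprocal ratio. With the default λ_F = 100, proposed values of F have mean 1/λ_F = 0.01, so new F values are proposed close to 0. --lambdaP plays the analogous role for 𝑝 when a locus jumps to the polymorphic model.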

Engine parameters that are common to all tasks are documented separately.

6.6.2 Output

*F_meanVar.txt: .txt file with the posterior mean (first row) and the posterior variance (second row) of 𝐹 and withInbreeding. The posterior mean of withInbreeding is the posterior probability that the population in question is inbred. For example, a posterior mean of 1.0 means that every retained MCMC sample was in the model with inbreeding.
*F_trace.txt: .txt file with the thinned MCMC chain (trace), containing the MCMC samples of 𝐹 and withInbreeding.
*p_meanVar.txt: .txt file with the posterior mean (first row) and the posterior variance (second row) of π, isPolymorph, log(𝛾) and 𝑝.
*p_trace.txt: .txt file with the MCMC samples of all parameters associated with the allele frequencies: π, isPolymorph, log(𝛾) and 𝑝.
_config.txt: .txt file with the configuration of the run.
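withInbreeding is a 0/1 indicator, so its posterior mean is simply the fraction of retained MCMC samples that lie in the model with inbreeding. The hypothetical helper below recomputes such a mean from a trace file. It assumes, without confirmation from this section, that the trace is tab-separated and that its first row holds the parameter names (e.g. "F" and "withInbreeding"). Check the actual layout of *F_trace.txt before relying on it.

```shell
#!/bin/bash
# Hypothetical helper: posterior mean of one named column of a trace file.
# Assumes a tab-separated file whose first row holds the parameter names,
# followed by one MCMC sample per row. Exits with status 1 if the column is absent.
trace_mean() {
  awk -F'\t' -v col="$2" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == col) c = i; next }
    c { sum += $c; n++ }
    END { if (!c || n == 0) exit 1; printf "%g\n", sum / n }
  ' "$1"
}
```

For a trace in which three of four samples have withInbreeding = 1, `trace_mean trace.txt withInbreeding` prints `0.75`.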

6.6.3 Usage Example

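#! /bin/bash

# Source the helper scripts located next to this script: find_atlas is expected
# to set $atlas to the ATLAS executable, and simulate_vcf to simulate the
# test data set simulate.vcf.gz used below.
. "$(dirname "$0")"/find_atlas
. "$(dirname "$0")"/simulate_vcf --sampleSize 47 --chrLength 1212 --fixedSeed 129

# Prefix for all output files; the log goes to $out.out and stderr to $out.eout
out="inbreeding"
$atlas inbreeding --numBurnin 1 --iterations 97 \
       --vcf simulate.vcf.gz --numThreads 1 \
       --fixedSeed 1 --out "$out" --logFile "$out.out" 2> "$out.eout"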
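The MCMC options determine how many iterations are run and how many samples end up in the trace files. The sketch below does this bookkeeping under two stated assumptions: --iterations counts only the post-burn-in iterations, and one sample is kept per complete block of --thinning iterations. The function name mcmc_budget is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: iteration budget implied by the MCMC options.
# Arguments (defaults as documented): iterations, burnin, numBurnin, thinning.
# Assumes --iterations excludes burn-in and one sample is kept per thinning block.
mcmc_budget() {
  local iterations=${1:-10000} burnin=${2:-1000} numBurnin=${3:-10} thinning=${4:-10}
  echo "total=$(( numBurnin * burnin + iterations )) kept=$(( iterations / thinning ))"
}
```

With the defaults, `mcmc_budget` prints `total=20000 kept=1000`. The usage example above keeps the default --burnin of 1000 but sets --numBurnin 1 and --iterations 97, so `mcmc_budget 97 1000 1 10` gives `total=1097 kept=9`. This is why the example finishes quickly.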