4.7 PMDS

Filtering for ancient reads using PMDS (Post mortem damage score)

Contamination from present-day DNA is a fundamental issue when studying ancient DNA from historical or archaeological material, and quantifying the amount of contamination is essential for downstream analyses. One way of measuring contamination or filtering out reads that most likely originate from modern DNA is to use the post-mortem damage score (PMDS) introduced by Skoglund et al. (2014). With this task, you can use ATLAS to calculate the PMDS for all reads. The output is a new BAM file, where each alignment is described by a field DS:f:<PMDS>, which specifies the alignment’s PMDS score. If the DS field already exists, it will be overwritten. It is possible to filter out the reads with a low PMDS score while writing the BAM file by using the option minPMDS or maxPMDS to specify the threshold value.

A PMDS score > 0 means that the read is more likely ancient than modern. The higher the score, the higher its probability of being ancient.

4.7.1 Input

Required inputs :

--bam Input_bam_file.bam Input bam file.
--fasta Input_refrence_genome_file.fasta Reference genome.

Optional inputs :

--recal recal.txt Quality score recalibration file (see recal.
--pmd Input_PMD.txt Post-mortem damage parameters (see PMD for generating such a file).

Specific Parameters :

--minPMDS integer_value Filter out reads that have a PMDS score below the specified value. Default = -10000.
--maxPMDS integer_value Filter out reads that have a PMDS score above the specified value. Default = 10000.

Engine Parameters :

Engine parameters that are common to all tasks can be found here.

4.7.2 Output

*_PMDS.bam bam file where each alignment has a tag DS:f:<PMDS>

4.7.3 Usage Example

#! /bin/bash

# Set atlas path
atlas=$(dirname "$0")/../build/atlas

# Simulate a BAM File with exponential pmd pattern
$atlas simulate --logFile simulate.out --pmd "CT5:0.2*exp(-0.3*p)+0.07;GA3:0.2*exp(-0.25*p)+0.06"

# estimate PMD pattern
$atlas estimateErrors --bam ATLAS_simulations.bam --fasta ATLAS_simulations.fasta --logFile estimateErrors.out --out ee

# Create BAM file with PMDS scores using PMD pattern in 'ee_RGInfo.json'
$atlas PMDS --bam ATLAS_simulations.bam --fasta ATLAS_simulations.fasta --RGInfo ee_RGInfo.json --logFile PMDS.out

# filter file with PMDS score > 1
$atlas filterBAM --filterPMDS 1 --bam ATLAS_simulations_PMDS.bam --logFile filterBAM.out