4.4 filterBAM

Writing reads that pass filters to a BAM file

filterBAM filters a BAM file and writes the reads that pass all filters to a new BAM file, together with its index. See Filter parameters for all available filter methods and their default settings. After filtering, the counts of removed reads are also printed to the terminal.

4.4.1 Input

Required inputs:

--bam Input_bam_file.bam Input BAM file

Optional inputs:

--pmd Input_PMD.txt Post-mortem damage parameters (see PMD for how to generate such a file)
--recal recal.txt Quality score recalibration file (see recal for further information)

Specific parameters:

--outQual integer_1,integer_2 Constrain quality scores to the given range (inclusive) when writing alignments. Default = the full range of quality scores is used.
--writeBinnedQualities Write Illumina-binned quality scores. Default = raw quality scores are written.
--acceptedDistance integer_value Distance (in bp) up to which mates are not considered orphans. Default = 2000 bp.
--keepOrphans Keep orphaned reads. Default = orphaned reads are filtered out.
--removeSoftClippedBases Remove all soft-clipped bases. Default = soft-clipped bases are kept.

Note: If both --outQual and --writeBinnedQualities are given, quality scores are first truncated and then binned, so the written scores may fall outside the requested range.
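To illustrate the note above, the bash sketch below truncates a score to the --outQual range and then bins it. The binning table is an assumption: it uses the commonly cited eight-level Illumina scheme (2-9 to 6, 10-19 to 15, 20-24 to 22, 25-29 to 27, 30-34 to 33, 35-39 to 37, 40 and above to 40), which may differ from the bins ATLAS actually applies. The function names truncate_qual and bin_qual are hypothetical.

```bash
#!/bin/bash
# Hypothetical sketch, not ATLAS code: --outQual truncation followed by binning.

# Clamp a quality score to the inclusive range [lo, hi].
truncate_qual() {
    local q=$1 lo=$2 hi=$3
    (( q < lo )) && q=$lo
    (( q > hi )) && q=$hi
    echo "$q"
}

# Assumed eight-level Illumina binning; scores below 2 are left unchanged.
bin_qual() {
    local q=$1
    if   (( q < 2 ));  then echo "$q"
    elif (( q < 10 )); then echo 6
    elif (( q < 20 )); then echo 15
    elif (( q < 25 )); then echo 22
    elif (( q < 30 )); then echo 27
    elif (( q < 35 )); then echo 33
    elif (( q < 40 )); then echo 37
    else                    echo 40
    fi
}

# With --outQual 0,30, a raw score of 35 is truncated to 30 and then binned
# to 33, which lies outside the requested range.
q=$(truncate_qual 35 0 30)
bin_qual "$q"    # prints 33
```

The same effect can occur at the lower bound: with --outQual 8,40, a raw score of 3 is raised to 8 and then binned to 6.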

  • See Filter parameters for the filters that can be applied to bases and reads, and for the parsing-window settings.

Engine parameters that are common to all tasks are described in the Engine parameters section.
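The --acceptedDistance rule can be pictured with the bash sketch below. It rests on simplifying assumptions that this documentation does not confirm: the distance is taken as the absolute difference between the leftmost mapping positions of the two mates, and mates on different chromosomes always count as orphans. The function name is_orphan is hypothetical.

```bash
#!/bin/bash
# Hypothetical sketch, not ATLAS code: decide whether a read pair is orphaned.
# Assumes distance = |pos1 - pos2| between leftmost mapping positions and that
# mates on different chromosomes are always orphans.
is_orphan() {
    local chr1=$1 pos1=$2 chr2=$3 pos2=$4
    local accepted=${5:-2000}   # mirrors the documented default of 2000 bp
    if [ "$chr1" != "$chr2" ]; then
        return 0                # different chromosomes: orphan
    fi
    local dist=$(( pos1 - pos2 ))
    (( dist < 0 )) && dist=$(( -dist ))
    (( dist > accepted ))       # status 0 (orphan) only if farther than accepted
}

is_orphan chr1 1000 chr1 2500 && echo orphan || echo kept       # kept (1500 bp)
is_orphan chr1 1000 chr1 4000 && echo orphan || echo kept       # orphan (3000 bp)
is_orphan chr1 1000 chr1 4000 5000 && echo orphan || echo kept  # kept with a 5000 bp limit
```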

4.4.2 Output

*.bam.bai BAM index file
*_filterSummary.txt Text file listing all applied filters, along with the counts of removed reads.
*_filtered.bam Filtered BAM file.
*_filtered.bam.bai Index of the filtered BAM file.
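After a run, a small bash check such as the sketch below can confirm that the main output files were written. It assumes that * in the table above stands for the prefix passed via --out (as in the usage example), and it only checks the filtered BAM, its index and the summary file. The function name check_filterbam_outputs is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: verify that filterBAM outputs exist for a given --out prefix.
# Assumes "*" in the output table is replaced by the --out prefix.
check_filterbam_outputs() {
    local prefix=$1 missing=0 f
    for f in "${prefix}_filtered.bam" "${prefix}_filtered.bam.bai" "${prefix}_filterSummary.txt"; do
        if [ ! -s "$f" ]; then              # -s: file exists and is non-empty
            echo "missing or empty: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example: check the outputs of one iteration of the usage example below.
check_filterbam_outputs filterBAM_2 && echo "all outputs present"
```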

4.4.3 Usage Example

#! /bin/bash

# Locate atlas (used below as $atlas) and simulate 19 paired-end samples
. "$(dirname "$0")"/find_atlas
. "$(dirname "$0")"/simulate --type HW --F 0.1 --fixedSeed 99 \
  --sampleSize 19 --chrLength 1111 --fracPoly 1.0 \
  --alpha 2.0 --beta 2.0 --seqType paired --seqCycles 101

# Regions in BED format (tab-separated: chromosome, start, end)
printf 'chr1\t0\t1000\n'  > bed.bed
printf 'chr2\t100\t800\n' >> bed.bed
printf 'chr3\t600\t760\n' >> bed.bed

# Dry runs on sample 1 using the BED file via --regions and via --mask
out="filterBAM_0"
$atlas --task filterBAM --dryRun \
       --bam simulate_ind1.bam --regions bed.bed \
       --fixedSeed 91 --out "$out" --logFile "$out.out" 2> "$out.eout"

out="filterBAM_1"
$atlas --task filterBAM --dryRun \
       --bam simulate_ind1.bam --mask bed.bed --maskPorosity 0.1 \
       --fixedSeed 91 --out "$out" --logFile "$out.out" 2> "$out.eout"

# Filter samples 2-19 with thresholds that grow with the sample index
for i in {2..19}; do
    # Blacklist the read names of alignment records 241-250 of this sample
    samtools view "simulate_ind$i.bam" | head -n 250 | tail -n 10 | cut -f1 \
        > "blacklist_$i.txt"
    u=$(( i * 5 ))
    out="filterBAM_$i"
    $atlas --task filterBAM --bam "simulate_ind$i.bam" \
           --filterMQ "0,$u" --blacklist "blacklist_$i.txt" --filterReadLength "0,$u" \
           --filterFragmentLength "0,$u" --filterSoftClips "0.$i" \
           --fixedSeed "1$i" --out "$out" --logFile "$out.out" 2> "$out.eout"
done
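The loop derives its thresholds from the sample index i. The bash snippet below only echoes these values and does not call ATLAS; the helper name loop_thresholds is hypothetical. It makes one detail visible: "0.$i" is string concatenation rather than arithmetic, so for i = 10 to 19 the soft-clip value becomes 0.10 to 0.19, which is numerically smaller than the 0.9 used for i = 9.

```bash
#!/bin/bash
# Hypothetical helper: reproduce the thresholds the usage loop computes for sample i.
loop_thresholds() {
    local i=$1
    local u=$(( i * 5 ))   # upper bound for --filterMQ, --filterReadLength, --filterFragmentLength
    echo "$u 0.$i"         # second field: the --filterSoftClips value (string concatenation)
}

for i in 2 9 10 19; do
    echo "ind$i: $(loop_thresholds "$i")"
done
```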