4.6 mergeRG

Merging read groups in a BAM file

mergeRG merges read groups that are too small for data analysis. Some analysis tools require a certain number of reads per read group to work properly; Base Quality Score Recalibration (BQSR), for example, requires a minimum of around 6 million reads per read group. If some read groups are too small, mergeRG can be used to merge them.
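To decide which read groups are too small, it helps to count the reads in each one. The sketch below is not part of ATLAS: small_read_groups is a hypothetical shell function that reads SAM records without a header on standard input, counts the RG:Z: tag of each record and prints every read group with fewer reads than the given threshold, together with its count. Read groups that appear only in the header and have no reads are not reported.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not ATLAS code): report read groups below a read-count threshold.
# Input: SAM records without header on stdin. Output: "<read group>\t<count>" lines.
small_read_groups() {
  local min_reads="$1"   # e.g. 6000000 for BQSR (approximate minimum)
  awk -F '\t' -v min="$min_reads" '
    {
      # optional tags start at column 12 of a SAM record
      for (i = 12; i <= NF; i++) {
        if (substr($i, 1, 5) == "RG:Z:") {
          count[substr($i, 6)]++
          break
        }
      }
    }
    END {
      # output order is not guaranteed; pipe through sort if needed
      for (rg in count)
        if (count[rg] < min)
          printf "%s\t%d\n", rg, count[rg]
    }
  '
}
```

Assuming samtools is installed, a possible call is: samtools view -F 0x900 input.bam | small_read_groups 6000000 (the -F 0x900 filter skips secondary and supplementary alignments so each read is counted once).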

Read groups should be merged according to library, sequencing run and sequencing lane, in that order. The more the read groups differ, the less advisable it is to merge them, since programs like BQSR estimate parameters that are specific to each read group. Read groups with different post-mortem damage patterns should not be merged: for example, read groups treated to remove post-mortem damage (e.g. with uracil-DNA glycosylase) should not be merged with read groups that were not treated.
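As a starting point for grouping by library, the @RG header lines can be scanned for their LB tag. The sketch below is a hypothetical helper, not part of ATLAS, and it is deliberately simplified: it only looks at the library, ignores sequencing run and lane, merges every read group of a library (including large ones), and cannot tell whether libraries were UDG-treated. Its output must therefore be reviewed by hand before use.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not ATLAS code): propose merges of read groups sharing a library.
# Input: SAM header lines on stdin. Output: one tab-separated line per library with
# two or more read groups; the first ID on a line would name the merged read group.
rg_lines_by_library() {
  awk -F '\t' '
    $1 == "@RG" {
      id = ""; lb = ""
      for (i = 2; i <= NF; i++) {
        if (substr($i, 1, 3) == "ID:") id = substr($i, 4)
        else if (substr($i, 1, 3) == "LB:") lb = substr($i, 4)
      }
      if (id == "" || lb == "") next   # skip read groups without ID or LB
      if (size[lb] == 0) {
        order[++n] = lb                # keep libraries in order of first appearance
        ids[lb] = id
      } else {
        ids[lb] = ids[lb] "\t" id
      }
      size[lb]++
    }
    END {
      for (k = 1; k <= n; k++)
        if (size[order[k]] > 1) print ids[order[k]]
    }
  '
}
```

Assuming samtools is available, a draft file could be produced with: samtools view -H input.bam | rg_lines_by_library > rgs.txt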

4.6.1 Input

Required inputs:

--bam Input_bam_file.bam Input BAM file.
--readGroups Input_text_file.txt Input text file listing the read groups to be merged. All read groups that are to be merged into one should be written on a single line, separated by tabs. The first read group on each line specifies the name of the new, combined read group. The file can contain multiple lines to specify multiple merges.
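Because the separator must be a tab, a line typed with spaces is an easy mistake. The sketch below is a hypothetical pre-flight check, not something ATLAS provides; the rules it applies (at least two tab-separated names per non-empty line, no read group listed twice in the file) are our own assumptions about what makes a file usable.

```bash
#!/usr/bin/env bash
# Hypothetical checker (not ATLAS code) for a --readGroups file.
# Prints one message per problem; returns non-zero if any problem was found.
check_rg_file() {
  awk -F '\t' '
    NF == 0 { next }                     # ignore empty lines
    NF < 2 {
      printf "line %d: fewer than two tab-separated read groups\n", NR
      bad = 1
    }
    {
      for (i = 1; i <= NF; i++) {
        if ($i in seen) {
          printf "line %d: read group %s already listed on line %d\n", NR, $i, seen[$i]
          bad = 1
        } else {
          seen[$i] = NR
        }
      }
    }
    END { exit bad }
  ' "$1"
}
```

For example, check_rg_file rgs.txt reports a line written as "SimReadGroup1 SimReadGroup2" (space instead of tab) as having fewer than two read groups.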

Optional inputs:

  • none

Specific Parameters:

--outQual integer_1,integer_2 to constrain the quality scores to the indicated range (inclusive) when writing alignments. Default = the full range of quality scores is used when writing alignments.
--writeBinnedQualities to write Illumina-binned quality scores. Default = raw quality scores are written.
  • See Filter parameters for filters on bases and reads and for parsing-window settings.
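The sketch below only illustrates what restricting qualities to an inclusive range such as --outQual 2,30 could mean, assuming that scores outside the range are clamped to the nearest bound. It is not ATLAS code; clamp_qual is a hypothetical function working on a SAM-style quality string (Phred+33 encoding).

```bash
#!/usr/bin/env bash
# Hypothetical illustration (not ATLAS code): clamp Phred+33 qualities to [lo, hi].
clamp_qual() {
  local qual="$1" lo="$2" hi="$3" out="" i q c
  for (( i = 0; i < ${#qual}; i++ )); do
    c="${qual:i:1}"
    q=$(( $(printf '%d' "'$c") - 33 ))   # character -> Phred score
    (( q < lo )) && q=$lo                # raise scores below the lower bound
    (( q > hi )) && q=$hi                # lower scores above the upper bound
    out+=$(printf '%b' "\\0$(printf '%03o' $(( q + 33 )))")   # Phred score -> character
  done
  printf '%s\n' "$out"
}
```

With these assumptions, clamp_qual '!I5' 2 30 turns scores 0, 40 and 20 into 2, 30 and 20, i.e. the string #?5.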

Engine parameters that are common to all tasks are described in the engine parameters section.

4.6.2 Output

*_mergedRG.bam BAM file with merged read groups.
*_mergedRG.bam.bai Index file for the BAM file with merged read groups.
*_filterSummary.txt Filter summary file with information about all read groups, merged or not.

4.6.3 Usage Example

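#! /bin/bash

# locate the atlas executable
. "$(dirname "$0")"/find_atlas

# simulate a BAM file (simulate.bam) with 10 read groups
. "$(dirname "$0")"/simulate --numReadGroups 10 --fixedSeed 155

# read groups to merge: tab-separated, one merge per line;
# the first read group on each line names the merged read group
echo "SimReadGroup1 SimReadGroup2" | tr ' ' '\t' > rgs.txt
echo "SimReadGroup3 SimReadGroup4 SimReadGroup5 SimReadGroup6 SimReadGroup7 SimReadGroup8 SimReadGroup9 SimReadGroup10" | tr ' ' '\t' >> rgs.txt

out="mergeRG"
$atlas --task mergeRG --bam simulate.bam --readGroups rgs.txt \
       --fixedSeed 159 --out "$out" --logFile "$out".out 2> "$out".eout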