12.4 Perses
In this part of the workflow, the final BAMfile for downstream analysis is produced. The module merges paired-end reads and estimates recal and post-mortem damage (PMD) parameters. If you have paired-end reads, this module is essential for downstream analysis.
### Input
sample-file
If you want to use all samples from a previous Gaia or Rhea run, set sampleFile: fromGaia or sampleFile: fromRhea. The pipeline will then automatically use the output tables produced by that run as its input file.
If you want to edit the automatically generated table to define read-group merging, we advise copying it to another location (e.g. supporting_files/samples_Perses.tsv) so that your changes are not overwritten if you re-run the Gaia or Rhea pipeline.
If you want to prepare the table by hand (e.g. because you already have aligned and locally realigned BAMfiles at hand), prepare a tab-delimited table with the columns indicated below.
Example:
We want to analyze the following files:
/path/to/s1/sample1.bam
/path/to/s2/sample2.bam
/path/to/s3/sample3.bam
/path/to/s4/sample4.bam
/path/to/s5/sample5.bam
Each of our files contains four read groups. To increase power, we want to pool certain read groups for post-mortem damage (poolPMD) and/or recal (poolRecal) estimation. Read groups separated by a comma will be merged. Read groups not mentioned will be analyzed without pooling. The letter “F” indicates that no read groups need to be merged. If multiple sets of read groups should be merged, the sets are separated by ‘;’ (e.g. for recal in sample2 we merge rg1 with rg2, and rg3 with rg4). In our example, the input file could look like this:
Sample | Path | poolRecal | poolPMD |
---|---|---|---|
sample1 | /path/to/s1/ | rg1,rg2 | F |
sample2 | /path/to/s2/ | rg1,rg2;rg3,rg4 | rg1,rg2 |
sample3 | /path/to/s3/ | F | F |
sample4 | /path/to/s4/ | rg1,rg2,rg3 | rg1,rg2,rg3,rg4 |
sample5 | /path/to/s5/ | F | F |
The order of the table columns can be changed. Additional columns can be present. The table can contain comments (starting with ‘#’) but no whitespace.
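The pooling notation described above can be illustrated with a short Python sketch. It is not the pipeline's own parser; the function names (`parse_pool_spec`, `write_sample_table`, `read_sample_table`) are hypothetical and only mirror the rules stated in this section: ‘,’ joins read groups into one set, ‘;’ separates sets, “F” means no pooling, comment lines start with ‘#’, and columns are looked up by name so their order does not matter.

```python
import csv
import io


def parse_pool_spec(spec):
    """Turn a poolRecal/poolPMD entry into a list of read-group sets to merge.

    "F" means nothing is pooled; ';' separates sets; ',' separates members.
    """
    spec = spec.strip()
    if spec == "F":
        return []
    return [group.split(",") for group in spec.split(";")]


# Rows of the example table above (paths are the placeholders from the example).
ROWS = [
    ("sample1", "/path/to/s1/", "rg1,rg2", "F"),
    ("sample2", "/path/to/s2/", "rg1,rg2;rg3,rg4", "rg1,rg2"),
    ("sample3", "/path/to/s3/", "F", "F"),
    ("sample4", "/path/to/s4/", "rg1,rg2,rg3", "rg1,rg2,rg3,rg4"),
    ("sample5", "/path/to/s5/", "F", "F"),
]


def write_sample_table(rows, handle):
    """Write a tab-delimited sample table with a header line."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["Sample", "Path", "poolRecal", "poolPMD"])
    writer.writerows(rows)


def read_sample_table(handle):
    """Read the table by column name, skipping comment lines starting with '#'."""
    lines = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(lines, delimiter="\t")
    return {
        row["Sample"]: {
            "path": row["Path"],
            "poolRecal": parse_pool_spec(row["poolRecal"]),
            "poolPMD": parse_pool_spec(row["poolPMD"]),
        }
        for row in reader
    }


# Round trip through an in-memory buffer instead of a real file.
buffer = io.StringIO()
write_sample_table(ROWS, buffer)
buffer.seek(0)
table = read_sample_table(buffer)
print(table["sample2"]["poolRecal"])  # [['rg1', 'rg2'], ['rg3', 'rg4']]
```

To produce an actual file, the same `write_sample_table` call could be pointed at an opened file handle (for example one for supporting_files/samples_Perses.tsv) instead of the in-memory buffer.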
config-file
The config file has to be provided in yaml format. The same example as below can also be found in examples/example_config_Perses.yaml. To use it as a template, make sure to copy it to a new location, otherwise it will be overwritten once you update the pipeline.
Compulsory fields are:
- runScript: Perses
- sampleFile: location of the sample file, or fromGaia or fromRhea (see also Sample file)
- atlas: location of atlas executable
- ref: location of reference fasta file
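As a rough illustration of the compulsory fields listed above, the following Python sketch builds a minimal config, checks that no compulsory key is missing, and prints it as flat YAML. The paths are hypothetical placeholders and the helper names (`missing_fields`, `to_yaml`) are not part of the pipeline; a real config will usually contain additional optional fields, for which the example config file is the reference.

```python
# Compulsory keys as listed above.
COMPULSORY = ("runScript", "sampleFile", "atlas", "ref")

# All paths below are hypothetical placeholders; adjust them to your setup.
config = {
    "runScript": "Perses",
    "sampleFile": "fromGaia",           # or fromRhea, or the path to a hand-made table
    "atlas": "/path/to/atlas",          # location of the atlas executable
    "ref": "/path/to/reference.fasta",  # location of the reference fasta file
}


def missing_fields(cfg):
    """Return the compulsory keys that are absent or empty in cfg."""
    return [key for key in COMPULSORY if not cfg.get(key)]


def to_yaml(cfg):
    """Render a flat mapping of simple strings as 'key: value' YAML lines."""
    return "".join(f"{key}: {value}\n" for key, value in cfg.items())


problems = missing_fields(config)
if problems:
    raise ValueError(f"missing compulsory fields: {problems}")
print(to_yaml(config), end="")
```

Writing plain `key: value` lines is sufficient here only because every value is a simple string; nested or optional settings are better edited directly in a copy of the example config.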
The example config file is available at https://bitbucket.org/wegmannlab/atlas-pipeline/raw/develop/example_files/example.config_Perses.yaml.
### Output
The Perses pipeline will produce the following final output-files in the folder results/3.Perses/:
folder/files | explanation |
---|---|
mergeOverlappingReads/ | BAMfiles after merging paired-end read pairs and applying ATLAS filters. Please use these BAMfiles for subsequent analysis. |
BAMDiagnostics_inputFiles/ | If you didn’t use fromGaia as sample input, this folder contains the BAMfile information of your input files. |
BAMDiagnostics_afterReadsMerge/ | This folder contains all BAMfile information of your final BAMfiles from the mergeOverlappingReads folder. |
estimateErrors/ | If you enabled error estimation, this is where the PMD and recal results can be found. The Pallas tasks will automatically find these locations and apply PMD and recal if enabled to subsequent steps. |
recalBAM/ | If you enabled recalBAM, this folder will contain BAMfiles with recalibrated base quality scores. You can use them for downstream analyses that are not performed with ATLAS. |
outfiles/Perses_outTable.tsv | The default input file for the Pallas module. To use it, specify fromPerses in the Pallas config file. |