Picard Mark Duplicate Reads In Galaxy
10.6 years ago
Tonyzeng ▴ 310

Hi, I want to use Picard MarkDuplicates just to flag reads that are potential duplicates. According to the Picard manual, when running it under Linux we need to sort and then index the BAM files before marking duplicates. However, I want to try Picard's MarkDuplicates in Galaxy first, and I found that Galaxy offers no sort/index BAM function, only the Mark Duplicate Reads function. Does that mean the Galaxy version of MarkDuplicates in the Picard toolkit already takes care of sorting/indexing the BAM?

picard markduplicates galaxy • 6.1k views
10.5 years ago
boris ▴ 10

As I understand it, Galaxy automatically coordinate-sorts the BAM files you upload. Also, when a Galaxy tool outputs a BAM file, that output is sorted as well. So you do not need to sort before running the MarkDuplicates tool within Galaxy: the file is already sorted.
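If you want to confirm this for a BAM you have downloaded from Galaxy, the sort order is recorded in the SO tag of the @HD header line. A minimal sketch, assuming samtools is on your PATH; the function name `sort_order` and the file name `sample.bam` are placeholders, not anything Galaxy or Picard provides:

```shell
# Extract the SO (sort order) tag from SAM header text read on stdin.
# Prints e.g. "coordinate", "queryname" or "unsorted";
# prints nothing if the header has no @HD line or no SO tag.
sort_order() {
  grep '^@HD' | tr '\t' '\n' | sed -n 's/^SO://p'
}

# Usage with a real file (sample.bam is a placeholder):
#   samtools view -H sample.bam | sort_order
```

If this prints "coordinate", the file should be acceptable to MarkDuplicates without a further sort.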

10.6 years ago

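I don't know how Picard MarkDuplicates is implemented in Galaxy, but the command-line program can index the output BAM at the end through the CREATE_INDEX option. Note that its documented default is false, so it has to be set to true explicitly.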

http://picard.sourceforge.net/command-line-overview.shtml#Overview

The following options are relevant for most Picard programs:

CREATE_INDEX=Boolean
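
For the command line, that option could be used roughly as follows. This is only a sketch using the legacy KEY=value argument style: the wrapper name `mark_duplicates`, the `PICARD_JAR` variable, and the output file names are assumptions for illustration, and older Picard releases shipped a separate MarkDuplicates.jar instead of a single picard.jar.

```shell
# Mark duplicates in a coordinate-sorted BAM and write a BAM index with the output.
# $1         = input BAM (placeholder name)
# PICARD_JAR = path to picard.jar (assumed; defaults to ./picard.jar)
mark_duplicates() {
  in_bam="$1"
  prefix="${in_bam%.bam}"
  java -jar "${PICARD_JAR:-picard.jar}" MarkDuplicates \
    I="$in_bam" \
    O="${prefix}.marked.bam" \
    M="${prefix}.dup_metrics.txt" \
    CREATE_INDEX=true
}

# Usage (sample.bam is a placeholder):
#   mark_duplicates sample.bam
```

Duplicates are only flagged here, not removed, which matches what the question asks for.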
10.6 years ago

I don't think it does, and in fact Picard tools expect BAM files to be at least indexed. If your BAM files aren't sorted, you can set the "Assume reads are already ordered" option to FALSE, but if they aren't indexed I don't see the obvious option in Galaxy, which would be to index them with a simple "samtools index bamfile". The only available tool I see is GATK PrintReads. It lets you filter out unwanted reads before deduplication, such as reads with low mapping quality or malformed reads (we use it as an initial step in our GATK pipeline to prepare our BAM files, with the --read_filter MappingQualityZero --filter_mismatching_base_and_quals options), and it also generates an index file for your BAM file. You should then be able to feed the PrintReads output to Picard's MarkDuplicates.
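
Outside Galaxy, the sort and index steps mentioned above might look like this. It is a sketch assuming samtools 1.x syntax (older 0.1.x releases took an output prefix for sort instead of -o); the function name `sort_and_index` and the file names are placeholders.

```shell
# Coordinate-sort a BAM, then index the sorted file.
# $1 = input BAM (placeholder name); writes <name>.sorted.bam plus its index.
sort_and_index() {
  in_bam="$1"
  sorted="${in_bam%.bam}.sorted.bam"
  # Index only if the sort succeeded.
  samtools sort -o "$sorted" "$in_bam" && samtools index "$sorted"
}

# Usage (sample.bam is a placeholder):
#   sort_and_index sample.bam
```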
