Removing or not removing duplicates in a .bam file
5.1 years ago
zizigolu ★ 4.3k

Hi,

I have a list of .bam files from WGS. The maintainer says the duplicates have been marked but not removed. I tried to remove the duplicates with Picard, but I am getting an error:

I cannot post my question on the Broad Institute forum, because it says: "You have to be around for a little while longer before you can post links."

[fi1d18@cyan02 fi1d18]$ picard MarkDuplicates I=/temp/hgig/fi1d18/1631_WTSI-OESO_005_a_DNA/mapped_sample/HUMAN_1000Genomes_hs37d5_genomic_WTSI-OESO_005_a_DNA.dupmarked.bam O=/temp/hgig/fi1d18/1631_WTSI-OESO_005_a_DNA/mapped_sample/HUMAN_1000Genomes_hs37d5_genomic_WTSI-OESO_005_a_DNA.dupmarked1.bam M= marked-dup-metrics.txt
[Thu Mar 07 17:33:42 GMT 2019] picard.sam.markduplicates.MarkDuplicates INPUT=[/temp/hgig/fi1d18/1631_WTSI-OESO_005_a_DNA/mapped_sample/HUMAN_1000Genomes_hs37d5_genomic_WTSI-OESO_005_a_DNA.dupmarked.bam] OUTPUT=/temp/hgig/fi1d18/1631_WTSI-OESO_005_a_DNA/mapped_sample/HUMAN_1000Genomes_hs37d5_genomic_WTSI-OESO_005_a_DNA.dupmarked1.bam METRICS_FILE=marked-dup-metrics.txt MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 REMOVE_SEQUENCING_DUPLICATES=false TAGGING_POLICY=DontTag REMOVE_DUPLICATES=false ASSUME_SORTED=false DUPLICATE_SCORING_STRATEGY=SUM_OF_BASE_QUALITIES PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates READ_NAME_REGEX=<optimized capture of last three ':' separated fields as numeric values> OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json
[Thu Mar 07 17:33:42 GMT 2019] Executing as fi1d18@cyan02 on Linux 2.6.32-754.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_51-b16; Picard version: 2.8.3-SNAPSHOT
INFO 2019-03-07 17:33:42 MarkDuplicates Start of doWork freeMemory: 2012347496; totalMemory: 2027945984; maxMemory: 3817865216
INFO 2019-03-07 17:33:42 MarkDuplicates Reading input file and constructing read end information.
INFO 2019-03-07 17:33:42 MarkDuplicates Will retain up to 14684096 data points before spilling to disk.
WARNING: BAM index file /temp/hgig/fi1d18/1631_WTSI-OESO_005_a_DNA/mapped_sample/HUMAN_1000Genomes_hs37d5_genomic_WTSI-OESO_005_a_DNA.dupmarked.bam.bai is older than BAM /temp/hgig/fi1d18/1631_WTSI-OESO_005_a_DNA/mapped_sample/HUMAN_1000Genomes_hs37d5_genomic_WTSI-OESO_005_a_DNA.dupmarked.bam
[Thu Mar 07 17:33:42 GMT 2019] picard.sam.markduplicates.MarkDuplicates done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=2027945984
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.samtools.SAMFormatException: SAM validation error: ERROR: Record 3752, Read name HX3_22030:3:2114:23155:23319, bin field of BAM record does not equal value computed based on alignment start and end, and length of sequence to which read is aligned
at htsjdk.samtools.SAMUtils.processValidationErrors(SAMUtils.java:448)
at htsjdk.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:665)
at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:650)
at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:620)
at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:569)
at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:543)
at picard.sam.markduplicates.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:438)
at picard.sam.markduplicates.MarkDuplicates.doWork(MarkDuplicates.java:222)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:205)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:94)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:104)
[fi1d18@cyan02 fi1d18]$

How can I tell whether the duplicates have already been removed, so that I am not doing something pointless? I also don't understand what this error means at all.
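The error comes from htsjdk's record validation: every BAM record stores a `bin` field that is supposed to equal the index bin computed from the read's alignment start and end, and because the run used VALIDATION_STRINGENCY=STRICT (shown in the logged parameters), the first mismatch aborts MarkDuplicates. As a rough illustration of what is being checked, here is a Python port of the `reg2bin` routine from the SAM/BAM specification (0-based, half-open coordinates); `bin_is_consistent` and the example positions are hypothetical. If the mismatches turn out to be harmless, rerunning Picard with VALIDATION_STRINGENCY=LENIENT (warn) or SILENT (ignore) lets it continue, though whether that is safe for these particular files is an assumption worth checking.

```python
def reg2bin(beg: int, end: int) -> int:
    """Return the BAM index bin for the 0-based, half-open interval [beg, end).

    Port of reg2bin() from the SAM/BAM specification; assumes end > beg.
    """
    end -= 1  # work with the last covered base (inclusive end)
    if beg >> 14 == end >> 14:
        return ((1 << 15) - 1) // 7 + (beg >> 14)  # 16 kb bins
    if beg >> 17 == end >> 17:
        return ((1 << 12) - 1) // 7 + (beg >> 17)  # 128 kb bins
    if beg >> 20 == end >> 20:
        return ((1 << 9) - 1) // 7 + (beg >> 20)  # 1 Mb bins
    if beg >> 23 == end >> 23:
        return ((1 << 6) - 1) // 7 + (beg >> 23)  # 8 Mb bins
    if beg >> 26 == end >> 26:
        return ((1 << 3) - 1) // 7 + (beg >> 26)  # 64 Mb bins
    return 0  # crosses a 64 Mb boundary: bin 0 covers the whole 512 Mb range


def bin_is_consistent(stored_bin: int, pos0: int, ref_end0: int) -> bool:
    """Hypothetical helper: compare a record's stored bin with the computed one.

    pos0 is the 0-based alignment start, ref_end0 the 0-based exclusive end.
    """
    return stored_bin == reg2bin(pos0, ref_end0)


if __name__ == "__main__":
    # Made-up example: a 100 bp read aligned at 0-based position 1,000,000.
    start, end = 1_000_000, 1_000_100
    print(reg2bin(start, end))                # 4742
    print(bin_is_consistent(4680, start, end))  # False: stored bin is wrong
```

A record like the second case is what htsjdk reports as "bin field of BAM record does not equal value computed based on alignment start and end".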

picard GATK RNA-Seq WGS • 1.8k views

File /home/local/software/picard-tools/2.8.3/reference.dict not found

But reference.dict is supposed to be the output of this command :(

You are using the jar from /local/software/picard-tools/2.8.3/jarlib/picard.jar. Does /home/local/software/picard-tools/2.8.3/ exist?

Yes, it does. However, this was an intermediate step for using GATK.

The error message is very explicit about what is wrong.

Please pick a more descriptive title for your question(s)!

I have a list of .bam files from WGS. The maintainer says the duplicates have been marked but not removed.
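Marking and removing leave different traces: marked duplicates stay in the BAM with FLAG bit 0x400 (decimal 1024, "PCR or optical duplicate") set, whereas removed duplicates are simply absent. So if some records carry that bit, the duplicates were marked but not removed; if none do, they were either removed or never marked. With samtools, `samtools view -c -f 1024 file.bam` counts the flagged records and `samtools flagstat file.bam` reports a duplicates line. The sketch below does the same count over SAM text lines in plain Python, for example on the output of `samtools view`; the function name and the two sample records are made up for illustration.

```python
from typing import Iterable, Tuple

DUPLICATE_FLAG = 0x400  # SAM FLAG bit: "PCR or optical duplicate"


def count_duplicates(sam_lines: Iterable[str]) -> Tuple[int, int]:
    """Return (total_records, duplicate_flagged_records) for SAM text lines.

    Header lines (starting with '@') and blank lines are skipped;
    FLAG is the second tab-separated column of each alignment line.
    """
    total = dup = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t", 2)[1])
        total += 1
        if flag & DUPLICATE_FLAG:
            dup += 1
    return total, dup


if __name__ == "__main__":
    # Hypothetical records: r2 has FLAG 1123 = 99 + 1024 (duplicate bit set).
    records = [
        "@HD\tVN:1.6\tSO:coordinate",
        "r1\t99\t1\t100\t60\t10M\t=\t200\t110\tACGTACGTAC\tIIIIIIIIII",
        "r2\t1123\t1\t100\t60\t10M\t=\t200\t110\tACGTACGTAC\tIIIIIIIIII",
    ]
    total, dup = count_duplicates(records)
    print(f"{dup} of {total} records carry the duplicate flag (0x400)")
```

On a real file, the lines could come from `samtools view file.bam` piped into a script that calls `count_duplicates(sys.stdin)`; a non-zero count means the duplicates are still present and only marked.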

5.1 years ago
zizigolu ★ 4.3k

The problem was that I was using O= when I should have used OUTPUT= :(
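Independently of the O/OUTPUT issue, the logged parameters show REMOVE_DUPLICATES=false, which is MarkDuplicates' default, so a successful run of that command would only have marked the duplicates again rather than removing them. The sketch below builds a MarkDuplicates command with full option names, REMOVE_DUPLICATES=true, and a relaxed VALIDATION_STRINGENCY for the bin-field error. It assumes the `picard` wrapper from the question is on PATH; the helper name `mark_duplicates_cmd` and the file names are placeholders.

```python
import shlex
from typing import List


def mark_duplicates_cmd(in_bam: str, out_bam: str, metrics: str,
                        remove: bool = True,
                        stringency: str = "LENIENT") -> List[str]:
    """Hypothetical helper: build a Picard MarkDuplicates command line.

    Uses full option names (legacy KEY=VALUE syntax, as in Picard 2.8.x)
    and assumes the `picard` wrapper script is on PATH.
    """
    return [
        "picard", "MarkDuplicates",
        f"INPUT={in_bam}",
        f"OUTPUT={out_bam}",
        f"METRICS_FILE={metrics}",
        # true drops duplicate reads from OUTPUT instead of only flagging them
        f"REMOVE_DUPLICATES={'true' if remove else 'false'}",
        # LENIENT: report validation errors as warnings instead of aborting
        f"VALIDATION_STRINGENCY={stringency}",
    ]


if __name__ == "__main__":
    cmd = mark_duplicates_cmd("sample.dupmarked.bam", "sample.dedup.bam",
                              "marked-dup-metrics.txt")
    print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would execute it; building the argument list separately keeps each option visible and avoids shell quoting issues.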
