Biostar Beta. Not for public use.
Question: Detecting junctions from reads spanning large indels
0
Entering edit mode

Hello,

I am mapping my reads with an aligner that allows for large indels (bbmap, other suggestions welcome).

I am than using SamJs to get reads that have large indels >1kb

java -jar ${HOME}/programs/jvarkit/dist/samjs.jar --file get_indel_sam_filter.javascript --samoutputformat SAM -o Fasteris_Copenhagen_PlaquePurified_150903_SND104_B_L007_R1_HXE-3_mapped_Indel-1kb_v2.sam Fasteris_Copenhagen_PlaquePurified_150903_SND104_B_L007_R1_HXE-3_mapped.sort.bam

Java script used, cat get_indel_sam_filter.javascript :

function accept(rec) {
if(rec.getReadUnmappedFlag()) return true; 
var c=rec.getCigar(); if(c==null) return true; 
for(var i=0;i< c.numCigarElements();++i) {
var ce=c.getCigarElement(i);
if(ce.getLength()<1000) continue;
var s=ce.getOperator().name(); 
if( s=="D") return true; 
} return false;
}accept(record)

Now I would like to get a coordinate file with the start and end location of indels. For instance this record:

HWI-D00104:276:C6U65ANXX:7:1207:20052:96123 1:N:0:TTAGGC    0   Copenhagen_M35027   113671  38  89=11532D36=    *   0   0   TTTTTAACAGTTTGAGGTTTTAGATTTTTAGTTACAGAAGTGATATCGAATATTTTATCCAAAAAGAATGAGTAATTAATTGTCTTAGAAACGTCAACAACGAGAAAAAAGATTTAATATTACAG   BBBBBFFFFBFFFFFFFFBFFFFFF/BFF<<FFFB/B/FFFFFF//BFBBBBFFFFFFFFFFBFFFFBB/FBFFFFFFB<FFBFBFF<FF/F/FBFF</B</BFFBFFF/FF/<F////FBB/B7   AM:i:38 NM:i:11532

Has an indel between 113759 and 125292.

I am not sure if there is already a tool that easily finds these. Ideally, the cigar string before and after the indel is an 100% match (like in the example record).

Thanks.

ADD COMMENTlink 2.3 years ago Adrian Pelin ♦ 2.3k • updated 2.3 years ago Pierre Lindenbaum 120k
2
Entering edit mode

using bioalcidaejdk: http://lindenb.github.io/jvarkit/BioAlcidaeJdk.html

java -jar dist/bioalcidaejdk -e  'stream().filter(R->!R.getReadUnmappedFlag()).forEach(R->{ int refpos = R.getStart(); for(CigarElement ce:R.getCigar()) { CigarOperator op = ce.getOperator(); int len = ce.getLength(); if(len>=1000 && (op.equals(CigarOperator.N) || op.equals(CigarOperator.D))) { println(R.getContig()+"\t"+(refpos-1)+"\t"+(refpos+len)); } if(op.consumesReferenceBases())  refpos+=len; } }); ' in.bam

later: fixed extra semi-colon after ' if(op.consumesReferenceBases()) ...'

ADD COMMENTlink 2.3 years ago Pierre Lindenbaum 120k
Entering edit mode
0

sweet, goes directly to coordinates. Is there any way to get an intermediary sam file to see which reads were counted for generating coordinates?

ADD REPLYlink 2.3 years ago
Adrian Pelin
♦ 2.3k
Entering edit mode
1

using samjdk (faster than samjs)

 java -jar dist/samjdk.jar -e 'if(record.getReadUnmappedFlag()) return false; int refpos = record.getStart(); for(CigarElement ce:record.getCigar()) { CigarOperator op = ce.getOperator(); int len = ce.getLength(); if(len>=1000 && (op.equals(CigarOperator.N) || op.equals(CigarOperator.D))) { return true; } if(op.consumesReferenceBases()) refpos+=len; } return false;' in.bam
ADD REPLYlink 2.3 years ago
Pierre Lindenbaum
120k
Entering edit mode
0

sorry: that was a stupid copy+paste, here is a shorter syntax:

$ java -jar dist/samjdk.jar -e 'return !record.getReadUnmappedFlag() && record.getCigar().getCigarElements().stream().anyMatch(C->C.getLength()>=1000 && (C.getOperator()==CigarOperator.N || C.getOperator()==CigarOperator.D));'  in.bam
ADD REPLYlink 2.3 years ago
Pierre Lindenbaum
120k
Entering edit mode
0

oh I see. Both still do the same thing though, correct?

ADD REPLYlink 2.3 years ago
Adrian Pelin
♦ 2.3k

Login before adding your answer.

Similar Posts
Loading Similar Posts
Powered by the version 2.0