awk command to extract deletions from a CIGAR string
2
0
Entering edit mode
5.1 years ago

Hi Biostars, I am reanalyzing some RNA-seq data and would like to identify reads with the longest deletions. I have extracted the CIGAR strings containing deletions from column 6 of my SAM files. However, I have a huge list of reads and I don't want to count the number of deletions by eyeballing through this huge list. I know I need an awk command to filter things here, but for now I can't put together a complex one for this. I would be grateful for any assistance.

samtools view sample.sam | cut -f 6 | grep "D" | head -3
12M1D89M
12M1D89M
13M1D88M
RNA-Seq alignment • 2.8k views
ADD COMMENT
0
Entering edit mode
5.1 years ago

You could do that with grep like so

echo 12M13D89M35D45M2I34M | grep -oP "(\d+)D" | tr -d "D"

will print

13
35
ADD COMMENT
0
Entering edit mode

Thank you very much. That was really quick and timely.

ADD REPLY
0
Entering edit mode
5.1 years ago

using bioalcidaejdk: http://lindenb.github.io/jvarkit/BioAlcidaeJdk.html

$ java -jar  dist/bioalcidaejdk.jar  -e 'stream().filter(R->!R.getReadUnmappedFlag() && R.getCigar()!=null).filter(R->R.getCigar().getCigarElements().stream().anyMatch(C->C.getOperator()==CigarOperator.I || C.getOperator()==CigarOperator.N)).forEach(R->println(R.getCigar().getCigarElements().stream().filter(C->C.getOperator()==CigarOperator.I || C.getOperator()==CigarOperator.N).mapToInt(C->C.getLength()).max().orElse(0) +"\t"+ R.getSAMString()));'  toy.bam |\
sort -t $'\t' -k1,1n | tail | cut -f 2- 


r002    0   ref 9   30  1S2I6M1P1I1P1I4M2I  *   0   0   AAAAGATAAGGGATAAA   *   RG:Z:gid1
r001    163 ref 7   30  8M4I4M1D3M  =   37  39  TTAGATAAAGAGGATACTG *   RG:Z:gid1   XX:B:S,12561,2,20,112
x3  0   ref2    6   30  9M4I13M *   0   0   TTATAAAACAAATAATTAAGTCTACA  ??????????????????????????  RG:Z:gid1
r004    0   ref 16  30  6M14N1I5M   *   0   0   ATAGCTCTCAGC    *   RG:Z:gid1
ADD COMMENT
0
Entering edit mode

Thank you very much for this quick and timely response.

ADD REPLY

Login before adding your answer.

Traffic: 1874 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6