Question

How to highlight multiple references mapped reads?

1

Entering edit mode

6.6 years ago

vmicrobio ▴ 290

Hi all,

A very basic question but I can't solve it easily. After a raw-read mapping against a list of references, how to make a list of reads that match with more than one reference (and which ones) to highlight them?

Thank you very much!

alignment • 1.3k views

ADD COMMENT • link updated 6.6 years ago by Pierre Lindenbaum 161k • written 6.6 years ago by vmicrobio ▴ 290

0

Entering edit mode

define highlight please.

ADD REPLY • link 6.6 years ago by Pierre Lindenbaum 161k

0

Entering edit mode

I just meant to work with them after. Sorry if my question is not clear : I'm trying to get two different things:

make a file containing reads (matching with 2 references or more) and with which references associated
a count that tell me that x reads match with references 'a' and 'b', y reads with 'b', 'c', 'd',...

ADD REPLY • link 6.6 years ago by vmicrobio ▴ 290

score 1 · Answer 1 · 2017-09-25

1

Entering edit mode

6.6 years ago

Pierre Lindenbaum 161k

extract the names of the first-in-pair + mapped reads and their reference, sort+unique to get the unique (name+REF), cut to get the uniq names, sort+uniq again to the duplicated names (sames read-name but different reference):

 samtools view -f 64 -F 4 in.bam | cut -f1,3 | sort | uniq | cut -f1 | sort | uniq -d > namesF.txt

same for 2nd in pair

 samtools view -f 128 -F 4 in.bam | cut -f1,3 | sort | uniq | cut -f1 | sort | uniq -d > namesR.txt

and with which references associated

re-use those files with grep -f or ( joinif the list is large) to extract those reads

count that tell me that x reads match with references

it should be a simple sort | uniq -c

ADD COMMENT • link 6.6 years ago by Pierre Lindenbaum 161k

0

Entering edit mode

Thanks a lot for your answer! What about single reads dataset?

ADD REPLY • link 6.6 years ago by vmicrobio ▴ 290

0

Entering edit mode

it's even faster.

 samtools view -F 4 in.bam | cut -f1,3 | sort | uniq | cut -f1 | sort | uniq -d > all.txt

ADD REPLY • link 6.6 years ago by Pierre Lindenbaum 161k

0

Entering edit mode

Hi Pierre,

Thank you for your answer, that's what I've tried but I wanted to be sure that I miss anything

I tried this:

samtools view -F 4 file.bam | cut -f1,3 | sort | uniq | cut -f1 | sort | uniq -d > names.txt

samtools view file.bam | grep -f names.txt > sequences.sam

cat sequences.sam | sort | uniq -c > interm

output:

1 ABDC8:10138:09707 2048    ref1    1   60  23H77M185H  *   0   0   ACATGGATGTTACGCAGCAGGGCAGTCGCCCTAAAACAAAGCACGATGCACTAAGCACATACCTGCTCACAGCCTAC   :9858///484668::::::;4;:::666;3:<<<0;;;3;=5;;;::;=///6:99977717178899::994:=1   NM:i:0  MD:Z:77 AS:i:77 XS:i:0  SA:Z:ref2,1,+,64S36M1I18M1I19M1I97M1D13M35S,60,4;

to format the ouput I did

awk '{print $2,$4,$17}' interm >final

and I got:

ABDC8:10138:09707 ref1 SA:Z:ref2,1,+,64S36M1I18M1I19M1I97M1D13M35S,60,4;

question: how to put tabs between my selected columns and to select only ref2 instead of all column 17 (I have a lot of reference names with a lot of different character sizes)

Thanks again!

ADD REPLY • link 6.6 years ago by vmicrobio ▴ 290