Extract unique barcodes and find their frequency
3.6 years ago
kspata ▴ 80

Hi All,

I have paired-end 250 bp sequencing data for a sample, with a read count of around 2 million. The data contain 70-mer barcodes embedded between common upstream and downstream regions.

I need to determine how many unique barcodes are present in the sample and the frequency of each relative to the total number of reads. So far, I have mapped the reads to a reference that has N's in the barcode region, and I found some common sequences that may be the barcodes.

I then merged the forward and reverse reads with a minimum overlap of 50 and grepped for the observed barcode sequences. However, I am not sure this approach is correct.

Is there any better way to perform this kind of analysis?

Help would be appreciated.

Thanks in advance !!

ngs illumina mapping • 1.1k views

Are the barcodes in exactly the same location in all reads (e.g. base pairs 1-30)? If so, you could cut those regions out (use bbduk.sh from the BBMap suite) and then pipe them through sort | uniq -c to get the counts.
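For illustration, the cut-and-count idea above could be sketched in Python as follows. This is only a minimal sketch: the function name, the toy reads, and the offset and length are all made up for the example, and it assumes the barcode really does sit at the same offset in every read.

```python
from collections import Counter


def count_fixed_position_barcodes(reads, start, length):
    """Count barcodes found at the same 0-based offset in every read.

    Hypothetical helper: `start` and `length` must be chosen by the user.
    """
    counts = Counter()
    for seq in reads:
        barcode = seq[start:start + length]
        # Skip reads that are too short to contain the full barcode.
        if len(barcode) == length:
            counts[barcode] += 1
    return counts


# Toy reads, invented for illustration; barcode assumed at offsets 0-3.
reads = ["ACGTTTGA", "ACGTTTGA", "GGCATTGA", "AC"]
counts = count_fixed_position_barcodes(reads, start=0, length=4)

# Report each barcode with its count and its fraction of all reads.
total = len(reads)
for barcode, n in counts.most_common():
    print(barcode, n, f"{n / total:.2f}")
```

Dividing by the total number of reads (rather than by the number of reads that yielded a barcode) matches the "frequency relative to total number of reads" asked for in the question; swap the denominator if the other definition is wanted.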


Thanks for replying. I know the location of the barcodes in the reference, which is base pairs 807-896, but I do not know their sequence or their location in the reads.

Will bbduk work in this way?


You are going to need to use a custom solution for this situation. A couple of options.

Sort/index your alignment files. Retrieve the SAM alignment lines that align to the region of interest (+/- N bases). You could then look at the CIGAR string of each alignment and figure out which bases you need to excise from the original reads.

Or retrieve the reads that map to the region with samtools view, and then do a multiple-sequence alignment against the reduced reference to identify the section you are interested in.
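As a rough sketch of the first (CIGAR-based) option, the function below walks a CIGAR string and returns the read bases aligned to a given reference interval. It is not a definitive implementation: `extract_ref_interval` is a hypothetical helper, and it assumes you have already pulled the POS, CIGAR and SEQ fields out of each SAM line and that coordinates are 1-based and inclusive. Alignments that do not span the whole interval, or whose interval boundary falls inside a deletion, are simply skipped.

```python
import re
from collections import Counter

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def extract_ref_interval(pos, cigar, seq, ref_start, ref_end):
    """Return read bases aligned to reference [ref_start, ref_end] (1-based).

    pos is the SAM POS field (1-based leftmost mapped position).
    Returns None when the alignment does not cover both interval ends.
    """
    ref = pos   # next reference position to consume (1-based)
    q = 0       # next read index to consume (0-based)
    q_start = q_end = None
    for n, op in CIGAR_RE.findall(cigar):
        length = int(n)
        if op in "M=X":            # consumes reference and read
            block_end = ref + length - 1
            if q_start is None and ref <= ref_start <= block_end:
                q_start = q + (ref_start - ref)
            if ref <= ref_end <= block_end:
                q_end = q + (ref_end - ref) + 1
            ref += length
            q += length
        elif op in "IS":           # consumes read only
            q += length
        elif op in "DN":           # consumes reference only
            ref += length
        # H and P consume neither sequence
    if q_start is None or q_end is None:
        return None
    return seq[q_start:q_end]


# Toy alignments (POS, CIGAR, SEQ), invented for illustration.
alignments = [
    (5, "2S6M1I4M", "NNAAACCCGTTTT"),
    (5, "12M", "AAACCCTTTTGG"),
    (5, "4M", "ACGT"),  # too short to span the interval, so it is skipped
]
barcodes = Counter()
for pos, cigar, seq in alignments:
    bc = extract_ref_interval(pos, cigar, seq, ref_start=8, ref_end=12)
    if bc is not None:
        barcodes[bc] += 1
print(barcodes)
```

For the data in this thread the interval would be the barcode coordinates given above (807-896). Because SEQ in a SAM record is stored on the reference forward strand, the extracted strings should not need reverse-complementing. Inserted bases inside the interval are kept, so extracted barcodes can differ in length.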

