Alignment in IGV: extracting specific reads.
0
0
Entering edit mode
6.4 years ago
KVC_bioinfo ▴ 590

Hello All,

I have aligned the query sequence with the human genome. I am interested in looking at exons of a specific gene. When I look at the IGV I can see several reads that are mapping to the gene of interest across all the exons. I am confused how can I get the read length of each read which is aligning to exon 1, exon 2 and so on.

I tried to extract the reads giving coordinates of exons using samtools. However, when i converted the resulting bam file to fasta and tried to calculate the read length this gives me the original read length and not the read length of read matching to the region I specified.

I used the following command:

For extracting the reads from original bam file for the region of interest:

samtools view -h in.bam "chr3:region1-region2" > exon.bam

Converting bam to fasta :

samtools bam2fq exon.bam | seqtk seq -A > exon.fasta

for counting the lengths of reads from fasta file:

awk '/^>/ {if (seqlen) print seqlen;print;seqlen=0;next} {seqlen+=length($0)}END{print seqlen}' exon.fasta

Am I doing anything wrong here or missing anything totally? Thank you.

iGV samtools • 4.6k views
ADD COMMENT
1
Entering edit mode

However, when i converted the resulting bam file to fasta and tried to calculate the read length this gives me the original read length and not the read length of read matching to the region I specified.

Some reads are going to extend outside these boundaries. I am not sure what exactly you are trying to do? What does "tried to calculate the read length" mean?

ADD REPLY
0
Entering edit mode

When I tried to calculate read length of the reads which I got from ::

samtools view -h in.bam "chr3:region1-region2" > exon.bam (chr3:region1-region2 is one of the exons)

This gave me read lengths of original reads and the reads that align with the region I specified.

I am trying to get the length of reads that align with exon 1, exon 2 and so on.

ADD REPLY
1
Entering edit mode

Still not clear. Individual reads are all going to be more or less the same length (subject to trimmed lengths). Are you trying to see what is the length of soft clipping during alignments?

Note: Are these Nanopore/PacBio reads? Again you may have left that important bit out.

So you want to find the part of read that aligns to Exon 1 and then Exon 2 etc?

ADD REPLY
0
Entering edit mode

These are nanopore reads. I am trying to see what is the length of the reads that mapped to region I specified.

ADD REPLY
3
Entering edit mode

If you are only interested in length of the part that is aligned then you could look at the CIGAR's and flags/tags in your alignments.

ADD REPLY
0
Entering edit mode

Thank you. How can I identify from cigar? The cigar remains the same for the read across all the exons.

ADD REPLY
1
Entering edit mode

If you have the CIGAR fields in front of you, the read length is column 6.

ADD REPLY
0
Entering edit mode

Can you post a read that maps to multiple exons from your SAM file?

ADD REPLY
1
Entering edit mode
ADD REPLY

Login before adding your answer.

Traffic: 2289 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6