How to Separate a SAM File Containing Forward and Reverse Reads
0
0
Entering edit mode
5.4 years ago
Ric ▴ 430

I got 0 plus and minus reads by aligning single reads against a gene in the following way:

> bwa index ref.fasta
> bwa mem ref.fasta sample5-21.fq | gzip -3 > sample5-21.sam.gz
> splitsam.sh sample5-21.sam.gz forward.sam.gz reverse.sam.gz
Total reads:    1792444
Plus reads:     0
Minus reads:    0
Unmapped reads: 1792444
Time:           1.166 seconds.

What did I miss?

gene alignment RNA-Seq • 1.2k views
ADD COMMENT
0
Entering edit mode

Hello Ric ,

it is unusual to gzip sam files. Instead you should convert it to sorted bam files using samtools sort -o output.bam input.sam.

But as splitsam.sh doesn't complain about the format, this is not your problem here.

Please convert to sorted bam and post the output of:

$ samtools flagstat output.bam

fin swimmer

ADD REPLY
0
Entering edit mode

Did you check the results of bwa mem? According to splitsam.sh, there aren't maped reads in your sam file. Try:

zcat sample5-21.sam.gz | samtools view | head

and:

zcat sample5-21.sam.gz | samtools view | tail

Indeed it is really strange to use gzipped sam files, but this is allowed by splitsam.sh (part of BBTools) - and apparently, splitsam.sh doesn't take bam as input. But I wouldn't gzip sam files, as very few tools will accept gzipped sam.

edit: take note you are also using splitsam.sh incorrectly, as it expects at least four arguments:

Usage:        splitsam <input> <plus output> <minus output> <unmapped output>

Input may be stdin or a sam file, raw or gzipped.
Outputs must be sam files, and may be gzipped.
  
ADD REPLY
0
Entering edit mode
samtools flagstat sample5-21.sorted.bam
1792444 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
0 + 0 mapped (0.00% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
ADD REPLY
0
Entering edit mode
> samtools view sample5-21.sorted.bam | head
NS500334:143:H3GTHBGX9:4:11401:23521:1002   4   *   0   0   *   *   0   0   NCGGACCAGGCTTCATTCCCC   *   AS:i:0  XS:i:0
NS500334:143:H3GTHBGX9:4:11401:16217:1008   4   *   0   0   *   *   0   0   NCGGACCAGGCTTCATTCCCC   *   AS:i:0  XS:i:0
NS500334:143:H3GTHBGX9:4:11401:9757:1012    4   *   0   0   *   *   0   0   NCGGACCAGGCTTCATTCCCC   *   AS:i:0  XS:i:0
NS500334:143:H3GTHBGX9:4:11401:16357:1016   4   *   0   0   *   *   0   0   TATATGTTCTCAGGTCGCCCC   *   AS:i:0  XS:i:0
NS500334:143:H3GTHBGX9:4:11401:15714:1027   4   *   0   0   *   *   0   0   TTCGGACCAGGCTTCATTCCC   *   AS:i:0  XS:i:0
NS500334:143:H3GTHBGX9:4:11401:2589:1037    4   *   0   0   *   *   0   0   TCGGACCAGGCTTCATTCCCC   *   AS:i:0  XS:i:0
NS500334:143:H3GTHBGX9:4:11401:8137:1041    4   *   0   0   *   *   0   0   TCGGACCAGGCTTCATTCCCC   *   AS:i:0  XS:i:0
NS500334:143:H3GTHBGX9:4:11401:21304:1044   4   *   0   0   *   *   0   0   TCGGACCAGGCTTCATTCCCC   *   AS:i:0  XS:i:0
NS500334:143:H3GTHBGX9:4:11401:17205:1078   4   *   0   0   *   *   0   0   GTGTACGCGTCAGCTGCTGCT   *   AS:i:0  XS:i:0
NS500334:143:H3GTHBGX9:4:11401:19354:1079   4   *   0   0   *   *   0   0   AACAGACCGGTAGACTTGAAC   *   AS:i:0  XS:i:0


> samtools view sample5-21.sorted.bam | tail
NS500334:143:H3GTHBGX9:1:23312:10665:20320  4   *   0   0   *   *   0   0   TCGGACCAGGCTTCATTCCCC   *   AS:i:0  XS:i:0
NS500334:143:H3GTHBGX9:1:23312:7465:20341   4   *   0   0   *   *   0   0   TCGGACCAGGCTTCATTCCCC   *   AS:i:0  XS:i:0
NS500334:143:H3GTHBGX9:1:23312:2947:20342   4   *   0   0   *   *   0   0   TCCGGGCTTTACTTGTACAGC   *   AS:i:0  XS:i:0
NS500334:143:H3GTHBGX9:1:23312:22490:20344  4   *   0   0   *   *   0   0   ACCGCATCGAGCTGAAGGGCA   *   AS:i:0  XS:i:0
NS500334:143:H3GTHBGX9:1:23312:24199:20360  4   *   0   0   *   *   0   0   TCGGACCAGGCTTCATTCCCC   *   AS:i:0  XS:i:0
NS500334:143:H3GTHBGX9:1:23312:22936:20364  4   *   0   0   *   *   0   0   TCGGACCAGGCTTCATTCCCC   *   AS:i:0  XS:i:0
NS500334:143:H3GTHBGX9:1:23312:13141:20379  4   *   0   0   *   *   0   0   GGAGACTCGAACTCACAACCT   *   AS:i:0  XS:i:0
NS500334:143:H3GTHBGX9:1:23312:2345:20381   4   *   0   0   *   *   0   0   CACCTACGGCAAGCTGACCCT   *   AS:i:0  XS:i:0
NS500334:143:H3GTHBGX9:1:23312:17238:20382  4   *   0   0   *   *   0   0   CCCAAAGGAGAAGCTCAACTC   *   AS:i:0  XS:i:0
NS500334:143:H3GTHBGX9:1:23312:15492:20404  4   *   0   0   *   *   0   0   CGGAGATTACAATGGACGATT   *   AS:i:0  XS:i:0
ADD REPLY
0
Entering edit mode

The reference is 5044 bp long and the reads are 21 bp long. Should I use a different aliger and if yes then which one?

ADD REPLY
0
Entering edit mode

Yes, bwa mem is intended to reads 70 base pairs or longer, you should use another aligner. I would recommend Bowtie (preferred) or Bowtie2 (you may have to tweak the settings).

Could you please report back if this improves alignment? Thanks.

ADD REPLY
0
Entering edit mode

I tried bowtie2 but I ran into this problem bowtie2 can't find index

ADD REPLY

Login before adding your answer.

Traffic: 2367 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6