How to determine the overall average insert size when reads from one FASTQ pool are aligned separately to each reference?
5.6 years ago
bioinfo ▴ 10

Hi, I am having difficulty calculating the average insert size and standard deviation of my RNA-seq NGS data for submission to the GEO database.

I generated two FASTQ files of paired-end reads (Read1, Read2) using an Illumina NGS sequencer.

As the reference I used a bacterial genome that has 3 chromosomes. For the analysis, I used a bowtie2 pipeline to align the paired-end data to each chromosome separately. To get the average insert size from an alignment, a tool such as 'samtools stats' works well.
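Because `samtools stats` also writes a per-size insert histogram (the `IS` lines), one way to get a single figure for all three chromosomes is to add the three histograms together before computing the mean and SD. This is a minimal sketch, assuming each `IS` line is tab-separated with the insert size in column 2 and the total pair count in column 3, as described in the comment header of the report (check this against your samtools version). It also assumes each read pair was counted in only one of the three reports; the function names are illustrative.

```python
import math
from collections import Counter
from typing import Iterable, Tuple


def read_is_histogram(lines: Iterable[str]) -> Counter:
    """Collect the IS (insert size) section of one `samtools stats` report.

    Assumed layout: IS <insert size> <pairs total> <inward> <outward> <other>.
    Only the insert size and the total pair count are used.
    """
    hist = Counter()
    for line in lines:
        if line.startswith("IS\t"):
            fields = line.rstrip("\n").split("\t")
            size, pairs_total = int(fields[1]), int(fields[2])
            hist[size] += pairs_total
    return hist


def pooled_mean_sd(hists: Iterable[Counter]) -> Tuple[float, float]:
    """Add several insert-size histograms and return (mean, population SD)."""
    total = Counter()
    for h in hists:
        total.update(h)  # Counter.update adds counts for matching sizes
    n = sum(total.values())
    if n == 0:
        raise ValueError("no insert sizes found")
    mean = sum(size * count for size, count in total.items()) / n
    var = sum(count * (size - mean) ** 2 for size, count in total.items()) / n
    return mean, math.sqrt(var)
```

For example, read the three reports (hypothetical names `chr1.stats`, `chr2.stats`, `chr3.stats`) with `read_is_histogram(open(path))` and pass the three results to `pooled_mean_sd`. The SD here is the population SD of the pooled histogram; the summary (`SN`) lines of samtools may be computed differently, so the numbers need not match them exactly.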

However, I don't think it is accurate to calculate this from each of the resulting SAM files, since each file holds only part of the FASTQ data. Is there any way to get the average insert size and standard deviation for the total FASTQ data?

Thank you in advance.
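If only the per-chromosome summary numbers are available, they can still be combined exactly, provided the number of pairs behind each mean is known. The overall mean is the count-weighted mean, and the overall variance is the count-weighted mean of (SD² + mean²) minus the overall mean squared. A minimal sketch, assuming each SD is a population SD (divided by n); if a tool reports a sample SD (divided by n - 1), convert it first with sd_pop = sd * sqrt((n - 1) / n). The `(count, mean, sd)` tuple layout is an illustrative choice.

```python
import math
from typing import Sequence, Tuple


def combine_groups(groups: Sequence[Tuple[int, float, float]]) -> Tuple[float, float]:
    """Combine (count, mean, population SD) triples into an overall (mean, SD).

    Pooled variance = sum(n_i * (sd_i**2 + mean_i**2)) / N - overall_mean**2,
    which equals the population variance of all raw values taken together.
    """
    n_total = sum(n for n, _, _ in groups)
    if n_total == 0:
        raise ValueError("groups are empty")
    mean = sum(n * m for n, m, _ in groups) / n_total
    second_moment = sum(n * (sd ** 2 + m ** 2) for n, m, sd in groups) / n_total
    var = max(second_moment - mean ** 2, 0.0)  # guard against tiny negative rounding
    return mean, math.sqrt(var)
```

For example, `combine_groups([(2, 150.0, 50.0), (1, 300.0, 0.0)])` gives a mean of 200.0 and an SD of about 81.65, the same as computing directly on the raw values 100, 200 and 300.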

RNA-Seq • 3.4k views
5.6 years ago
GenoMax 141k

Use the instructions in this post, which uses the BBMap suite: C: Target fragment size versus final insert size

There are two ways to do this: either by aligning to a reference, or by merging the reads (if the R1/R2 reads overlap in sequence).
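For the alignment route, the insert size of each pair can be read from the TLEN column (column 9) of the SAM records, and records from several SAM files can simply be pooled before computing statistics. The sketch below parses plain SAM text without any extra library, keeps one primary, properly paired R1 record per pair (FLAG bits 0x2 and 0x40 set; 0x4, 0x100 and 0x800 not set), and treats |TLEN|, the observed template length reported by the aligner, as the insert size. It assumes each pair appears in only one of the input files, which holds if you align once against a combined reference; with three separate alignments a pair could be reported in more than one file. The function names are illustrative.

```python
import statistics
from typing import Iterable, List, Tuple

# SAM FLAG bits (SAM specification)
PROPER_PAIR = 0x2
UNMAPPED = 0x4
FIRST_IN_PAIR = 0x40
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def template_lengths(sam_lines: Iterable[str]) -> List[int]:
    """Return |TLEN| for each primary, properly paired R1 record (one value per pair)."""
    lengths = []
    for line in sam_lines:
        if line.startswith("@"):  # header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete SAM alignment record
            continue
        flag, tlen = int(fields[1]), int(fields[8])
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue
        if not (flag & PROPER_PAIR and flag & FIRST_IN_PAIR):
            continue
        if tlen != 0:  # TLEN is 0 when the template length is unavailable
            lengths.append(abs(tlen))
    return lengths


def mean_sd(lengths: List[int]) -> Tuple[float, float]:
    """Mean and population SD of the pooled template lengths."""
    return statistics.mean(lengths), statistics.pstdev(lengths)
```

For example, `mean_sd(template_lengths(itertools.chain(open("chr1.sam"), open("chr2.sam"), open("chr3.sam"))))` pools all three files (the file names are hypothetical).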


Oh, thanks a lot. But the reads in my FASTQ files do not overlap, because the read length is shorter than the inner distance. So in my case, what is the best option with BBMap? Is it right to calculate it after merging the three chromosome FASTA files into one?


Ideally you should do it using the original FASTQ data, if you have it. If not, FASTA should work (I think).
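On the question of merging the three chromosome FASTA files: a combined reference is just the three files concatenated into one multi-FASTA, which lets a single bowtie2 index and a single alignment cover all chromosomes, so the insert size statistics come from one run. A minimal sketch that also refuses duplicate sequence IDs (the first word of each header line), since duplicate names would make the alignments ambiguous; the function names and file paths are illustrative only.

```python
from pathlib import Path
from typing import Iterable, Tuple


def merge_fasta_texts(named_texts: Iterable[Tuple[str, str]]) -> str:
    """Concatenate (name, FASTA text) pairs into one multi-FASTA string.

    Blank lines are dropped; a ValueError is raised on a repeated sequence ID.
    """
    seen = set()
    out_lines = []
    for name, text in named_texts:
        for line in text.splitlines():
            if not line.strip():
                continue  # drop blank lines between records
            if line.startswith(">"):
                seq_id = (line[1:].split() or [""])[0]
                if seq_id in seen:
                    raise ValueError(f"duplicate sequence ID {seq_id!r} in {name}")
                seen.add(seq_id)
            out_lines.append(line)
    return "\n".join(out_lines) + "\n"


def merge_fasta_files(inputs: Iterable[str], output: str) -> None:
    """Read each input FASTA file and write the merged reference to `output`."""
    merged = merge_fasta_texts((p, Path(p).read_text()) for p in inputs)
    Path(output).write_text(merged)
```

For example, `merge_fasta_files(["chr1.fa", "chr2.fa", "chr3.fa"], "genome.fa")`, then build the index from `genome.fa` with `bowtie2-build` as usual.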
