Question

How to determine the overall average insert size when reads from one FASTQ data pool were aligned separately to each reference?

0

5.6 years ago

bioinfo ▴ 10

Hi, I am having difficulty calculating the average insert size and standard deviation of my RNA-seq NGS data for submission to the GEO database.

I generated two FASTQ files of paired-end reads (Read1, Read2) using an Illumina NGS sequencer.

As the reference, I used a bacterial genome that has 3 chromosomes. For the analysis, I used a bowtie2 pipeline to align the paired-end data to each chromosome separately. To get the average insert size, a tool such as 'samtools stats' works well on an alignment.

But I think calculating it from each SAM file separately is not accurate, because each file covers only part of the FASTQ data. Is there any way to get the average insert size and standard deviation for the whole FASTQ data set?
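To illustrate the concern above: if each per-chromosome alignment yields a pair count, a mean, and a standard deviation, those summaries can be combined into one overall figure. The following is only a sketch. It assumes every read pair is counted in exactly one of the three files and that each standard deviation is a population SD (divided by n). The function name `pool_mean_sd` and the numbers are hypothetical, not real data.

```python
import math


def pool_mean_sd(groups):
    """Combine per-group (n, mean, sd) summaries into one overall mean and SD.

    Assumption: each sd is a population SD (variance divided by n).
    """
    total_n = sum(n for n, _, _ in groups)
    if total_n == 0:
        raise ValueError("no read pairs in any group")
    mean = sum(n * m for n, m, _ in groups) / total_n
    # Within a group, the mean of x^2 equals sd^2 + mean^2.
    mean_sq = sum(n * (sd * sd + m * m) for n, m, sd in groups) / total_n
    variance = max(mean_sq - mean * mean, 0.0)  # guard against rounding below 0
    return mean, math.sqrt(variance)


# Hypothetical (pairs, mean insert size, SD) for three chromosomes.
per_chromosome = [(1000, 300.0, 30.0), (500, 310.0, 25.0), (200, 295.0, 35.0)]
overall_mean, overall_sd = pool_mean_sd(per_chromosome)
print(f"overall insert size: {overall_mean:.1f} +/- {overall_sd:.1f}")
```

If a pair was aligned to more than one chromosome, it would be counted more than once here, so aligning once against all chromosomes together is likely the cleaner route.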

Thank you in advance

RNA-Seq • 3.4k views

ADD COMMENT • link updated 12 months ago by Ram 43k • written 5.6 years ago by bioinfo ▴ 10

score 0 · Answer 1 · 2018-08-29

0

5.6 years ago

GenoMax 141k

Use the instructions in this post, which uses the BBMap suite: C: Target fragment size versus final insert size

There are two ways to do this: by alignment to a reference, or by merging (if the R1/R2 reads overlap in sequence).
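Either method typically ends in an insert-size histogram, from which the mean and standard deviation for GEO can be derived. Below is a minimal sketch of that last step. It assumes a plain text histogram with `#` header lines followed by two whitespace-separated columns, insert size and count. That layout is an assumption here, so check it against the file your tool actually writes. The function name `stats_from_histogram` is hypothetical.

```python
import math


def stats_from_histogram(lines):
    """Return (pairs, mean, sd) from 'insert_size count' histogram lines.

    Lines that are empty or start with '#' are treated as headers and skipped.
    The SD is a population SD (variance divided by the number of pairs).
    """
    total = 0
    weighted_sum = 0.0
    weighted_sq = 0.0
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        size_text, count_text = line.split()[:2]
        size, count = int(size_text), int(count_text)
        total += count
        weighted_sum += size * count
        weighted_sq += size * size * count
    if total == 0:
        raise ValueError("histogram contains no pairs")
    mean = weighted_sum / total
    variance = max(weighted_sq / total - mean * mean, 0.0)
    return total, mean, math.sqrt(variance)


# Tiny made-up histogram, only to show the call.
example = ["#InsertSize\tCount", "100\t1", "200\t2", "300\t1"]
pairs, mean, sd = stats_from_histogram(example)
print(pairs, round(mean, 1), round(sd, 1))
```

For a real file, the same function can be given an open file handle instead of a list, since it only iterates over lines.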

ADD COMMENT • link 5.6 years ago by GenoMax 141k

0

Oh, thanks a lot. But my read pairs do not overlap, because the read length is shorter than the inner distance. So in my case, what is the best option for BBMap? Is it right to calculate it after merging the three chromosome FASTA files?

ADD REPLY • link 5.6 years ago by bioinfo ▴ 10

0

You should ideally do it using the original fastq data if you have it. But if not, fasta should work (I think).
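For the combined-reference idea raised in this thread, the three chromosome FASTA files can be joined into one multi-record FASTA so that all reads are aligned against all chromosomes in a single run. This is a minimal sketch working on in-memory text. The function name `combine_fasta` and the example records are hypothetical, and it assumes each input starts with a `>` header and that the headers are unique.

```python
def combine_fasta(texts):
    """Join several FASTA texts into one multi-record FASTA string."""
    parts = []
    for text in texts:
        text = text.strip()
        if not text:
            continue  # skip empty inputs
        if not text.startswith(">"):
            raise ValueError("each input must start with a '>' header line")
        parts.append(text)
    if not parts:
        return ""
    # One newline between records, one trailing newline at the end.
    return "\n".join(parts) + "\n"


# Made-up records standing in for the three chromosome files.
chromosomes = [">chr1\nACGTACGT\n", ">chr2\nGGCCGGCC\n", ">chr3\nTTAATTAA\n"]
combined = combine_fasta(chromosomes)
print(combined, end="")
```

With real files, each text would come from reading one chromosome FASTA, and the result would be written to a new reference file before building the aligner index.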

ADD REPLY • link 5.6 years ago by GenoMax 141k