Running STAR aligner with paired-end and single-end reads simultaneously
10 months ago
nanoide • 30

Hi all

So, I recently got some RNA-seq raw reads, both paired-end (2 x 150 bp) and single-end (1 x 75 bp), and I want to map them using the STAR aligner. My main questions are: how would you deal with these? Can STAR take both paired-end and single-end .fq files simultaneously? Or is it possible to map them separately and then merge the BAM files?

Any ideas?

Thank you for your advice


Thank you all for your thoughts and useful comments!


You have not said whether this is the same library sequenced two different ways. That will also have implications for how you do the data analysis.


Hi, thanks for your responses. This is indeed the same library sequenced two different ways. I'll check out the BBMap suite, thanks.


Thank you all for your responses.

Please allow me to ask for a couple of clarifications:

* I gather that STAR cannot deal with paired-end and single-end reads of different lengths at the same time. Do you know of any aligner that can?
* Do you know of any published paper that has done something similar to what you suggested?

Thank you very much


The BBMap suite includes a wrapper, bbwrap.sh, that lets bbmap.sh map single-end and paired-end reads in the same alignment job and write them to one output file:

$ bbwrap.sh in1=read1.fq,singletons.fq in2=read2.fq,null out=mapped.sam append
14 months ago
swbarnes2 5.7k
United States

I don't think STAR can take them both together. I'd process the two separately, and merge results at the end if it looks like the two experiments are telling you the same thing.
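
The separate-mapping approach could be sketched roughly as follows, in Python, so that the commands are built and printed for inspection before anything is run. The FASTQ names, the `star_index` directory and the output prefixes are hypothetical placeholders; the STAR options shown (`--runThreadN`, `--genomeDir`, `--readFilesIn`, `--outSAMtype`, `--outFileNamePrefix`) and `samtools merge` are standard, but it is worth checking them against the versions you have installed.

```python
import shlex


def star_command(reads, genome_dir, prefix, threads=8):
    """Build a STAR command for one run.

    `reads` holds one FASTQ path for single-end data, or two (R1, R2)
    for paired-end data; STAR receives both cases via --readFilesIn.
    """
    if len(reads) not in (1, 2):
        raise ValueError("expected 1 (single-end) or 2 (paired-end) FASTQ files")
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", *reads,
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outFileNamePrefix", prefix,
    ]


def merge_command(out_bam, in_bams):
    """Build a `samtools merge` command that combines coordinate-sorted BAMs."""
    return ["samtools", "merge", out_bam, *in_bams]


# Hypothetical inputs: a paired-end and a single-end run of the same library.
pe_cmd = star_command(["pe_R1.fq", "pe_R2.fq"], "star_index", "pe_")
se_cmd = star_command(["se.fq"], "star_index", "se_")

# With the options above, STAR writes <prefix>Aligned.sortedByCoord.out.bam.
merge_cmd = merge_command(
    "merged.bam",
    ["pe_Aligned.sortedByCoord.out.bam", "se_Aligned.sortedByCoord.out.bam"],
)

for cmd in (pe_cmd, se_cmd, merge_cmd):
    # To execute instead of print: subprocess.run(cmd, check=True)
    print(shlex.join(cmd))
```

The merge step is kept as a separate command on purpose, so it can be skipped if the two runs turn out not to agree.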


And I would suggest holding off on merging the results until you have looked at a PCA of the data, to make sure there is no batch effect from the sequencing type, etc.


Thank you both for your answers. So I guess I will map separately. Then, with the BAM files, I will use deepTools plotPCA (maybe plotCorrelation too?) to check, and then samtools merge. Does that sound good? Any thoughts?

Thanks!


Hi, please let me ask you for a clarification. When you said 'merge results at the end if it looks like the two experiments are telling you the same thing', did you mean merging the output from STAR (i.e. the BAM files), or counting independently as well and then summing the counts if they are correlated, cluster together, etc.? Thank you!


Depends on how you are normalizing reads. If you are just generating raw counts, you could simply add the counts together. Otherwise, you should probably merge the BAMs, recalculate the counts and renormalize.
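
For the raw-count case, adding the counts could look something like the sketch below. It assumes each run has already been reduced to a two-column, tab-separated table of gene ID and raw count; the gene names and numbers are made up for illustration, and real count files may carry extra header or summary rows that would need skipping.

```python
import csv
import io


def read_counts(handle):
    """Read a tab-separated table of gene ID and raw count into a dict."""
    counts = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        counts[row[0]] = counts.get(row[0], 0) + int(row[1])
    return counts


def combine_counts(*tables):
    """Sum raw counts gene by gene; a gene missing from a table counts as 0."""
    total = {}
    for table in tables:
        for gene, n in table.items():
            total[gene] = total.get(gene, 0) + n
    return total


# Made-up example tables standing in for the paired-end and single-end runs.
pe_table = io.StringIO("geneA\t10\ngeneB\t0\ngeneC\t7\ngeneE\t0\n")
se_table = io.StringIO("geneA\t5\ngeneB\t2\ngeneD\t1\ngeneE\t0\n")

combined = combine_counts(read_counts(pe_table), read_counts(se_table))
for gene in sorted(combined):
    print(gene, combined[gene], sep="\t")
```

Genes with zero counts in both runs are kept rather than dropped, so the combined table has the same gene set as the union of the inputs.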


Ok, thank you very much

10 months ago
Duarte, CA

I typically use single-end 50 bp reads for gene expression analysis.

If you are just interested in getting counts for differential expression (and FPKM/CPM for visualization), perhaps trim the longer R1 from the PE experiment to 75 bp?
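
As an illustration of the trimming idea, here is a minimal sketch that cuts every read in a FASTQ stream down to its first 75 bases. It assumes plain, unwrapped four-line FASTQ records; the read name and sequence are invented, and in practice a dedicated trimming tool would do the same job on real files.

```python
import io


def trim_fastq(in_handle, out_handle, length=75):
    """Copy 4-line FASTQ records, keeping the first `length` bases and qualities."""
    while True:
        header = in_handle.readline()
        if not header:
            break  # end of input
        seq = in_handle.readline().rstrip("\n")
        plus = in_handle.readline()
        qual = in_handle.readline().rstrip("\n")
        out_handle.write(header)
        out_handle.write(seq[:length] + "\n")
        out_handle.write(plus)
        out_handle.write(qual[:length] + "\n")


# One invented 150 bp read standing in for R1 of the paired-end run.
record = "@read1\n" + "ACGTA" * 30 + "\n+\n" + "I" * 150 + "\n"

trimmed = io.StringIO()
trim_fastq(io.StringIO(record), trimmed, length=75)
lines = trimmed.getvalue().splitlines()
print(len(lines[1]), len(lines[3]))
```

Reads already shorter than 75 bp pass through unchanged, because slicing past the end of a string is harmless in Python.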

To be safe, I probably would start by processing them separately and seeing how well the replicates cluster. If they really look like technical replicates, I think you could justify combined analysis with the trimmed reads in your Supplemental Materials.

11 months ago
Ashastry • 60

Hello,

I would also recommend producing a correlation (Spearman or Pearson) distance matrix to see how well the samples correlate within their group. DESeq2 can also be used to produce heatmaps of the sample distance matrix.
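
To make the correlation-distance idea concrete, here is a small standard-library sketch. The sample names and count vectors are invented, and the distance is taken as 1 minus the correlation, which is one common convention rather than the only one.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (must not be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Invented per-gene counts for three hypothetical samples.
samples = {
    "PE_rep1": [5, 100, 30, 0, 12],
    "SE_rep1": [3, 80, 25, 1, 10],
    "other": [50, 2, 8, 40, 9],
}

names = list(samples)
distance = {
    (a, b): 1 - spearman(samples[a], samples[b]) for a in names for b in names
}

for a in names:
    print(a, *(f"{distance[(a, b)]:.3f}" for b in names), sep="\t")
```

Samples that behave like technical replicates should end up with distances close to 0 to each other and clearly larger distances to everything else.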
