Question

RNA-seq on data that has two runs (two `sra` files) per sample


4.7 years ago

Fawzi Yassine ▴ 20

Hi,

I am doing an RNA-seq analysis on data that has two runs (two SRA files) per sample, and each run has two FASTQ files (forward and reverse reads). An example sample is: https://www.ncbi.nlm.nih.gov/sra?LinkName=biosample_sra&from_uid=2999520

How do I align such a sample (hisat2 syntax would be appreciated)?

How do I write the phenodata file (phenotype data) for such a sample?

Regards,

RNA-Seq alignment

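For the alignment question, a minimal sketch of aligning each run separately with HISAT2's paired-end options (`-x` index, `-1`/`-2` mate files, `-S` SAM output, `-p` threads). It assumes a prebuilt index with the hypothetical prefix `genome_index` and FASTQ files named `<run>_1.fastq.gz` / `<run>_2.fastq.gz`; the run accessions `SRR_RUN1` and `SRR_RUN2` are placeholders for the two SRR runs of your sample. The code only builds the command lines; pass each list to `subprocess.run(cmd, check=True)` to execute it.

```python
"""Build one paired-end HISAT2 command per sequencing run (sketch)."""
from typing import Dict, List


def hisat2_command(index_prefix: str, run_id: str, threads: int = 4) -> List[str]:
    """Return a hisat2 argv list aligning one run's read pairs to a SAM file.

    Assumes hypothetical file names <run_id>_1.fastq.gz and <run_id>_2.fastq.gz.
    """
    return [
        "hisat2",
        "-p", str(threads),            # number of alignment threads
        "-x", index_prefix,            # HISAT2 index basename
        "-1", f"{run_id}_1.fastq.gz",  # forward reads
        "-2", f"{run_id}_2.fastq.gz",  # reverse reads
        "-S", f"{run_id}.sam",         # SAM output, one file per run
    ]


def commands_for_sample(index_prefix: str, runs: List[str]) -> Dict[str, List[str]]:
    """Map each run of one sample to its own alignment command."""
    return {run: hisat2_command(index_prefix, run) for run in runs}


if __name__ == "__main__":
    # Hypothetical accessions: replace with the two SRR runs of your sample.
    for run, cmd in commands_for_sample("genome_index", ["SRR_RUN1", "SRR_RUN2"]).items():
        print(run, " ".join(cmd))
```

Keeping one output file per run is what makes the per-run quality checks discussed in the replies possible before anything is merged.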


These appear to be lane replicates; see the thread "Confused about merging RNA-seq lanes/runs".

Typically I would merge them at the FASTQ level, but as the run dates differ quite a lot, I would process them separately and then check for potential batch effects, e.g. by PCA. If there is no indication of a batch effect, simply merge the BAM files.

4.7 years ago by ATpoint 82k
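For the FASTQ-level option mentioned above, the forward files of both runs are combined into one file and the reverse files into another, with the runs in the same order both times so that mates stay paired. Concatenated gzip members still form a valid gzip file, so no decompression is needed. A minimal sketch with hypothetical file names; `count_reads` is an illustrative helper to confirm that the merged file holds the sum of both runs' reads.

```python
import gzip
import shutil
from pathlib import Path
from typing import Sequence


def concat_fastq_gz(inputs: Sequence[Path], output: Path) -> None:
    """Concatenate gzipped FASTQ files byte-wise into one gzipped file.

    Call once for the _1 files and once for the _2 files, passing the runs
    in the same order both times so read pairs stay in sync.
    """
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)


def count_reads(path: Path) -> int:
    """Count FASTQ records (4 lines each) in a gzipped FASTQ file."""
    with gzip.open(path, "rt") as handle:
        return sum(1 for _ in handle) // 4


# Hypothetical usage:
# concat_fastq_gz([Path("SRR_RUN1_1.fastq.gz"), Path("SRR_RUN2_1.fastq.gz")], Path("sample_1.fastq.gz"))
# concat_fastq_gz([Path("SRR_RUN1_2.fastq.gz"), Path("SRR_RUN2_2.fastq.gz")], Path("sample_2.fastq.gz"))
```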


Thanks for the reply.

How do I merge the BAM files?

How do I write the phenodata file (phenotype data) for such a sample?

Regards,

4.7 years ago by Fawzi Yassine ▴ 20


Use `samtools merge`; please read its manual for the correct syntax.
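A minimal sketch of the merge step, assuming each run has already been converted to a coordinate-sorted BAM (hypothetical names such as `SRR_RUN1.sorted.bam`). It uses the classic positional form of `samtools merge` (output file first, then the inputs) with `-@` for extra threads, followed by `samtools index`; check the manual of your samtools version before relying on it.

```python
from typing import List, Sequence


def samtools_merge_command(output_bam: str, input_bams: Sequence[str], threads: int = 1) -> List[str]:
    """Return an argv list merging sorted per-run BAMs into one sample BAM."""
    if len(input_bams) < 2:
        raise ValueError("need at least two BAM files to merge")
    # Inputs must all be sorted the same way (e.g. by coordinate).
    return ["samtools", "merge", "-@", str(threads), output_bam, *input_bams]


def samtools_index_command(bam: str) -> List[str]:
    """Return an argv list indexing the merged, coordinate-sorted BAM."""
    return ["samtools", "index", bam]


# Hypothetical usage: run each list with subprocess.run(cmd, check=True).
# samtools_merge_command("sample1.bam", ["SRR_RUN1.sorted.bam", "SRR_RUN2.sorted.bam"], threads=4)
```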

What would the phenotype data be in that case? Align the runs independently and, if they look OK, merge the BAM files and use whatever phenotype data you would use for a single file.

4.7 years ago by ATpoint 82k
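As the reply says, once the runs are merged the phenodata describes samples, not runs, so it has one row per sample. A minimal sketch, assuming a Ballgown-style CSV whose first column `ids` holds the sample names; the sample names, the `condition` column and the run accessions are hypothetical. The run-to-sample mapping is only used to collapse runs into samples.

```python
import csv
import io
from typing import Dict, List


def phenodata_rows(run_to_sample: Dict[str, str],
                   sample_info: Dict[str, Dict[str, str]]) -> List[Dict[str, str]]:
    """Collapse runs to samples: one phenodata row per unique sample."""
    samples = sorted(set(run_to_sample.values()))
    return [{"ids": s, **sample_info[s]} for s in samples]


def write_phenodata(rows: List[Dict[str, str]]) -> str:
    """Render the rows as CSV text (header taken from the first row)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical example: two runs of sample1 collapse into a single row.
# rows = phenodata_rows({"SRR_RUN1": "sample1", "SRR_RUN2": "sample1"},
#                       {"sample1": {"condition": "control"}})
```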


This seems complicated! Can I just ignore the second run of each sample?

4.7 years ago by Fawzi Yassine ▴ 20


I have no insight into your data or analysis goals, so I cannot comment on this. Aligning data and doing some basic quality control is not too complicated, but it is an essential step that should be done before every analysis. Try to work it out. If you do not quality-control your data, I do not see how you could confidently stand behind your analysis. Read the DESeq2 workflow at Bioconductor; it covers everything from alignment/quantification to creating a count matrix and PCA. A simple Pearson correlation on the data might suffice as well. Maybe simply merging the data at the FASTQ level is OK, too.

4.7 years ago by ATpoint 82k
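The simple Pearson check suggested above can be sketched as follows, assuming you already have per-gene counts for each run (e.g. two columns of a count matrix) in the same gene order. Counts are log2(count + 1) transformed so that a few very highly expressed genes do not dominate. This pure-Python version is only for illustration; in practice you would compare all runs and samples together, as the PCA in the DESeq2 workflow does. If the two runs of a sample correlate about as well as or better than different samples do, that argues against a strong run effect.

```python
import math
from typing import List, Sequence


def log_counts(counts: Sequence[int]) -> List[float]:
    """log2(count + 1) transform of raw per-gene counts."""
    return [math.log2(c + 1) for c in counts]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical usage with per-gene counts of the two runs of one sample:
# r = pearson(log_counts(run1_counts), log_counts(run2_counts))
```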


You are right, it's better that I merge both runs of each sample. For `samtools merge`, can you tell me how to deal with the read group for this particular sample: https://www.ncbi.nlm.nih.gov/sra?LinkName=biosample_sra&from_uid=2999520 Thanks,

4.7 years ago by Fawzi Yassine ▴ 20
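For the read-group question, a common convention (not the only one) is to give each run its own read-group ID while using the same SM (sample) value for both runs. The merged BAM then still records which run each read came from, while downstream tools treat both runs as one sample. HISAT2 can attach the read group at alignment time with `--rg-id` and repeated `--rg` options. A minimal sketch, assuming Illumina data, one library per sample (so LB equals the sample name, an assumption) and hypothetical run and sample names. `rg_header_line` gives the equivalent `@RG` header line, useful if you add read groups after alignment instead.

```python
from typing import List


def read_group_args(run_id: str, sample: str, platform: str = "ILLUMINA") -> List[str]:
    """Return hisat2 options tagging every read of one run with a read group.

    ID is unique per run; SM is shared by all runs of the same sample.
    """
    return [
        "--rg-id", run_id,
        "--rg", f"SM:{sample}",
        "--rg", f"PL:{platform}",
        "--rg", f"LB:{sample}",  # assumption: one library per sample
    ]


def rg_header_line(run_id: str, sample: str, platform: str = "ILLUMINA") -> str:
    """Return the equivalent tab-separated @RG SAM header line."""
    return "\t".join(["@RG", f"ID:{run_id}", f"SM:{sample}",
                      f"PL:{platform}", f"LB:{sample}"])


# Hypothetical usage: append to the hisat2 command of each run.
# read_group_args("SRR_RUN1", "sample1"); read_group_args("SRR_RUN2", "sample1")
```

Because the two runs get different IDs, their `@RG` lines should not collide when the per-run BAMs are merged.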