RNA-seq experimental design for SNP calling
6.7 years ago
windhavenn ▴ 20

Hello NGS fellows,

I am a newbie here and would highly appreciate your advice about one particular experimental design.

We have data from an RNA-seq experiment that was originally designed to assess differential expression. The details of the experiment are as follows:

2 modalities of the phenotype

Each phenotype is represented by 4 samples. 1 sample = 60 individuals pooled together at the stage of RNA isolation.

Molecule – polyadenylated mRNA

Sequencing chemistry – Illumina paired-end, read length 2 × 100 bp

My question is whether it is valid to use these RNA-seq data to call SNPs. I searched beforehand and found that most people calling SNPs from RNA-seq use 40-1000 samples (= individuals), but they designed their RNA-seq experiments for GWAS from the start. I see that this kind of analysis cannot be applied to my data (if only because in my case individual flies were pooled without barcoding, 60 flies per sample). However, can I still call SNPs and upload the list to a database as potential targets for GWAS, with, for example, an estimate of the functional impact on protein structure? Will they be “true” SNPs, or does our experimental design make even this step invalid?

I found this paper, https://www.ncbi.nlm.nih.gov/pubmed/27458203, where the authors used 2 phenotypes, each represented by 2 samples, which is almost like our experiment, but I still have doubts.

RNA-Seq SNP • 2.8k views

A worked example of a GATK best-practices variant calling pipeline is available from this blog


Dear cpad0112, unfortunately the GATK pipeline does not say a word about how many samples, with or without pooling, etc., produce reliable SNPs. If I am wrong about this and was looking for the info in the wrong places, I would highly appreciate it if you could provide a link to the correct information about experimental design.


I suggest you consult a statistician nearby :).


Well, that is another problem, which led me here and to a couple of other forums :-/

5.7 years ago

A couple of points:

1: As others have alluded to, the GATK workflow for variant calling in RNA-seq can help you get your data into a form that can be pushed through HaplotypeCaller.

2: Variant calling in RNA-seq data is still a "just because you can doesn't mean you should" exercise. There are a lot of caveats to this, and it's simply not the case that you can buy one experiment and get another free (so to speak).

3: You've pooled your samples, so you have destroyed your subject/individual variability.

Just to help solidify these points: assuming that everything goes well and you've got your called variants, each variant is a call from pooled individuals. What if, in one of your pooled samples, 50% of your individuals are hom ref and 50% are hom alt? The site will be called het.
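To put numbers on that, here is a minimal Python sketch (not part of any GATK workflow). The function name `pooled_alt_fraction` is made up for illustration, and it assumes diploid individuals that each contribute equal amounts of RNA with no allele-specific expression, which real pools will not satisfy exactly.

```python
def pooled_alt_fraction(n_hom_ref, n_het, n_hom_alt):
    """Expected fraction of alt-allele reads at one site in a pooled sample.

    Assumptions (illustrative only): diploid individuals, equal RNA
    contribution per individual, no allele-specific expression.
    """
    total_alleles = 2 * (n_hom_ref + n_het + n_hom_alt)
    alt_alleles = n_het + 2 * n_hom_alt
    return alt_alleles / total_alleles


# Three very different pools of 60 individuals each
pool_a = pooled_alt_fraction(30, 0, 30)   # half hom ref, half hom alt
pool_b = pooled_alt_fraction(0, 60, 0)    # every individual het
pool_c = pooled_alt_fraction(15, 30, 15)  # a 1:2:1 mix of genotypes

print(pool_a, pool_b, pool_c)  # 0.5 0.5 0.5
```

All three pools give the same expected alt fraction, so a caller that sees roughly half alt reads has no way of telling which mix of genotypes produced them.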

I guess the crux of your question is: can I take my pooled RNA-seq data and extract the reads from each of my 60 individuals? No, there's no tool that can do this. Once the library is pooled, that's it.


Hi Andrew,

Thank you for your answer. We abandoned this experiment last year for exactly the reasons highlighted in your comment.

6.7 years ago
Lila M ★ 1.2k

I think you should first have a look at the GATK Best Practices for variant calling on RNA-seq in full detail; there you can also find a good workflow that can help you call variants. I hope this helps.


Dear Lila M, I read the GATK pipelines and lots of other things before asking the question. The GATK pipeline describes the sequence of methods, but not the experimental design. In fact, we have already obtained the lists of SNPs, so the choice of methods is not the issue. The problem is that I found different opinions (in different papers) about which experimental design provides statistically significant, "true" SNPs, but what I have not found yet is a clear explanation of whether my design is OK for SNP calling.


So if your question is about the experimental design, maybe you should ask here. Good luck!


Thank you! Hope to find answers there..
