Question

Variant calling from RNA-seq?

0

5.2 years ago

zizigolu ★ 4.3k

Hi,

I have a set of duplicate-marked .bam files from STAR. I will likely be working on variants in cancer models and in the tumors themselves. This paper, https://www.nature.com/articles/s41467-018-05190-9, does what I am aiming to do, and its methods say:

For quantifying the expression of SNVS, RNA-seq reads were mapped to the GRCh37_g1k genome assembly using STAR43. Following GATK’s38 best practice on ‘Calling variants in RNAseq’, readgroups were added and duplicates marked using Picard44. GATK’s SplitNCigarReads for trimming reads and assigning mapping qualities was applied. The sets of SNVs identified using Strelka on WGS data for all tumor or organoid were merged using vcftools45. Reads overlapping any SNV were counted using the ReadCountWalker in gatktools

Does this mean that they are calling variants from RNA-seq, or are they relating variably expressed genes from RNA-seq to genes carrying variants from their WGS?

WGS RNA-Seq • 2.4k views

ADD COMMENT • link 5.2 years ago by zizigolu ★ 4.3k

1

They had a list of SNVs from the WGS data and then used the RNA-seq data, after processing it according to the GATK best practices, to quantify the reads overlapping those SNVs.

ADD REPLY • link 5.2 years ago by GouthamAtla 12k

0

Sorry, do you mean they used the RNA-seq data to obtain a second list of SNVs, so that they would have two lists of SNVs, one from WGS and one from RNA-seq?

ADD REPLY • link 5.2 years ago by zizigolu ★ 4.3k

0

geek_y is correct (I have edited my answer). The literal interpretation is that they did the following:

1. called SNVs using WGS data
2. mapped / aligned RNA-seq reads to the reference genome, following GATK's 'best' practices in the document entitled 'Calling variants in RNAseq'
3. counted RNA-seq reads that aligned to regions where the WGS SNVs had been called

They did not, in fact, call the variants from the RNA-seq data. If it needs clarifying, I know the corresponding author and can ask.
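As a rough illustration of the counting step (not the authors' code, which used a GATK walker on BAM files), here is a minimal Python sketch. It assumes the SNVs and read alignments have already been reduced to simple tuples; the function name `count_reads_over_snvs` and the toy coordinates are hypothetical.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def count_reads_over_snvs(snvs, reads):
    """Count reads whose aligned span covers each SNV position.

    snvs:  iterable of (chrom, pos), 1-based positions (assumed input format).
    reads: iterable of (chrom, start, end), 1-based inclusive aligned span.
    Returns a dict {(chrom, pos): read_count}.
    """
    # Group SNV positions by chromosome and sort them for binary search.
    grouped = defaultdict(set)
    for chrom, pos in snvs:
        grouped[chrom].add(pos)
    positions = {chrom: sorted(pos_set) for chrom, pos_set in grouped.items()}

    # Start every SNV at zero so uncovered sites are still reported.
    counts = {(c, p): 0 for c, plist in positions.items() for p in plist}

    for chrom, start, end in reads:
        plist = positions.get(chrom)
        if not plist:
            continue  # no SNVs on this chromosome
        # Indices of SNV positions falling within [start, end].
        lo = bisect_left(plist, start)
        hi = bisect_right(plist, end)
        for pos in plist[lo:hi]:
            counts[(chrom, pos)] += 1
    return counts


# Toy data: three WGS-derived SNV sites and five RNA-seq read spans.
snvs = [("1", 100), ("1", 250), ("2", 50)]
reads = [("1", 90, 140), ("1", 100, 150), ("1", 200, 249),
         ("2", 10, 60), ("3", 1, 100)]
counts = count_reads_over_snvs(snvs, reads)
print(counts)
```

Note that this span-based count is a simplification: spliced RNA-seq reads skip introns, so a real tool has to look at the bases actually aligned at the site (and usually at which allele each read carries), not just the start and end of the alignment.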

ADD REPLY • link 5.2 years ago by Kevin Blighe 87k

0

Sorry Kevin, so overall you mean I don't need to call SNVs from RNA-seq? I already have access to duplicate-marked RNA-seq .bam files from STAR.

ADD REPLY • link 5.2 years ago by zizigolu ★ 4.3k

1

Sorry Kevin, totally you meant I don't need to call SNV from RNA-seq?

F: The sequence of operations for the analysis in the paper is clearly stated in @Kevin's comment (and was originally described by @geek_y).

The SNVs were already obtained from WGS (and not RNA-seq) data (in the paper you linked, I assume). Do you have WGS data in addition? Are you trying to follow the analysis strategy from that paper?

ADD REPLY • link 5.2 years ago by GenoMax 141k

0

Thank you. I have matched RNA-seq and WGS data for each patient, both in .bam format.

As my question is the same, I want to do the same analysis on my own first-hand data.

ADD REPLY • link 5.2 years ago by zizigolu ★ 4.3k

0

Well, you can do what you want with your data - we live in a free society. Geek_y and I are just helping you to understand what, exactly, the authors did. We are doing this without knowing what your own project is about.

ADD REPLY • link 5.2 years ago by Kevin Blighe 87k

0

Sorry Kevin, please don't be angry. I understand that you are not dictating what I have to do at all.

This is because of my bad English.

With my last comment I just meant to check my understanding of your overall reading of the paragraph I pasted.

I believe people bring their confusion to public forums, and people with knowledge in those fields are free to answer or ignore them. I remember when I joined Biostars about 4 years ago, I used to ask very shallow questions, as I still do. At first @Pierre also used to challenge my questions and comments, but after a while he left me to myself for good, without even a comment. I mean that we can ignore people instead of giving harsh feedback. However, thank you for your time.

ADD REPLY • link 5.2 years ago by zizigolu ★ 4.3k

1

Do not worry. Looking at the quoted paragraph from the manuscript, it is not 100% clear what they did. This is the fault of the journal editors and the authors. There is a small amount of doubt surrounding the final line:

Reads overlapping any SNV were counted using the ReadCountWalker in gatktools

They do not explicitly state that these reads are RNA-seq reads; however, one can infer that they are.

Just to recap:

1. called variants in the WGS samples with Strelka. These variants were then merged, presumably into a single VCF, with VCFtools
2. aligned RNA-seq reads to the 1000 Genomes version of the GRCh37 / hg19 reference genome using STAR. I have a link for this genome build in #Step 3, HERE. In doing this, they followed the steps outlined in the methods, namely: adding read groups, marking duplicates using Picard, and trimming reads / assigning mapping qualities with SplitNCigarReads from GATK.
3. counted RNA-seq reads over each position at which a variant had been identified from WGS. For this, they used ReadCountWalker
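To make the merging of the per-sample WGS calls more concrete, here is a small Python sketch that takes the union of SNVs across several VCFs. It only illustrates the idea; the paper used VCFtools, and the function names, the toy VCF text, and the choice to key variants on (CHROM, POS, REF, ALT) are my assumptions.

```python
BASES = set("ACGT")


def parse_vcf_snvs(vcf_text):
    """Yield (chrom, pos, ref, alt) for single-base substitutions in VCF text.

    Header lines (starting with '#') are skipped, multi-allelic ALT fields
    are split, and indels or symbolic alleles are ignored.
    """
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        chrom, pos, ref = fields[0], int(fields[1]), fields[3]
        for alt in fields[4].split(","):
            if ref in BASES and alt in BASES:
                yield (chrom, pos, ref, alt)


def merge_snv_sets(vcf_texts):
    """Return the sorted union of SNVs found in any of the given VCF texts."""
    merged = set()
    for text in vcf_texts:
        merged.update(parse_vcf_snvs(text))
    return sorted(merged)


# Toy VCF content for two hypothetical samples (tumor and organoid).
header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
tumor_vcf = header + "1\t100\t.\tA\tG\t.\tPASS\t.\n1\t200\t.\tAT\tA\t.\tPASS\t.\n"
organoid_vcf = header + "1\t100\t.\tA\tG\t.\tPASS\t.\n2\t50\t.\tC\tT,G\t.\tPASS\t.\n"

merged = merge_snv_sets([tumor_vcf, organoid_vcf])
print(merged)
```

The sketch does not look at the FILTER column or at genotypes; whether the authors filtered the calls before merging is not stated in the quoted paragraph.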

You are doing very well, so, do not worry.

ADD REPLY • link 5.2 years ago by Kevin Blighe 87k

1

Thank you so much for bearing with me

ADD REPLY • link 5.2 years ago by zizigolu ★ 4.3k

0

~~Well, which manuscript is it?~~ - thanks for adding the link to the manuscript. Be aware of the limitations of calling variants from RNA-seq: A: Inferring genotype based on RNA sequnces

ADD REPLY • link 5.2 years ago by Kevin Blighe 87k

0

Thank you. I got confused here because, if they had called SNVs from RNA-seq, why would they mention:

The sets of SNVs identified using Strelka on WGS data for all tumor or organoid were merged using vcftools45

ADD REPLY • link 5.2 years ago by zizigolu ★ 4.3k

0

Please edit your post to use a more useful/informative title. It doesn't help future users identify, at a glance, relevant content.

ADD REPLY • link 5.2 years ago by Joe 21k