Ways To Filter Noise From RNAseq Data
11.4 years ago
Wayne ★ 1.0k

Hello all, I am working with both DNA exome sequencing and RNAseq data. The samples are not directly matched, but they are of the same disease, and therefore I was hoping to use the RNAseq to check for the variants detected in DNA sequencing and vice versa. The problem is that the RNAseq data is so noisy that there are far too many variants (most of which are systematic artifacts) to do this analysis. So far I have filtered using dbSNP, depth and frequency, and I have removed anything detected in a separate panel of RNAseq normal cells that I have. Does anyone have any advice on how to trim the list down further? Perhaps some papers or examples of other groups that have done something similar? Any advice would be greatly appreciated! Thanks for your time.

rna-seq variant calling sequencing • 4.8k views

After your filtering, have you determined whether the remaining variation is of one particular type, e.g. single nucleotide variation, splice differences, etc.? Perhaps you would have to establish a different algorithm for each type of variation?


Yes, I have different algorithms for fusions, amplifications, deletions, and variants. What I need is a way to filter the single nucleotide variants specifically, beyond simply looking at depth and quality. Using a composite normal to filter (a collection of RNAseq samples of "normal" (non-tumor) cells that correspond in some way to the particular disease you are looking at) seems to be the way to go, but I cannot find a good source of such samples. Additionally, some have suggested that mapping with both bwa and bowtie and taking only the intersection might be beneficial, but for me this has only removed about 4% of the calls.
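As a rough illustration of the two filters mentioned above (intersecting calls from two aligners, then removing anything seen in a composite normal), here is a minimal Python sketch. The function names, the variant key `(chrom, pos, ref, alt)`, the `max_normals` threshold, and the toy call sets are all hypothetical; real call sets would come from your own variant caller output.

```python
from collections import Counter

# A variant is keyed by (chromosome, 1-based position, ref allele, alt allele).
# This representation is an assumption made for the sketch.

def intersect_calls(calls_a, calls_b):
    """Keep only variants that were called from both alignments."""
    return set(calls_a) & set(calls_b)

def filter_against_normals(tumor_calls, normal_samples, max_normals=0):
    """Drop variants seen in more than `max_normals` normal samples."""
    seen = Counter()
    for sample in normal_samples:
        seen.update(set(sample))  # count each normal sample once per variant
    return {v for v in tumor_calls if seen[v] <= max_normals}

# Toy data, invented purely for illustration.
bwa_calls = {("chr1", 1000, "A", "G"), ("chr2", 500, "C", "T"), ("chr3", 42, "G", "A")}
bowtie_calls = {("chr1", 1000, "A", "G"), ("chr2", 500, "C", "T")}
normals = [
    {("chr2", 500, "C", "T")},
    {("chr2", 500, "C", "T"), ("chr9", 7, "T", "C")},
]

shared = intersect_calls(bwa_calls, bowtie_calls)
kept = filter_against_normals(shared, normals)
print(sorted(kept))
```

Raising `max_normals` above 0 would tolerate a variant that appears in only a few normals, which may matter if the normal panel itself is noisy.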


You are not using a splice-aware aligner such as TopHat, STAR, MapSplice, etc. to align your RNA-seq data to the genome? In my experience attempting to align RNA-seq reads with BWA or Bowtie will lead to many read misplacements, soft-clipping, etc. The end result can be many, many false positive SNVs. If this is happening in your case, investing more effort in achieving high quality RNA-seq BAMs may help your problem considerably...
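One quick way to check whether this is happening is to look at the CIGAR strings in the alignments: heavy soft-clipping (`S`) and no spliced alignments (`N`) would be consistent with a non-splice-aware mapping. The sketch below is only an illustration; the helper names and the example CIGAR strings are made up, and in practice the strings would be taken from column 6 of your SAM records.

```python
import re

# Matches one CIGAR operation, e.g. "60M" or "1000N" (operations as in the SAM format).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def cigar_summary(cigar):
    """Return (soft_clipped_bases, read_bases, has_splice_gap) for one CIGAR string."""
    soft = 0
    read_bases = 0
    spliced = False
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "S":
            soft += n
        if op in "MIS=X":  # operations that consume bases of the read
            read_bases += n
        if op == "N":      # skipped reference region, i.e. a splice gap
            spliced = True
    return soft, read_bases, spliced

def soft_clip_fraction(cigars):
    """Fraction of read bases that are soft-clipped across aligned reads."""
    total_soft = 0
    total_bases = 0
    for cigar in cigars:
        if cigar == "*":   # unaligned read, no CIGAR available
            continue
        soft, bases, _ = cigar_summary(cigar)
        total_soft += soft
        total_bases += bases
    return total_soft / total_bases if total_bases else 0.0

# Hypothetical CIGAR strings for illustration only.
example = ["100M", "60M40S", "50M1000N50M", "*"]
print(round(soft_clip_fraction(example), 3))
print(sum(cigar_summary(c)[2] for c in example if c != "*"), "spliced read(s)")
```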

11.4 years ago
JC 13k

Calling variants in RNAseq is noisy but you can improve your calling in several ways:

  1. First, be sure that your library is good: remove low-quality reads before mapping, and check whether you need trimming. Istvan already mentioned some tools for that.
  2. Use a good mapper. Malachi pointed out some tools.
  3. Call your variants with a tool that can perform realignment, such as GATK or samtools, and use stricter quality filters.
  4. Besides dbSNP, you can verify if the variant is known in other databases such as Kaviar.
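
To make points 3 and 4 concrete, here is a minimal Python sketch that applies hard filters to VCF records and drops variants found in a set of known sites. It assumes the caller writes depth as `DP` in the INFO column; the thresholds (`min_qual`, `min_depth`), the function names, and the example records are hypothetical and would need tuning for real data.

```python
def parse_info(info):
    """Turn a VCF INFO string such as 'DP=35;AF=0.4' into a dict of strings."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields

def passes(vcf_line, known, min_qual=50.0, min_depth=20):
    """True if the record meets the quality and depth cutoffs and is not a known site."""
    cols = vcf_line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt, qual, _filter, info = cols[:8]
    if qual == "." or float(qual) < min_qual:
        return False
    depth = parse_info(info).get("DP")
    if depth is None or int(depth) < min_depth:
        return False
    # `known` stands in for sites collected from dbSNP, Kaviar, or similar.
    return (chrom, int(pos), ref, alt) not in known

# Invented records for illustration only.
records = [
    "chr1\t1000\t.\tA\tG\t80\t.\tDP=35",
    "chr1\t2000\t.\tC\tT\t20\t.\tDP=50",   # fails the quality cutoff
    "chr2\t500\t.\tC\tT\t90\t.\tDP=10",    # fails the depth cutoff
    "chr3\t42\t.\tG\tA\t99\t.\tDP=60",     # present in the known set
]
known_sites = {("chr3", 42, "G", "A")}

kept_records = [r for r in records if passes(r, known_sites)]
print(kept_records)
```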
11.4 years ago

Here is a post on different RNA-seq quality filtering options: RSeQC and RNA-SeQC - quality control software for RNA-Seq data
