Variant calling for a polyploid species
6.2 years ago
User000 ▴ 690

Dear all,

I am working with a tetraploid species. I have RNA-seq for varieties and a reference genome. I am going to do variant calling, possibly to study intra-varietal and inter-homoeologous SNPs.

  • Is it OK to do variant calling for every variety separately if I have on average around 30K-50K reads (from read depth plotted over chromosome)?
  • I ask because I am not sure what analysis I can do in the case of cohort (joint) variant calling.
  • When doing variant calling, how can I deal with the problem of homoeologous sequences (i.e. the A and B subgenomes)? I am using hisat2/tophat for alignment and GATK/FreeBayes for variant calling.

Any help, such as a link to a publication, is appreciated!

RNA-Seq SNP • 3.1k views
6.2 years ago

You could start by looking at the GATK best practices for variant calling from RNA-Seq:

https://github.com/gatk-workflows/gatk3-4-rnaseq-germline-snps-indels

In brief:

  • Alignment with STAR
  • SplitNCigarReads from GATK
  • BQSR from GATK
  • HaplotypeCaller from GATK
  • Some filtering
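
The steps listed above can be sketched as a small Python helper that only builds the command lines (it does not run them). This is a rough illustration, not the linked workflow itself: the function name, file names, the `zcat` read command (which assumes gzipped FASTQ files) and the default ploidy are all assumptions. BQSR is only included when a known-sites VCF is supplied, since one may not exist for a non-model species, and the right `--sample-ploidy` value for a tetraploid depends on whether the reference separates the subgenomes.

```python
def build_rnaseq_calling_commands(sample, ref_fasta, star_index, fastq1, fastq2,
                                  known_sites_vcf=None, ploidy=2):
    """Return the pipeline as a list of argument lists (nothing is executed)."""
    # STAR names its sorted BAM after the prefix given to --outFileNamePrefix.
    aligned = f"{sample}.Aligned.sortedByCoord.out.bam"
    split_bam = f"{sample}.split.bam"
    commands = [
        ["STAR", "--runMode", "alignReads",
         "--genomeDir", star_index,
         "--readFilesIn", fastq1, fastq2,
         "--readFilesCommand", "zcat",          # assumption: gzipped FASTQ input
         "--twopassMode", "Basic",
         "--outSAMtype", "BAM", "SortedByCoordinate",
         "--outSAMattrRGline", f"ID:{sample}", f"SM:{sample}",  # GATK needs read groups
         "--outFileNamePrefix", f"{sample}."],
        # Split reads that span introns (N in the CIGAR string).
        ["gatk", "SplitNCigarReads", "-R", ref_fasta, "-I", aligned, "-O", split_bam],
    ]
    calling_input = split_bam
    if known_sites_vcf is not None:
        # Base quality score recalibration needs a set of known variant sites.
        recal_table = f"{sample}.recal.table"
        recal_bam = f"{sample}.recal.bam"
        commands.append(["gatk", "BaseRecalibrator", "-R", ref_fasta, "-I", split_bam,
                         "--known-sites", known_sites_vcf, "-O", recal_table])
        commands.append(["gatk", "ApplyBQSR", "-R", ref_fasta, "-I", split_bam,
                         "--bqsr-recal-file", recal_table, "-O", recal_bam])
        calling_input = recal_bam
    commands.append(["gatk", "HaplotypeCaller", "-R", ref_fasta, "-I", calling_input,
                     "-O", f"{sample}.vcf.gz",
                     "--dont-use-soft-clipped-bases",
                     "--sample-ploidy", str(ploidy)])
    return commands


if __name__ == "__main__":
    # Hypothetical sample and file names, printed for inspection only.
    for cmd in build_rnaseq_calling_commands("variety1", "ref.fa", "star_index",
                                             "variety1_R1.fq.gz", "variety1_R2.fq.gz",
                                             ploidy=4):
        print(" ".join(cmd))
```

Each returned list could be passed to `subprocess.run(cmd, check=True)` once the paths are real; the filtering step is left out here because the thread does not say which filters were used.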

Hi Nicolas, thank you! I have already done all of these steps, but I have the very specific questions stated above.


OK. Maybe you should edit your question to add the analysis you have already done.


I did state the programs I am using. It is not difficult to find the best practices; however, things are not so straightforward when dealing with polyploid species. So I wanted to know whether there are particular parameters or filters to use. Thank you anyway.

6.2 years ago
Kritika ▴ 260

"Is it OK to do variant calling for every variety separately if I have on average around 30K-50K reads (from read depth plotted over chromosome)?"

It is fine to do variant calling this way, but my concern is what depth threshold you can use to filter out low-depth sites, given that you have RNA-seq data. WGS data is usually a better choice for variant calling.
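
As a rough illustration of the depth filtering mentioned above, the sketch below keeps only VCF records whose INFO `DP` value reaches a minimum. The function names and the threshold of 10 are assumptions for illustration; a suitable cutoff for RNA-seq data depends on expression level and is not given in this thread.

```python
def info_depth(vcf_line):
    """Return the INFO/DP value of a VCF data line, or None if it is absent."""
    fields = vcf_line.rstrip("\n").split("\t")
    for entry in fields[7].split(";"):      # column 8 of a VCF record is INFO
        if entry.startswith("DP="):
            return int(entry[3:])
    return None


def filter_by_depth(vcf_lines, min_depth=10):
    """Keep header lines and records with INFO/DP >= min_depth (hypothetical cutoff)."""
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):
            kept.append(line)               # headers pass through unchanged
            continue
        depth = info_depth(line)
        if depth is not None and depth >= min_depth:
            kept.append(line)
    return kept


if __name__ == "__main__":
    # Made-up records for demonstration only.
    example = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "A1\t100\t.\tA\tG\t50\t.\tDP=35;AF=0.5",
        "B1\t200\t.\tC\tT\t20\t.\tDP=4;AF=0.25",
    ]
    for record in filter_by_depth(example):
        print(record)
```

Established tools such as bcftools or GATK VariantFiltration can apply the same kind of filter; the plain-Python version is only meant to make the logic explicit.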

"When doing variant calling, how can I deal with the problem of homoeologous sequences (i.e. the A and B subgenomes)? I am using hisat2/tophat for alignment and GATK/FreeBayes for variant calling."

I would normally use BWA for SNP calling, but since you have RNA-seq data I would say go with HISAT for alignment and FreeBayes for variant calling. Classify your reference into A and B subgenomes, for example chromosomes A1-A3 and B1-B3, six chromosomes in total. Any homoeologous read that maps to A may also map to B, so you will get a clear picture of this.
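
One way to get that picture is to count, per subgenome, how many primary alignments are unique and how many are multi-mapped. The sketch below is an illustration only: it reads SAM text lines, uses the `NH:i:` tag that HISAT2 and STAR write for the number of reported hits, and assumes chromosome names begin with the subgenome letter (A1, B1, ...) as in the example above. The function name and the default classifier are hypothetical.

```python
from collections import Counter


def summarize_subgenome_hits(sam_lines, subgenome_of=lambda ref: ref[0]):
    """Count primary alignments per (subgenome, 'unique' | 'multi')."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue                          # skip SAM header lines
        fields = line.rstrip("\n").split("\t")
        flag, ref = int(fields[1]), fields[2]
        if flag & 0x4 or ref == "*":
            continue                          # unmapped read
        if flag & 0x900:
            continue                          # secondary (0x100) or supplementary (0x800)
        hits = 1                              # assume unique when no NH tag is present
        for tag in fields[11:]:
            if tag.startswith("NH:i:"):
                hits = int(tag[5:])
        kind = "unique" if hits == 1 else "multi"
        counts[(subgenome_of(ref), kind)] += 1
    return counts


if __name__ == "__main__":
    # Made-up alignments for demonstration only.
    example = [
        "@SQ\tSN:A1\tLN:1000",
        "r1\t0\tA1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:1",
        "r2\t0\tA1\t200\t1\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:2",
        "r2\t256\tB1\t210\t1\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:2",
    ]
    for key, n in sorted(summarize_subgenome_hits(example).items()):
        print(key, n)
```

A high share of multi-mapped reads would suggest that many reads cannot be assigned to a single homoeolog; whether to exclude them before variant calling is a judgment call that this thread does not settle.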


Hi, thanks! My chromosomes are already divided into A and B, but when I do variant calling, is it not going to result in false calls, given that reads will map to both A and B? How do I deal with this?


See, this is the reason I suggested BWA or Bowtie for SNP calling: these tools map reads with high confidence and accuracy. But you have RNA-seq data, so it is better to map with HISAT, because HISAT and TopHat are splice-aware: they can align reads that span exon-intron boundaries, so reads from protein-coding regions are mapped.

6.1 years ago

Hello,

You can check SNiPloid: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3791807/

This article can help you go further in predicting homoeologous SNPs.

I hope it helps.
