Get CDS from consensus genome assembly
5.6 years ago
YocelynGG ▴ 70

Hello!!

I'm trying to get the coding sequences from several reference-genome assemblies. The reference-genome assemblies were obtained with: GATK, samtools mpileup, bcftools, vcfutils.pl and seqtk.

I can extract the CDS regions with bedtools, using the GFF file from the reference genome, but I think I could lose some coding regions if I only get the CDS based on the reference genome annotation.
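To make the reference-based approach concrete, here is a minimal Python sketch of what that extraction amounts to: take the CDS intervals from the reference GFF, slice them out of a consensus sequence, and join the pieces per transcript. The function names (`reverse_complement`, `extract_cds`) are hypothetical and only for illustration; this is not bedtools itself. It assumes GFF3-style 1-based inclusive coordinates, a `Parent=` attribute on CDS lines, and that each consensus genome keeps the reference coordinates (no indels applied).

```python
from collections import defaultdict

# Complement table for DNA; N and lowercase bases are kept as-is in case.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_cds(genome, gff_lines):
    """Splice CDS segments out of a genome using GFF3 annotation lines.

    genome    : dict mapping sequence ID -> sequence string
    gff_lines : iterable of tab-separated GFF3 lines
    Returns a dict mapping each CDS parent ID -> spliced coding sequence.
    Assumes 1-based inclusive coordinates that are valid for `genome`.
    """
    segments = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        seqid, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        # Fall back to the feature ID, then to the coordinates, if no Parent.
        parent = attrs.get("Parent", attrs.get("ID", f"{seqid}:{start}-{end}"))
        segments[(parent, seqid, strand)].append((start, end))

    cds = {}
    for (parent, seqid, strand), parts in segments.items():
        parts.sort()  # join segments in ascending genomic order
        seq = "".join(genome[seqid][s - 1:e] for s, e in parts)
        # Minus-strand genes are read from the opposite strand.
        cds[parent] = reverse_complement(seq) if strand == "-" else seq
    return cds
```

Any gene that is missing from the reference annotation never appears in the output of something like this, which is exactly the limitation I'm worried about.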

I would like to find and extract the coding sequences of each consensus genome without using the genomic information of the reference genome.

I have been trying to get the CDS using ESTScan and Transeq, but I would like to know if there is a better strategy.

Thank you so much

CDS proteins consensus_sequence • 1.3k views

The reference-genome assemblies were obtained with: GATK, samtools mpileup, bcftools, vcfutils.pl and seqtk.

This doesn't really explain what you have done. Judging by the list of tools used, I suspect you have several resequenced genomes. And you suspect that some of these genomes have additional genes relative to the reference annotation?

Are you extracting CDS with Transeq and ESTScan from the whole genome sequence? That is not how they should be used; they are not the appropriate tools for this task.
