CDS FASTA file as Reference sequence for Ion Torrent
6.6 years ago
serenabivona

Hi!

I'm sorry! My question may seem stupid to someone else, but this is my first time doing NGS analysis...

I will use the PGM and Torrent Suite Software to sequence a small region of interest. In summary, my sample is cDNA, and my amplicons span only part of the CDS of my human gene of interest, with some amplicons spanning more than one exon.

I think that I can't use hg19 as the reference genome and add a .bed file containing the positions of the exons or amplicons on the chromosome. For example, amplicon 1 spans exons 3, 4 and 5, so when I designed the .bed file from the exon positions on the chromosome, the amplicon was divided into three separate chromosomal regions...
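To make the splitting problem concrete, here is a minimal sketch of how one contiguous interval on the cDNA/CDS breaks into several genomic blocks. The exon coordinates in `TOY_EXONS` are invented for illustration, and the function name `transcript_to_genomic` is hypothetical; the sketch also assumes a plus-strand gene with exons listed in transcript order. The same arithmetic is why a read crossing an exon-exon junction cannot align contiguously to the genome.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive

# Hypothetical exon layout (genomic start, end) in transcript order for a
# plus-strand gene. These coordinates are invented for illustration only.
TOY_EXONS: List[Interval] = [
    (1000, 1099),  # exon 1: transcript positions 1-100
    (2000, 2149),  # exon 2: transcript positions 101-250
    (3000, 3079),  # exon 3: transcript positions 251-330
    (4000, 4199),  # exon 4: transcript positions 331-530
]


def transcript_to_genomic(start: int, end: int, exons: List[Interval]) -> List[Interval]:
    """Split a transcript/CDS interval into the genomic blocks it covers.

    Assumes a plus-strand gene and exons listed in transcript order;
    a minus-strand gene would need offsets counted from the other end.
    """
    blocks: List[Interval] = []
    offset = 0  # number of transcript bases in the exons already passed
    for g_start, g_end in exons:
        t_first = offset + 1
        t_last = offset + (g_end - g_start + 1)
        lo, hi = max(start, t_first), min(end, t_last)
        if lo <= hi:  # the interval overlaps this exon
            blocks.append((g_start + lo - t_first, g_start + hi - t_first))
        offset = t_last
    return blocks


if __name__ == "__main__":
    # A toy "amplicon" at transcript positions 200-380 lands in three exons.
    print(transcript_to_genomic(200, 380, TOY_EXONS))
```

With these toy numbers, the single amplicon at transcript positions 200 to 380 becomes the three genomic blocks (2099, 2149), (3000, 3079) and (4000, 4049), mirroring the "one amplicon, three chromosome regions" situation described above.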

While reading the Torrent Suite user guide to plan the run, I read that I could upload a new reference file in FASTA format...

So I'm thinking that I could upload the .fasta file of my CDS as the reference file for the alignment, simply to restrict the region, without creating a .bed file...

Am I CRAZY???
Will the alignment program go mad? Will it work? I hope someone can help me.

Thank you all

Tags: FASTA, reference, CDS, bed, next-gen

In other words: can I use my CDS .fasta file as the reference, instead of a classic reference genome such as hg19, to plan the run?

Thank you...


Yes, you can change your reference :) But in this case you will not get depth-of-coverage metrics for your region of interest.
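As a rough illustration of what a target-region file adds, the sketch below averages per-base depth inside each region. It assumes the depth input is the three-column, tab-separated output of `samtools depth` (contig, 1-based position, depth), and that regions come from a BED-style list; the contig name `ABL1_CDS` and the function name are hypothetical. Without a list of regions there is simply nothing to summarize per amplicon.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (contig, 0-based start, exclusive end, region name), as in a BED file
Region = Tuple[str, int, int, str]


def mean_depth_per_region(depth_lines: List[str], regions: List[Region]) -> Dict[str, float]:
    """Mean per-base depth inside each BED region.

    depth_lines are assumed to be tab-separated 'contig, 1-based position, depth'
    rows, as printed by `samtools depth`. Positions missing from the input
    (samtools depth skips zero-depth bases unless -a is given) count as depth 0.
    """
    depth: Dict[str, Dict[int, int]] = defaultdict(dict)
    for line in depth_lines:
        contig, pos, d = line.rstrip("\n").split("\t")
        depth[contig][int(pos)] = int(d)

    means: Dict[str, float] = {}
    for contig, start, end, name in regions:
        # BED [start, end) in 0-based coordinates == 1-based positions start+1 .. end
        positions = range(start + 1, end + 1)
        if len(positions) == 0:
            means[name] = 0.0
            continue
        total = sum(depth[contig].get(p, 0) for p in positions)
        means[name] = total / len(positions)
    return means
```

For example, with depths 10, 20, 30 and 40 at positions 1 to 4 of `ABL1_CDS`, a region spanning BED 0 to 2 has mean depth 15.0, and a region spanning BED 2 to 6 has mean depth 17.5, because positions 5 and 6 have no reads.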


Thank you, Titus... Could you please explain this in more detail?

Thank you!

Sere

6.6 years ago

As you have cDNA, I don't believe that it's appropriate to use the genome as the reference. Most reads will still align well to it, but reads spanning exon-exon junctions (splice sites) may not map with the standard, non-spliced aligners. I think that you imply this in your question.

I would download the GENCODE reference transcriptome (see the section on FASTA files) and install it into TorrentSuite. Instead of chromosomes as FASTA headers, it has as headers the >199,000 cDNA transcripts that were identified by ENCODE. You could feasibly extract just the sequence for your gene of interest.
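A minimal sketch of that extraction step is below. It assumes the GENCODE transcript FASTA uses pipe-delimited headers with the gene name in the sixth field (for example `>ENST...|ENSG...|...|ABL1-201|ABL1|...|`); please check this against the file you actually download. The function names are hypothetical, and the toy records in the test are not real GENCODE entries.

```python
from typing import Dict, Iterable, Iterator, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header without '>', sequence) pairs from FASTA text lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def extract_gene_transcripts(lines: Iterable[str], gene_name: str) -> Dict[str, str]:
    """Keep only records whose 6th pipe-separated header field equals gene_name.

    The field position is an assumption about the GENCODE header layout.
    """
    records: Dict[str, str] = {}
    for header, seq in parse_fasta(lines):
        fields = header.split("|")
        if len(fields) > 5 and fields[5] == gene_name:
            records[header] = seq
    return records


def to_fasta(records: Dict[str, str], width: int = 60) -> str:
    """Serialize records back to FASTA, wrapping sequence lines at `width`."""
    parts = []
    for header, seq in records.items():
        parts.append(">" + header)
        parts.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(parts) + "\n"
```

In practice you would pass an open file handle, e.g. `extract_gene_transcripts(open(path), "ABL1")`, and write `to_fasta(...)` to a new file to upload as the reference. Note that this keeps every annotated transcript of the gene, so you may still want to pick the single transcript whose CDS matches your amplicon design.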

Just for further information: using the genome as the reference for cDNA reads is common with spliced aligners (like TopHat2 / HISAT2), which use the genomic sequence in an attempt to find novel splice sites and novel transcripts.

Kevin

6.6 years ago
serenabivona

Thank you all for your answers! You were very helpful in clarifying my question.

So I think it is useless, or improper, to create a .bed file, because I read that a BED file needs at minimum the chromosome name and the start and end positions, unless I create a "similar BED file" with positions on the CDS...

Thank you so much and have a nice week-end!

Serena


Yes, the minimal BED format is three tab-separated columns:

chr start end
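If the CDS FASTA is used as the reference, the "chr" column of such a "similar BED file" would simply be the FASTA header name of the CDS sequence, and start/end would be positions along the CDS. The minimal sketch below does that conversion; the contig name `ABL1_CDS` and the amplicon positions are hypothetical, and it assumes amplicon positions are given 1-based and inclusive, whereas BED uses a 0-based start and an exclusive end. Whether Torrent Suite needs extra columns or a track line in its target-region files is not covered here.

```python
from typing import List, Tuple


def amplicons_to_bed(contig: str, amplicons: List[Tuple[str, int, int]]) -> str:
    """Build BED lines for amplicons on a single reference contig (e.g. a CDS).

    amplicons: (name, start, end) in 1-based, inclusive CDS coordinates.
    Output: contig, 0-based start, exclusive end, name (tab-separated).
    """
    lines = []
    for name, start, end in amplicons:
        if not 1 <= start <= end:
            raise ValueError(f"invalid 1-based interval for {name}: {start}-{end}")
        # 1-based inclusive [start, end] -> 0-based half-open [start - 1, end)
        lines.append(f"{contig}\t{start - 1}\t{end}\t{name}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(amplicons_to_bed("ABL1_CDS", [("amp1", 1, 250), ("amp2", 200, 480)]), end="")
```

Because all coordinates refer to one contiguous CDS sequence, an amplicon that crosses exon boundaries stays a single BED line instead of being split into several genomic regions.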

Does TorrentSuite know that you are using cDNA? I mean, are you sure that TorrentSuite has a pipeline specific to cDNA gene expression? If you process your sample as a regular genomic sequence, then TorrentSuite will search for single nucleotide variants; however, you want gene expression results, right?

If you are interested in gene expression and are using the GENCODE reference transcriptome in FASTA, then I do not believe that you require a BED file. Instead, also download the GENCODE GTF file, which can contain information similar to a BED file, from the website that I mentioned in my previous post. TorrentSuite should permit you to use a GTF file in place of a BED file. If not, there are other ways to convert the GTF to BED (search Biostars). The GTF/BED is key because it will already contain co-ordinates for the different exons.
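One simple way to do the GTF-to-BED conversion is sketched below, filtering exon rows for one gene. It assumes GENCODE-style attributes (`gene_name`, `transcript_id`, `exon_number`); the function names are hypothetical and the toy rows in the test are not real annotation. Two points matter: GTF is 1-based and inclusive while BED is 0-based with an exclusive end, and GTF coordinates are genomic, so the resulting BED lines up with a genome reference (such as hg19), not with a transcript or CDS FASTA.

```python
from typing import Dict, Iterable, List


def parse_attributes(field: str) -> Dict[str, str]:
    """Parse a GTF attribute column like 'gene_id "G1"; exon_number 1;'."""
    attrs: Dict[str, str] = {}
    for part in field.split(";"):
        part = part.strip()
        if not part:
            continue
        key, _, value = part.partition(" ")
        attrs[key] = value.strip().strip('"')  # works for quoted and unquoted values
    return attrs


def gtf_exons_to_bed(gtf_lines: Iterable[str], gene_name: str) -> List[str]:
    """Return 6-column BED lines (chrom, start, end, name, score, strand) for one gene's exons."""
    bed: List[str] = []
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue
        attrs = parse_attributes(cols[8])
        if attrs.get("gene_name") != gene_name:
            continue
        start, end = int(cols[3]), int(cols[4])
        name = f'{attrs.get("transcript_id", ".")}_exon{attrs.get("exon_number", ".")}'
        # GTF 1-based inclusive -> BED 0-based start, exclusive end
        bed.append(f"{cols[0]}\t{start - 1}\t{end}\t{name}\t0\t{cols[6]}")
    return bed
```

Each exon of each transcript becomes its own BED line, which is exactly why an amplicon spanning several exons ends up split across several genomic intervals.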

You may want to take a look at THIS study: the authors also used cDNA in TorrentSuite.


Hi Kevin!

Sorry :)! I forgot to explain that I'm not interested in gene expression. My purpose is to study, by NGS, mutations that occur in the tyrosine kinase domain (TKD) of the ABL gene, which is part of the BCR-ABL fusion gene. To exclude the possibility of sequencing the non-translocated (wild-type) allele, all researchers start from cDNA of the fusion gene and not from DNA. Then, to select only the ABL TKD region, they all use a primer pool for that region of the ABL cDNA, by nested PCR.

Thank you :)


Very interesting. Good luck with it!
