Question: Genome Annotation
1

Hi all,

I am completely new to sequencing. I am a computer science student but I am working on a bioinformatics project on whole genome functional annotation.

My data is in csfasta format. How do I change this to fasta format? I am also very confused: what is the difference between the F3.csfasta file and the F5.csfasta file?

Additionally, I have been told that the data is in clc format. What does this mean?

How do I go about doing a whole genome annotation? Does anyone know of any good tools to do whole genome functional annotations?

I am extremely desperate and very very confused. Any information would be very much appreciated.

Thank you.

8.4 years ago Charsonic_Wu • 10 • updated 8.4 years ago Barry • 40
3

I can't help with the csfasta conversion, but I can with the annotation portion. There are basically two types of annotation that you might be referring to: de novo annotation or variant annotation. I'll try to describe both.

If this is a newly sequenced organism and you are doing de novo annotation (i.e., there is no existing reference genome), you can use MAKER for structural annotation, and MAKER together with InterProScan for functional annotation. Also look at gmod.org for other annotation tools from the Generic Model Organism Database project.

If this is a human genome (or an organism with an existing reference genome) and you want to annotate functional variants, use BWA to align to the reference, then GATK or samtools to identify variants (SNPs and indels). Then use VAAST or ANNOVAR to classify and prioritize the variants.

8.3 years ago Carson • 30
1

FYI: There are two workshops on MAKER in the next month or so:

Sept 28-30, Genome Annotation course at UC Davis http://gmod.org/wiki/News/UC_Davis_Courses_this_September

Oct 14 at OICR in Toronto: http://gmod.org/wiki/October_2011_GMOD_Meeting#Scheduled_Satellite_Meetings

8.3 years ago Dave Clements • 610
0

+1 for MAKER - makes life easy!

8.3 years ago Yannick Wurm ♦ 2.3k
3

Also, to follow up on Carson's reply: if this is ABI data for a novel genome and you're hoping to annotate the genome, you'll need to assemble it somehow first. There are plenty of tools out there for this sort of task, and which one you choose will depend on a number of factors. Google will lead you to plenty of discussion - I'd have a look at ABySS (http://www.bcgsc.ca/platform/bioinfo/software/abyss) and then read a few threads like this one (http://seqanswers.com/forums/archive/index.php/t-1424.html) to get a flavor for some of the issues involved. Coming from CS, you'll feel right at home with all the technical details of the de Bruijn and Euler graphs involved in these tools - it's fun stuff!
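For readers coming from CS, the de Bruijn graph idea behind many short-read assemblers can be sketched in a few lines. This is only an illustration of the data structure, not how ABySS or any other assembler actually implements it; the function name `de_bruijn_graph` and the toy read are made up for the example. Each k-mer in a read becomes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a toy de Bruijn graph (hypothetical helper, for illustration only).

    Nodes are (k-1)-mers; every k-mer found in a read adds one edge
    from its prefix to its suffix. Repeated k-mers add repeated edges.
    """
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


if __name__ == "__main__":
    # One made-up read; real input would be millions of reads.
    for node, successors in de_bruijn_graph(["ACGTAC"], 3).items():
        print(node, "->", successors)
```

An assembler then looks for paths through this graph that use the edges (an Eulerian-path style problem); real tools add error correction, handling of reverse complements and a lot of memory engineering on top.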

8.3 years ago Barry • 40
2

Hey,

if you have data where the filename is like "_F3.csfasta", there should be a corresponding "_F3.qual" file. Both files together are your reads, as they come out of the sequencer. Now, depending on which sequencing platform has been used, you have to create/apply your pipeline. If you are working on a whole genome project, the data should be whole genome sequencing data. The F3 infix marks the forward reads; if F3 is the only tag you have, these are single-end reads. A second tag would be R3 for mate-pair libraries or F5 for paired-end libraries.

First of all, you should search for a pipeline that fits single-end reads, your sequencing platform and whole genome sequencing. You will find some ;)

So the steps would be:

  • Map your data to a reference (search for "hg18" or "hg19" for the human genome; hg19 is newer), for example with BWA
  • Call your SNPs with GATK or samtools
  • Annotate your SNPs. Like the mapping, this is a science in itself. ATM I am using NGS-SNP.

These are the real basic steps.
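The map and call steps above can be sketched as a small command builder. This is a rough sketch under several assumptions: the reads are already base-space FASTQ (colour-space csfasta needs a colour-aware aligner or aligner version), `bcftools` is swapped in for the samtools variant caller, and the file names and the function name `build_pipeline` are hypothetical. It only assembles the shell commands; it does not run them, and annotation (e.g. NGS-SNP) would follow separately.

```python
import shlex


def build_pipeline(reference, reads, prefix):
    """Return shell commands for a minimal map -> sort -> call workflow.

    Hypothetical helper: assumes base-space FASTQ input and that bwa,
    samtools and bcftools are installed. Nothing is executed here.
    """
    ref = shlex.quote(reference)
    fq = shlex.quote(reads)
    sam = shlex.quote(prefix + ".sam")
    bam = shlex.quote(prefix + ".sorted.bam")
    vcf = shlex.quote(prefix + ".vcf")
    return [
        f"bwa index {ref}",                      # index the reference once
        f"bwa mem {ref} {fq} > {sam}",           # align reads to the reference
        f"samtools sort -o {bam} {sam}",         # coordinate-sort the alignments
        f"samtools index {bam}",                 # index the sorted BAM
        # pile up and call SNPs/indels (bcftools stands in for GATK/samtools)
        f"bcftools mpileup -f {ref} {bam} | bcftools call -mv -o {vcf}",
    ]


if __name__ == "__main__":
    for command in build_pipeline("hg19.fa", "reads.fastq", "sample"):
        print(command)
```

Printing the commands first makes it easy to check them by eye before running anything on real data.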

8.4 years ago Mdeng • 520
1

The official names of the human reference genome assemblies are NCBI36 and GRCh37, respectively (NCBI36 = hg18, GRCh37 = hg19).

8.4 years ago Bert Overduin ♦ 3.6k
0

Thank you very very much. That makes things clearer^^

8.4 years ago Charsonic_Wu • 10
1

I've not had to mess around with colour space data before, but I'm pretty sure that the instrument manufacturer, ABI, shares software to do that sort of conversion. The software is Corona-lite, which can be downloaded from here.

You'll need to register with ABI, but I think it's free.
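To make the colour space idea concrete, here is a minimal sketch of how a csfasta read decodes to bases. It is not Corona-lite's implementation; the function names are hypothetical. A csfasta read such as `T0123` starts with a known base followed by colours 0-3, where each colour encodes the transition from the previous base to the next one. With A, C, G, T numbered 0-3, the next base is the previous base XOR the colour.

```python
BASES = "ACGT"  # A=0, C=1, G=2, T=3, so next_base = previous_base XOR colour


def decode_colorspace(read):
    """Translate one colour-space read (e.g. 'T0123') into bases.

    Hypothetical helper. The leading letter is the known starting base
    and is not part of the output. After a missing colour call ('.'),
    the remaining bases cannot be recovered and are reported as 'N'.
    """
    code = BASES.index(read[0].upper())
    bases = []
    for colour in read[1:]:
        if code is None or colour not in "0123":
            code = None
            bases.append("N")
        else:
            code ^= int(colour)
            bases.append(BASES[code])
    return "".join(bases)


def csfasta_to_fasta(lines):
    """Yield FASTA lines from csfasta lines, skipping '#' comment lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        yield line if line.startswith(">") else decode_colorspace(line)


if __name__ == "__main__":
    example = ["# made-up example", ">1_1_1_F3", "T0123"]
    print("\n".join(csfasta_to_fasta(example)))
```

Note that a single wrong colour call changes every base after it, so converting reads to fasta before alignment throws away the error-correcting property of the encoding. That is one reason mapping in colour space is usually preferred to naive conversion.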

8.4 years ago Rob Syme • 540
0

That's what they recommend over at SeqAnswers too.

8.4 years ago Neilfws 48k
0

Thank you very very much~

8.4 years ago Charsonic_Wu • 10
