Dear Friends,
I have the data listed below and want to curate the annotation manually in a viewer. Could you please help me understand how I can visualize these data and curate the annotation in IGV, Tablet, or any other tool? I haven't used IGV before. I would really appreciate your input.
Data:
--> annotated gff file from prokka of the assembled contig
--> gff file of the 20 best blast hits (of the assembled contig)
The GFF file looks like this:
KC139526.1 BLASTN hsp 13946 15668 0.0 - 0 Match CBphage_assembly-spades-25000-readsae:NODE_1_length_88156_cov_48.969578
KC139526.1 BLASTN hsp 34229 36058 0.0 - 0 Match CBphage_assembly-spades-25000-readsae:NODE_1_length_88156_cov_48.969578
KC139526.1 BLASTN hsp 36062 37169 0.0 - 0 Match CBphage_assembly-spades-25000-readsae:NODE_1_length_88156_cov_48.969578
KC139526.1 BLASTN hsp 13190 13696 0.0 - 0 Match CBphage_assembly-spades-25000-readsae:NODE_1_length_88156_cov_48.969578
KC139526.1 BLASTN hsp 37315 37446 7e-50 - 0 Match CBphage_assembly-spades-25000-readsae:NODE_1_length_88156_cov_48.969578
--> assembled contig fasta file
--> SAM file of the reads mapped onto the assembled contig
--> protein domain hits from interproscan for the annotated genes (for functional annotation) in GFF format
I would like to visualize the above data in a viewer in the following order, if possible:
--> assembled contig
--> prokka annotated genes
--> interproscan annotation of genes
--> 20 blast hits
--> reads
The order can change; this is just a picture of what I am looking to visualize. Thanks!
Thank you. I will try IGV too, but for now I am using GenomeView for annotation editing, as Lieven suggested. However, when loading the interproscan GFF file in GenomeView, I see an issue.
The info in the file looks like this (obtained from the program "interproscan", https://github.com/ebi-pf-team/interproscan/wiki/HowToRun):
The input for interproscan is my assembled contig of size 88315.
When I load this interproscan GFF file in GenomeView, I see an issue. In the example above, the ORF (6115 to 7533) annotated by interproscan aligns well with the reference contig because its coordinates are contig coordinates. However, the annotated protein domains don't align: their coordinates are protein residue numbers, 64 to 158 in this example, so the viewer places the feature at 64 to 158 of the reference contig instead of within the region 6115 to 7533. In the attachment (https://ibb.co/XZJKZNx) you can see that all the protein features annotated by interproscan pile up at the beginning of the reference contig. Could you please let me know how I can place each protein feature at its correct position on the reference contig? I would be very thankful for your response and solution. Please let me know if something is not clear.
Thanks
Can you remove this post and add it again as a comment, for instance on my answer, as this is not really an answer to the original question?
This way we try to keep the threads logically organised. Thx.
And yes, your observation is correct. The viewer (all of them, btw) will always use the genomic sequence as the reference, not the protein to which the interproscan domains are mapped.
In general I do not see any advantage in loading interproscan results in a genome browser (when the goal is to structurally annotate genes). They are only useful for adding potential functions to genes, which is not something you should do via a genome browser/editor.
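If you do still want the domains drawn at their genomic position, the protein-relative coordinates have to be converted to contig coordinates first. Below is a minimal Python sketch of that arithmetic. It is not something interproscan or GenomeView provides: `protein_to_genomic` is a hypothetical helper name, and it assumes 1-based inclusive coordinates (as in GFF), that the ORF feature starts exactly at the first codon of the translated protein, and that you know the ORF's strand. The forward strand in the example call is also an assumption, since the strand is not stated in the question.

```python
def protein_to_genomic(orf_start, orf_end, strand, aa_start, aa_end):
    """Convert a protein residue range to contig coordinates.

    Hypothetical helper, for illustration only.
    All coordinates are 1-based and inclusive, as in GFF.
    Assumes residue 1 is encoded by the first codon of the ORF.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    n_codons = (orf_end - orf_start + 1) // 3
    if not 1 <= aa_start <= aa_end <= n_codons:
        raise ValueError("residue range falls outside the ORF")

    if strand == "+":
        # Residue k covers bases orf_start + 3*(k-1) .. orf_start + 3*k - 1
        g_start = orf_start + 3 * (aa_start - 1)
        g_end = orf_start + 3 * aa_end - 1
    else:
        # On the reverse strand the protein starts at orf_end and runs leftwards
        g_end = orf_end - 3 * (aa_start - 1)
        g_start = orf_end - 3 * aa_end + 1
    return g_start, g_end


# ORF 6115..7533 and residues 64..158 are taken from the question;
# the '+' strand is assumed.
print(protein_to_genomic(6115, 7533, "+", 64, 158))
```

With the values from the question and a forward-strand ORF, residues 64 to 158 land on 6304 to 6588 of the contig, which is 285 bases for 95 residues. You would then rewrite the start and end columns of each protein-match line with the converted values, and set the first column to the contig name, before loading the file.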
Thanks. I see, I understand the point. From the data I have, could you please tell me how I can manually correct the annotation? I mean, what logic should I use to correct the annotations obtained from prokka and to provide functional annotation? Thanks.