Identifying gene names
6.5 years ago

I am having trouble identifying gene names. I have the GTF, GFF, and SAM files. I can identify each gene because I know its position and could simply search for the sequence, but my list has over a thousand entries. Is there any way to do this without looking them up manually one by one?

RNA-Seq Assembly • 1.1k views

It's not clear: which file do you need to get the gene names from? Why do you mention the BAM? Why would you search by "sequence"?


At some point I lost my gene names and the genes got tagged with something else. The sequences still matched, though.


It's kamoulox (i.e. it makes no sense).


Your GTF/GFF files should have the sequence names (if they are properly formatted). They are plain text files, so you can view their contents with less filename or more filename. Post a few lines here by running head -5 gtf/gff_file
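If eyeballing the first few lines is not enough, a quick tally of the attribute keys in column 9 shows whether tags such as gene_name or gene_id are present at all. This is only a minimal sketch: it assumes GTF-style attributes (key "value";), the helper names are made up for illustration, and the two example lines are invented data, not from a real annotation.

```python
import re
from collections import Counter

# Matches GTF-style attributes such as: gene_id "G1"; gene_name "ABC1";
GTF_ATTR = re.compile(r'(\S+) "([^"]*)"')

def parse_attributes(field):
    """Return the GTF attribute column (column 9) as a dict."""
    return dict(GTF_ATTR.findall(field))

def attribute_keys(lines):
    """Count how often each attribute key appears across annotation lines."""
    counts = Counter()
    for line in lines:
        # Skip comment/header lines and blank lines
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete 9-column annotation line
        counts.update(parse_attributes(cols[8]).keys())
    return counts

# Invented example lines, for illustration only
example = [
    'chr1\tdemo\tgene\t100\t500\t.\t+\t.\tgene_id "G1"; gene_name "ABC1";\n',
    'chr1\tdemo\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n',
]
key_counts = attribute_keys(example)
print(key_counts)
```

On a real file you would pass the open file handle instead of the example list, e.g. attribute_keys(open("annotation.gtf")), where the file name is a placeholder. GFF3 files use key=value pairs instead, so the regular expression would need to change for those.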


At some point I lost my gene names and the genes got tagged with something else. The sequences still matched, though.


Not exactly sure what you are referring to. You should never have to modify gff/gtf files when you do any analysis.


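Use grep on Linux/OS X (-w matches whole words only, so ABC1 does not also hit ABC10):

$ grep -w "genesymbol" <input gtf/gff>

If you have the gene symbols in a file, one per line (-F treats each line as a literal string rather than a regular expression):

$ grep -w -F -f gene_symbols_file <input gtf/gff>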
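grep helps when the symbols are already known. The question, though, mentions having positions rather than names, so a lookup by coordinates may be closer to what is needed. The following is a minimal sketch, not a definitive implementation: it assumes a GTF with "gene" features and GTF-style attributes, the function names load_genes and genes_at are hypothetical, and the example records are invented.

```python
import re

# Matches GTF-style attributes such as: gene_id "G1"; gene_name "ABC1";
GTF_ATTR = re.compile(r'(\S+) "([^"]*)"')

def load_genes(lines, feature="gene"):
    """Collect (chrom, start, end, name) for every matching feature line."""
    genes = []
    for line in lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != feature:
            continue
        attrs = dict(GTF_ATTR.findall(cols[8]))
        # Fall back to gene_id when the annotation has no gene_name tag
        name = attrs.get("gene_name") or attrs.get("gene_id", "unknown")
        genes.append((cols[0], int(cols[3]), int(cols[4]), name))
    return genes

def genes_at(genes, chrom, start, end):
    """Names of genes overlapping the 1-based inclusive interval [start, end]."""
    return [name for c, s, e, name in genes
            if c == chrom and s <= end and start <= e]

# Invented example records, for illustration only
example = [
    'chr1\tdemo\tgene\t100\t500\t.\t+\t.\tgene_id "G1"; gene_name "ABC1";\n',
    'chr1\tdemo\tgene\t900\t1500\t.\t-\t.\tgene_id "G2";\n',
    'chr2\tdemo\tgene\t100\t500\t.\t+\t.\tgene_id "G3"; gene_name "XYZ2";\n',
]
genes = load_genes(example)
for region in [("chr1", 450, 600), ("chr1", 1000, 1100), ("chr2", 600, 700)]:
    print(region, genes_at(genes, *region))
```

The linear scan over all genes is slow but should be acceptable for a list of around a thousand positions. Assembled GTFs that contain only transcript and exon lines have no "gene" features, so there you would pass feature="transcript" (an assumption about your file worth checking first).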