Running htseq-count to "grab" long non coding gene_id names
2.7 years ago
dimitrischat ▴ 210

Hi all,

I'm new to bioinformatics, so bear with me. I am trying to find long non-coding RNAs in RNA-seq data. When I checked the human GTF file, I found two different labels for long non-coding RNA, "lnc_RNA" and "lncRNA", like so:

NC_000001.11    Gnomon  transcript  29926   31295   .   +   .   gene_id "MIR1302-2HG"; transcript_id "XR_001737835.1"; db_xref "GeneID:107985730"; gbkey "ncRNA"; gene "MIR1302-2HG"; model_evidence "Supporting evidence includes similarity to: 100% coverage of the annotated genomic feature by RNAseq alignments, including 8 samples with support for all annotated introns"; product "MIR1302-2 host gene, transcript variant X2"; transcript_biotype "lnc_RNA"; 

NC_000001.11    BestRefSeq  gene    34611   36081   .   -   .   gene_id "FAM138A"; transcript_id ""; db_xref "GeneID:645520"; db_xref "HGNC:HGNC:32334"; description "family with sequence similarity 138 member A"; gbkey "Gene"; gene "FAM138A"; gene_biotype "lncRNA"; gene_synonym "F379"; gene_synonym "FAM138F";

"lnc_RNA" appears on the "transcript" line (as transcript_biotype), and "lncRNA" appears on the "gene" line (as gene_biotype). My first question: should I choose "lncRNA"?

And most importantly, how do I get only the "gene_id" names of the entries that have "lncRNA"?

Edit: for the second question I ran grep 'lncRNA' GRCh38.p13_genomic.gtf > GRCh38.p13_genomic_lnc.gtf and proceeded as usual.

But is choosing "lncRNA" the correct choice?

lncRNA htseq • 624 views

In the example you posted above, one attribute is a gene_biotype and the other is a transcript_biotype. Biotypes apply to both genes and transcripts. I am not sure why there is an extra "_" in the transcript value in your example. Is that convention followed for all transcripts? If you are doing the analysis at the gene level, then you should select only the gene-level entries, i.e. those with gene_biotype "lncRNA".
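
For the "how do I get only the gene_id names" part, here is a minimal Python sketch of the gene-level selection described above. It assumes the GTF is tab-separated with the attributes in the ninth column, and that gene_biotype only shows up on "gene" rows, as in the two lines you posted. The function names (parse_attributes, lncrna_gene_ids, keep_lines_for_genes) are made up for illustration and are not part of HTSeq. The second helper is there because htseq-count counts over exon features by default, so a filtered GTF probably needs the exon rows of the selected genes as well, not only the rows that literally contain "lncRNA".

```python
import re

# Matches GTF attributes of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')


def parse_attributes(attr_field):
    """Turn the 9th GTF column into a dict (first value wins for repeated keys)."""
    attrs = {}
    for key, value in ATTR_RE.findall(attr_field):
        attrs.setdefault(key, value)
    return attrs


def lncrna_gene_ids(gtf_lines, biotype="lncRNA"):
    """Collect gene_id values from 'gene' rows whose gene_biotype equals `biotype`."""
    ids = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue
        attrs = parse_attributes(fields[8])
        gene_id = attrs.get("gene_id")
        if gene_id and attrs.get("gene_biotype") == biotype:
            ids.add(gene_id)
    return ids


def keep_lines_for_genes(gtf_lines, gene_ids):
    """Yield every feature row (gene, transcript, exon, ...) belonging to `gene_ids`."""
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 9 and parse_attributes(fields[8]).get("gene_id") in gene_ids:
            yield line
```

To use it, read GRCh38.p13_genomic.gtf into a list of lines, call lncrna_gene_ids on it, then write the lines from keep_lines_for_genes to a new GTF and give that file to htseq-count. Whether "lnc_RNA" transcripts always belong to "lncRNA" genes in this annotation is something I have not checked, so it is worth comparing the two sets on your file.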
