Categorising a list of RefSeq IDs with prefix NR_ into miRNA, snoRNA, and tRNA
11.7 years ago
Dataminer ★ 2.8k

Hi!

I have a list of RefSeq IDs with the prefix NR_, something like this:

NR_002838
NR_003030
NR_003038
NR_003110
NR_003129
NR_003142
NR_003186
NR_003574
NR_003579
NR_003697
NR_003955
NR_015419
NR_015429
NR_015433
NR_015433
NR_015434
NR_015453
NR_023392
NR_024004
NR_024031
NR_024034
NR_024117
NR_024249
NR_024377
NR_024443
NR_024464
NR_024469
NR_024478
NR_024480
NR_024496
NR_026567
NR_026667
NR_026693
NR_026757
NR_026772
NR_026899
NR_026901
NR_026943
NR_026959
NR_026959
NR_027037
NR_027055
NR_027062
NR_027084
NR_027113
NR_027274
NR_027283
NR_027301
NR_027355
NR_027451
NR_027451
NR_027457
NR_027504
NR_027764
NR_027928
NR_027992
NR_028090
NR_028303
NR_028370
NR_029390
NR_029420
NR_029613
NR_029687
NR_029688
NR_029839
NR_030326
NR_030368
NR_030385
NR_030386
NR_030621
NR_030627
NR_031688
NR_033667
NR_033667
NR_033667
NR_033667
NR_033770
NR_033866
NR_033944
NR_033970
NR_034003
NR_034014
NR_034014
NR_034080
NR_034095
NR_034179
NR_036111
NR_036152
NR_036155
NR_036158
NR_036485
NR_036490
NR_037410
NR_037480
NR_037512
NR_037791
NR_037890
NR_037894
NR_037946
XM_001719398
XM_003118802
XM_003118802
XM_003403539
XR_108601
XR_132745

How can I categorise this list into the following categories: miRNA, snoRNA, and tRNA? Any ideas?

Thank you

annotation genomics • 2.7k views
11.7 years ago

The following shell script:

# Read one accession per line from input.txt
while read -r ACN
do
    # Fetch only the first base as FASTA; the header line carries the description
    curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=${ACN}&strand=1&seq_start=1&seq_stop=1&rettype=fasta&retmode=text" |
    grep ">" | awk -v A="${ACN}" '{
        # Label the accession with the first phrase found in the header
        T="other";
        if(index($0,"small nucleolar RNA")!=0) {T="small nucleolar RNA"}
        else if(index($0,"non-coding RNA")!=0) {T="non-coding RNA"}
        else if(index($0,"microRNA")!=0) {T="microRNA"}
        else if(index($0,"mRNA")!=0) {T="mRNA"}
        else if(index($0,"miscRNA")!=0) {T="miscRNA"}
        printf("%s\t%s\n",A,T);
        }'
done < input.txt

produces the following result:

NR_002838    non-coding RNA
NR_003030    small nucleolar RNA
NR_003038    small nucleolar RNA
NR_003110    non-coding RNA
NR_003129    non-coding RNA
NR_003142    small nucleolar RNA
NR_003186    non-coding RNA
NR_003574    non-coding RNA
NR_003579    non-coding RNA
NR_003697    small nucleolar RNA
NR_003955    non-coding RNA
NR_015419    non-coding RNA
NR_015429    non-coding RNA
NR_015433    non-coding RNA
NR_015433    non-coding RNA
NR_015434    non-coding RNA
NR_015453    non-coding RNA
NR_023392    non-coding RNA
NR_024004    non-coding RNA
NR_024031    non-coding RNA
NR_024034    non-coding RNA
NR_024117    non-coding RNA
NR_024249    non-coding RNA
NR_024377    non-coding RNA
NR_024443    non-coding RNA
NR_024464    non-coding RNA
NR_024469    non-coding RNA
NR_024478    non-coding RNA
NR_024480    non-coding RNA
NR_024496    non-coding RNA
NR_026567    non-coding RNA
NR_026667    non-coding RNA
NR_026693    non-coding RNA
NR_026757    non-coding RNA
NR_026772    non-coding RNA
NR_026899    non-coding RNA
NR_026901    non-coding RNA
NR_026943    non-coding RNA
NR_026959    non-coding RNA
NR_026959    non-coding RNA
NR_027037    non-coding RNA
NR_027055    non-coding RNA
NR_027062    non-coding RNA
NR_027084    non-coding RNA
NR_027113    non-coding RNA
NR_027274    non-coding RNA
NR_027283    non-coding RNA
NR_027301    non-coding RNA
NR_027355    non-coding RNA
NR_027451    non-coding RNA
NR_027451    non-coding RNA
NR_027457    non-coding RNA
NR_027504    non-coding RNA
NR_027764    non-coding RNA
NR_027928    non-coding RNA
NR_027992    non-coding RNA
NR_028090    non-coding RNA
NR_028303    non-coding RNA
NR_028370    non-coding RNA
NR_029390    non-coding RNA
NR_029420    non-coding RNA
NR_029613    microRNA
NR_029687    microRNA
NR_029688    microRNA
NR_029839    microRNA
NR_030326    microRNA
NR_030368    microRNA
NR_030385    microRNA
NR_030386    microRNA
NR_030621    microRNA
NR_030627    microRNA
NR_031688    microRNA
NR_033667    non-coding RNA
NR_033667    non-coding RNA
NR_033667    non-coding RNA
NR_033667    non-coding RNA
NR_033770    non-coding RNA
NR_033866    non-coding RNA
NR_033944    non-coding RNA
NR_033970    non-coding RNA
NR_034003    non-coding RNA
NR_034014    non-coding RNA
NR_034014    non-coding RNA
NR_034080    non-coding RNA
NR_034095    non-coding RNA
NR_034179    non-coding RNA
NR_036111    microRNA
NR_036152    microRNA
NR_036155    microRNA
NR_036158    microRNA
NR_036485    non-coding RNA
NR_036490    non-coding RNA
NR_037410    microRNA
NR_037480    microRNA
NR_037512    microRNA
NR_037791    non-coding RNA
NR_037890    non-coding RNA
NR_037894    non-coding RNA
NR_037946    non-coding RNA
XM_001719398    mRNA
XM_003118802    mRNA
XM_003118802    mRNA
XM_003403539    mRNA
XR_108601    miscRNA
XR_132745    miscRNA
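
To get a quick overview of a result like the one above, the per-category counts can be tallied. This is only a sketch: it assumes the output was saved as a two-column, tab-separated file (the name results.tsv is hypothetical), and tally_categories is a helper name made up for illustration. Repeated accessions are collapsed first, since the input list contains duplicates.

```shell
# Hypothetical helper: count distinct accessions per category.
# $1 is a tab-separated file with columns: accession, category.
tally_categories() {
    # sort -u drops repeated lines, so duplicated accessions count once
    sort -u "$1" |
    awk -F'\t' '{ n[$2]++ } END { for (c in n) printf "%s\t%d\n", c, n[c] }' |
    LC_ALL=C sort
}

# Usage (assuming the script output was redirected to results.tsv):
#   tally_categories results.tsv
```

The final sort only makes the order of the categories reproducible; awk itself does not guarantee any order for the array loop.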

@Pierre: I like your usage of curl with ncbi eutils


Thank You Pierre.

8.0 years ago
ohadg123 ▴ 30

Another way to retrieve this information is to use the UCSC Table Browser. You need to download the table "kgXref", which links UCSC and NCBI accession numbers to "gene symbols" and "gene description".

Go to the table browser page and select:

Group: "Genes and Gene Predictions"
Track: "RefSeq Genes"
Table: "kgXref"

Here is an example output (the header line is marked with #):

#kgID   mRNA    spID    spDisplayID geneSymbol  refseq  protAcc description rfamAcc tRnaName

uc001aal.1  NM_001005484    Q8NH21  OR4F5_HUMAN OR4F5   NM_001005484    NP_001005484    Homo sapiens olfactory receptor, family 4, subfamily F, member 5 (OR4F5), mRNA.
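
Once the table is downloaded, the accession list can be matched against it locally. The following is a rough sketch rather than a tested pipeline: it assumes the table was saved as tab-separated text with the column order shown in the header line above (geneSymbol in column 5, refseq in column 6, description in column 8), and the names annotate_from_kgxref, ids.txt and kgXref.tsv are hypothetical.

```shell
# Hypothetical helper: look up accessions in a downloaded kgXref table.
# $1: accession list, one per line
# $2: kgXref table as tab-separated text (column order as in the header above)
annotate_from_kgxref() {
    awk -F'\t' '
        NR == FNR { want[$1] = 1; next }   # first file: remember the accessions
        /^#/      { next }                 # skip the header line of the table
        ($6 in want) && !seen[$6]++ {      # refseq column matches, report once
            printf "%s\t%s\t%s\n", $6, $5, $8
        }
    ' "$1" "$2"
}

# Usage (file names are placeholders):
#   annotate_from_kgxref ids.txt kgXref.tsv
```

The description column can then be searched for phrases such as "microRNA" or "small nucleolar RNA", in the same way as the FASTA headers in the answer above.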