Mapping gene names to GO ids
6.5 years ago
Ric ▴ 430

Hello, I downloaded goa_uniprot_all.gaf (4.4 GB), which looks like this:

!gaf-version: 2.1
!
!This file contains all GO annotations and gene product information for proteins in the UniProt KnowledgeBase (UniProtKB),
!IntAct protein complexes, and RNAcentral identifiers.
!
!Generated: 2017-09-25 14:48 
!GO-version: http://purl.obolibrary.org/obo/go/releases/2017-09-23/extensions/go-plus.owl
!
UniProtKB       OEL25522.1  moeA5           GO:0003824      GO_REF:0000002  IEA     InterPro:IPR015421|InterPro:IPR015422   F       MoeA5   A0A000_9ACTN|moeA5      protein taxon:35758     20170923        InterPro                
UniProtKB       XP_021321391.1  moeA5           GO:0003870      GO_REF:0000002  IEA     InterPro:IPR010961      F       MoeA5   A0A000_9ACTN|moeA5      protein taxon:35758     20170923        InterPro                    
UniProtKB       ABQ44355.1  moeA5           GO:0009058      GO_REF:0000002  IEA     InterPro:IPR004839      P       MoeA5   A0A000_9ACTN|moeA5      protein taxon:35758     20170923        InterPro                   
UniProtKB       XP_004953070.1  moeA5           GO:0030170      GO_REF:0000002  IEA     InterPro:IPR004839|InterPro:IPR010961   F       MoeA5   A0A000_9ACTN|moeA5      protein taxon:35758     20170923        InterPro                
UniProtKB       XP_004953070.1  moeA5           GO:0033014      GO_REF:0000002  IEA     InterPro:IPR010961      P       MoeA5   A0A000_9ACTN|moeA5      protein taxon:35758     20170923        InterPro

I also have a file that maps my Trinity contigs to Swiss-Prot, as shown below:

target_id       ens_gene
lcl|ScwjSwM_1   OEL25522.1
lcl|ScwjSwM_2   XP_021321391.1
lcl|ScwjSwM_3   ABQ44355.1
lcl|ScwjSwM_4   XP_004953070.1

The second column of each file could be used to join them into a file with two columns (contig name, GO IDs), e.g. lcl|ScwjSwM_4,GO:0030170|GO:0033014. What would be the best way to do this?

Thank you in advance.

R RNA-Seq go • 1.8k views
6.5 years ago

The second column of each file could be used to join them into a file with two columns (contig name, GO IDs), e.g. lcl|ScwjSwM_4,GO:0030170|GO:0033014. What would be the best way to do this?

Use join on the second column of both files (both inputs must be sorted on that column first):

join -t $'\t' -1 2 -2 2 \
  <(sort -t $'\t' -k2,2  goa_uniprot_all.gaf ) \
  <(sort -t $'\t' -k2,2 swissprot.tsv )

Then pipe the result through cut to keep only the contig and GO ID columns.
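The join output still has one line per annotation, so the GO IDs have to be collapsed per contig to get the format asked for in the question. A minimal sketch of one way to do that in a single awk pass is below. It assumes both files are truly tab-separated, that the accession is column 2 in each, and that the GO ID is column 5 of the GAF (the GAF 2.1 layout, where the often empty qualifier is column 4). The function name map_contigs_to_go is made up for illustration.

```shell
# Sketch: print "contig,GO:a|GO:b" lines from a GAF file and a contig table.
# Assumed layout: tab-separated files, accession in column 2 of both,
# GO ID in column 5 of the GAF. The function name is hypothetical.
map_contigs_to_go() {
  local gaf=$1 map=$2
  awk -F '\t' '
    # First file (contig table): remember accession -> contig(s), skip the header.
    NR == FNR {
      if (FNR > 1) contigs[$2] = ($2 in contigs) ? contigs[$2] SUBSEP $1 : $1
      next
    }
    # Second file (GAF): skip "!" comment lines and accessions not in the table.
    /^!/ || !($2 in contigs) { next }
    {
      # One accession may have been hit by several contigs.
      n = split(contigs[$2], list, SUBSEP)
      for (i = 1; i <= n; i++) {
        c = list[i]
        # Append each GO ID only once per contig, separated by "|".
        if (!seen[c, $5]++) go[c] = (c in go) ? go[c] "|" $5 : $5
      }
    }
    END { for (c in go) print c "," go[c] }
  ' "$map" "$gaf" | LC_ALL=C sort
}
```

It could be called as map_contigs_to_go goa_uniprot_all.gaf swissprot.tsv > contig2go.csv (the output name is arbitrary). Only the small contig table is held in memory, so the 4.4 GB GAF file is read once and never sorted. Contigs whose accession has no annotation in the GAF are left out of the output.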

6.5 years ago
e.rempel ★ 1.1k

Hi,

if you would like to do the analysis in R, you could use the merge function to join the two tables on the accession column.
