How To Extract Only Gene ID From FASTA Header?
10.8 years ago
Abdul Rawoof ▴ 60

In a multi-FASTA file, the headers contain full details, as follows:

>ENSMUSG0000005892|ENSMUST00000004524351|xclkvsldjldjkfklasdfjalsjk

>ENSMUSG0000001537|ENSMUST00000017451|dfasfasdfghfhgjhktytg

>ENSMUSG00000002234237|ENSMUST000000097869|pasdfasdfsadf

I want to extract only the GeneID from the headers above, like this:

>ENSMUSG0000005892

>ENSMUSG0000001537

>ENSMUSG00000002234237

How can I extract only the GeneID using a Perl program?

Thanks.

sequence extraction fasta perl • 3.9k views

Did you try searching this site before asking your question?


This is a first-semester student's question. Try something like split, a regex, bash, or whatever, but I recommend trying to come up with an idea yourself before asking. Otherwise you'll never learn anything.

10.8 years ago
Kenosis ★ 1.3k

Here are two options. As a script:

use strict;
use warnings;

while (<>) {
    print "$1\n" if /(>.+?)\|/;
}

Usage: perl script.pl inFile [>outFile]

The optional last part, >outFile, is shell redirection that sends the output to a file instead of the terminal.

As a one liner:

perl -lne 'print $1 if /(>.+?)\|/' inFile [>outFile]

Output from both on your dataset:

>ENSMUSG0000005892
>ENSMUSG0000001537
>ENSMUSG00000002234237

In both cases, the regex captures everything from ">" up to (but not including) the first "|"; the non-greedy ".+?" stops at the first pipe rather than the last. The captured text is then printed.

Hope this helps!

10.8 years ago
always_learning ★ 1.1k
while(<stdin>){
if ($_ =~/>/){
@arr=split ($_, "I")
print $arr[0]
}
}

I've left it to you to make this code runnable!! :):)

This will also work on Unix:

grep ">" file.txt | cut -d "|" -f 1
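
One thing to watch with the pipeline above: grep ">" matches any line that contains ">", not only lines that start with it. A slightly stricter sketch anchors the match at the start of the line. The function name header_ids is made up here for illustration and is not part of the original answer.

```shell
# Hypothetical helper (the name is illustrative): print the text before the
# first "|" on every FASTA header line of the file given as $1.
header_ids() {
    # ^> matches only lines that begin with ">", i.e. header lines;
    # cut then keeps the first "|"-delimited field.
    grep '^>' "$1" | cut -d '|' -f 1
}

# Usage: header_ids file.txt > ids.txt
```

Lines without a "|" are passed through unchanged by cut, so headers that carry only a gene ID still come out intact.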

But always try to learn, friend!! :)


I like the Unix solution, but I can't see why you split on upper-case "I" in the Perl code.


That's why I said I'd left it to you to make the code runnable. In fact, I didn't want to give the complete solution in this case, so yes, it should be | and not upper-case "I".
