Print First Occurrence Of A Line
11.0 years ago

I extracted ORFs from an initial FASTA file, and now I want to get the longest ORF for each transcript.

After extracting the ORF sizes with faSize and sorting them by size, the command I usually use is:

perl -ane'print unless $x{$F[0]}++'

This time, however, I have a problem with the perl command.

After extracting the sizes and sorting them, I have something like this:

Singlet_1000_61 3844
Singlet_2000_73 3508
Singlet_1000_62 3081
Singlet_2000_62 3008
Singlet_3500_48 2973
Singlet_4000_48 2964
Singlet_3500_54 2863

What I want is:

Singlet_1000_61 3844
Singlet_2000_73 3508
Singlet_3500_48 2973
...

The perl command is not working in this case.

Do you have any suggestions on how I can make it work?

Or an awk command?

Thanks for your help.
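A likely reason the usual one-liner fails here: `$x{$F[0]}++` counts whole ORF names, and every ORF name is unique (a transcript ID followed by an ORF number), so no line is ever skipped. The count has to be kept per transcript ID instead. Below is a minimal sketch in awk, assuming, as in the example data, that the transcript ID is everything before the last underscore; the function name `longest_orf_per_transcript` is hypothetical. It also does the size sort itself, so it can be fed unsorted sizes.

```shell
#!/usr/bin/env bash
# Minimal sketch: keep the longest ORF per transcript from "name size"
# lines (space- or tab-separated). Assumption: ORF names look like
# "<transcript ID>_<ORF number>", i.e. the transcript ID is everything
# before the last underscore.
longest_orf_per_transcript() {
    # Sort numerically by size (column 2), largest first, so the first
    # line seen for each transcript is its longest ORF. Then strip the
    # trailing "_<ORF number>" from the name ("Singlet_1000_61" becomes
    # "Singlet_1000") and print only the first line per transcript ID.
    sort -k2,2nr "$@" |
        awk '{
            id = $1
            sub(/_[^_]*$/, "", id)
            if (!seen[id]++) print
        }'
}

# Example with the sizes from the question (input order does not matter):
longest_orf_per_transcript <<'EOF'
Singlet_1000_61 3844
Singlet_2000_73 3508
Singlet_1000_62 3081
Singlet_2000_62 3008
Singlet_3500_48 2973
Singlet_4000_48 2964
Singlet_3500_54 2863
EOF
```

With a file instead of a here-document, the call would be `longest_orf_per_transcript orf_sizes.txt` (file name hypothetical). If the IDs follow a different naming scheme, only the `sub()` pattern needs to change.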

perl awk bash fasta
11.0 years ago

A more efficient way, using awk:

$ cat input 
Singlet_1000_61 3844
Singlet_2000_73 3508
Singlet_1000_62 3081
Singlet_2000_62 3008
Singlet_3500_48 2973
Singlet_4000_48 2964
Singlet_3500_54 2863

$ awk -F"_" '{if(!tab[$2]){print $0; tab[$2]=1;}}' input 
Singlet_1000_61 3844
Singlet_2000_73 3508
Singlet_3500_48 2973
Singlet_4000_48 2964
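The same first-occurrence filter is often written in the shorter `!seen[key]++` form: `seen[key]++` evaluates to the old count (0 on the first line for a key), so the pattern is true only once per key, and awk's default action prints that line. One caveat with keying on `$2` alone is that two IDs sharing a number but differing in prefix (for example a hypothetical `Contig_1000_5` next to `Singlet_1000_61`) would be treated as one transcript. The sketch below keys on the first two `_`-separated fields instead; that key choice and the function name `first_per_transcript` are assumptions. Like the answer above, it expects input already sorted by size, largest first.

```shell
#!/usr/bin/env bash
# Sketch: print the first line for each "<prefix>_<number>" key.
# Input must already be sorted by size, largest first.
# Assumption: key = $1 FS $2 (e.g. "Singlet_1000") rather than $2 alone,
# so IDs with the same number but a different prefix stay separate.
first_per_transcript() {
    awk -F'_' '!seen[$1 FS $2]++' "$@"
}

# Example: the (hypothetical) Contig line is kept, the second
# Singlet_1000 line is dropped.
printf 'Singlet_1000_61 3844\nContig_1000_5 3000\nSinglet_1000_62 2900\n' |
    first_per_transcript
```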
11.0 years ago
csiu

This might not be the most efficient way, but:

$ cat input.txt
Singlet_1000_61 3844
Singlet_2000_73 3508
Singlet_1000_62 3081
Singlet_2000_62 3008
Singlet_3500_48 2973
Singlet_4000_48 2964
Singlet_3500_54 2863

$ cat input.txt | awk -F "_" '{print $1"_"$2"\t" $0}' | sort -u -k1,1 | awk -F "\t" '{print $2}'
Singlet_1000_61 3844
Singlet_2000_73 3508
Singlet_3500_48 2973
Singlet_4000_48 2964
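Note that `sort -u -k1,1` reorders the output by transcript ID rather than by size; on this example the two orders happen to coincide. A sketch of the same decorate / `sort -u` / undecorate idea that reads the file directly instead of through `cat`, with a few labeled assumptions: `sort -s` makes explicit that the first (longest) line per key is kept, `cut -f2-` keeps the whole original line even if it is tab-separated, and a final `sort -k2,2nr` restores largest-first order. The function name `dedup_by_transcript` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the decorate / sort -u / undecorate pipeline:
#   1. awk prepends "<prefix>_<number><TAB>" (e.g. "Singlet_1000") to each line;
#   2. sort -s -u on that tab-delimited key keeps one line per transcript,
#      the first one in input order (input assumed sorted by size);
#   3. cut -f2- drops the prepended key, keeping the rest of the line;
#   4. sort -k2,2nr puts the result back in largest-first order.
dedup_by_transcript() {
    awk -F'_' '{ print $1 "_" $2 "\t" $0 }' "$@" |
        sort -s -u -t "$(printf '\t')" -k1,1 |
        cut -f2- |
        sort -k2,2nr
}

# Example on part of the question's data:
printf 'Singlet_1000_61 3844\nSinglet_2000_73 3508\nSinglet_1000_62 3081\n' |
    dedup_by_transcript
```

With a file, the call would be `dedup_by_transcript input.txt`.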
11.0 years ago
Kenosis

Here's another option:

perl -ne 'print if /_(\d+)_/ and !$x{$1}++' inFile

Output on your dataset:

Singlet_1000_61 3844
Singlet_2000_73 3508
Singlet_3500_48 2973
Singlet_4000_48 2964