sequence extract problem
4.6 years ago

Dear all, I have some IDs in file 1, and I want to extract the corresponding lines from file 2, but the IDs in the two files do not match completely. Does anyone know of a command line I could use for that?

I have two command lines here, but neither seems to work.

grep -Fwf file1.txt file2.txt > results

awk 'NR==FNR{x[$0];next}{for(i in x)if($0~i)print}' file1.txt file2.txt

Here is ID from file 1:

TRINITY_DN100263_c0_g1_i13
TRINITY_DN100263_c1_g1_i1
TRINITY_DN100330_c0_g1_i1
TRINITY_DN100330_c0_g2_i14
TRINITY_DN100529_c0_g1_i3
TRINITY_DN100620_c0_g1_i2

Here is file 2:

TRINITY_DN132010_c5_g4  0   0   0   0   0.18    0.93    0.67    0.61    0   0.45    00.25   0   0
TRINITY_DN100263_c1_g1  0.08    0.06    0.06    0.09    0.1 0.07    0.43    0.2 0.16    0.36    0.06    0.42    0   0
TRINITY_DN50647_c0_g1   0   0   0   0.9 0   0   0   0   0   0   00
TRINITY_DN100330_c0_g2  0   0   0   0   0   0   0   0   0   0   01.06   0   0
TRINITY_DN137407_c4_g1  0   0   0.19    0   0   0   0.17    0.15    0   0.12    0.
RNA-Seq • 792 views
4.6 years ago
JC 13k

You need to remove the non-matching part (the trailing _i<number> isoform suffix) from the IDs in the first file before doing your search, for example:

perl -pe 's/_i\d+$//' file1.txt > file1_mod.txt

then you can search with grep or awk or perl.
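As a rough sketch of the two steps together (strip the suffix, then search), something like the following may work. It assumes file 2 is tab-separated with the gene ID in the first column. The sample rows are cut down to the first three columns of the question's data, and the names `file1_mod.txt` and `results.txt` are only illustrative. `sed` is swapped in for `perl` here, with the same substitution.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a scratch directory so no existing file1.txt/file2.txt is overwritten.
workdir=$(mktemp -d)
cd "$workdir"

# Sample IDs from the question (isoform level, with _i<number> suffix).
printf '%s\n' \
  TRINITY_DN100263_c0_g1_i13 \
  TRINITY_DN100263_c1_g1_i1 \
  TRINITY_DN100330_c0_g1_i1 \
  TRINITY_DN100330_c0_g2_i14 > file1.txt

# Sample rows from the question, truncated to three tab-separated columns (assumed layout).
printf '%s\t%s\t%s\n' \
  TRINITY_DN132010_c5_g4 0 0 \
  TRINITY_DN100263_c1_g1 0.08 0.06 \
  TRINITY_DN50647_c0_g1 0 0 \
  TRINITY_DN100330_c0_g2 0 0 > file2.txt

# Step 1: strip the trailing isoform suffix and drop duplicate gene IDs.
sed -E 's/_i[0-9]+$//' file1.txt | sort -u > file1_mod.txt

# Step 2: print the lines of file2 whose first field is exactly one of those gene IDs.
awk 'NR==FNR { ids[$1]; next } $1 in ids' file1_mod.txt file2.txt > results.txt

cat results.txt
```

Matching on the whole first field avoids partial hits such as `..._g1` matching `..._g10`, which a plain substring search could produce. After step 1, `grep -Fwf file1_mod.txt file2.txt` should also work, since `-w` treats the underscore and digits as word characters.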

4.6 years ago
join -t $'\t' -1 1 -2 1 <(sed 's/_i[0-9]*$//' file1.txt | sort -u) <(sort -t $'\t' -k1,1 file2.txt) > results