Extract rows present in file1 and not in file2
3
1
7.2 years ago
biostarsb ▴ 30

I have two files with genes

File one (with 40000 genes)

Gene 1
Gene 2
Gene 3
Gene b
Gene f
Gene c
Gene r
Gene z

File two (with 39000 genes)

Gene 1
Gene 3
Gene 2
Gene b

I would like to know if there is a command line (with awk or bash) to extract the lines that exist in file one and not in file two.

gene awk bash • 6.5k views
6
7.2 years ago

I would like to know if there is a command line (with awk or bash) to extract the lines that exist in file one and not in file two.

Use comm: http://man7.org/linux/man-pages/man1/comm.1.html

comm -3 <(sort file1.txt) <(sort file2.txt)
0

I tested this, but I get all the genes, not only those in file1. I need to extract only the genes present in file1 and not in file2.

0

Only in the 1st file:

comm -23 <(sort file1.txt) <(sort file2.txt)

Only in the 2nd file:

comm -13 <(sort file1.txt) <(sort file2.txt)
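
As a quick check, here is a minimal sketch that runs the comm -23 form on the gene lists from the question. The scratch directory and the .sorted file names are made up for the illustration. Sorting into files first should be equivalent to the <(sort ...) form and also works in shells without process substitution. LC_ALL=C is set so that sort and comm agree on the ordering.

```shell
# Hypothetical scratch directory holding the sample lists from the question.
workdir=$(mktemp -d)
printf '%s\n' 'Gene 1' 'Gene 2' 'Gene 3' 'Gene b' 'Gene f' 'Gene c' 'Gene r' 'Gene z' > "$workdir/file1.txt"
printf '%s\n' 'Gene 1' 'Gene 3' 'Gene 2' 'Gene b' > "$workdir/file2.txt"

# comm expects both inputs to be sorted in the same collation order.
export LC_ALL=C
sort "$workdir/file1.txt" > "$workdir/file1.sorted"
sort "$workdir/file2.txt" > "$workdir/file2.sorted"

# -2 hides lines found only in the second file, -3 hides lines common to both,
# so only the lines unique to the first file remain.
only_in_first=$(comm -23 "$workdir/file1.sorted" "$workdir/file2.sorted")
printf '%s\n' "$only_in_first"
```

On this sample the result is the four genes missing from file two (Gene c, Gene f, Gene r, Gene z), in sorted order rather than in the original file order.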
0
7.2 years ago
Asaf 10k

Get all unique genes (those that occur only once across both files):

cat file1.txt file2.txt | sort | uniq -c | awk '$1==1'

Get the genes in file1 that are not in file2:

grep -w -v -f file2.txt file1.txt
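
Since the question also mentions awk, one common idiom is sketched below. It is an alternative to the grep line, not the same command: it compares whole lines exactly and keeps the original order of file1. The scratch directory and the array name seen are illustrative choices, and the sketch assumes file2.txt is not empty (with an empty first file, NR == FNR stays true while reading file1.txt and nothing is printed).

```shell
# Hypothetical scratch directory holding the sample lists from the question.
workdir=$(mktemp -d)
printf '%s\n' 'Gene 1' 'Gene 2' 'Gene 3' 'Gene b' 'Gene f' 'Gene c' 'Gene r' 'Gene z' > "$workdir/file1.txt"
printf '%s\n' 'Gene 1' 'Gene 3' 'Gene 2' 'Gene b' > "$workdir/file2.txt"

# While reading the first argument (file2.txt), NR equals FNR: remember each line.
# While reading the second argument (file1.txt), print lines never seen in file2.txt.
awk_only_in_first=$(awk 'NR == FNR { seen[$0]; next } !($0 in seen)' \
    "$workdir/file2.txt" "$workdir/file1.txt")
printf '%s\n' "$awk_only_in_first"
```

Note that the lines of file2.txt are held in memory in the seen array, so this trades memory for not having to sort either file.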
1

Use uniq -u instead of uniq -c | awk '$1==1'.
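
A small sketch of what uniq -u gives, on made-up lists of the letters A to D (not the question's data). Note that sort | uniq -u reports lines unique to either file. Listing file2.txt twice is a known trick to keep only the lines that come from file1.txt, assuming file1.txt has no duplicate lines of its own.

```shell
# Hypothetical scratch directory with two small overlapping lists.
workdir=$(mktemp -d)
printf '%s\n' A B C > "$workdir/file1.txt"
printf '%s\n' B C D > "$workdir/file2.txt"

# Lines that occur exactly once across both files (unique to either file).
in_one_file_only=$(sort "$workdir/file1.txt" "$workdir/file2.txt" | uniq -u)

# Listing file2.txt twice gives each of its lines a count of at least 2,
# so uniq -u can only print lines that come from file1.txt alone.
uniq_only_in_first=$(sort "$workdir/file1.txt" "$workdir/file2.txt" "$workdir/file2.txt" | uniq -u)

printf 'unique to either file: %s\n' "$(printf '%s' "$in_one_file_only" | tr '\n' ' ')"
printf 'only in file1: %s\n' "$uniq_only_in_first"
```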

0
7.2 years ago

Why not simply use grep?

grep -f file2.txt -v file1.txt
4
  • If 'gene2' is in file2.txt, it will also remove 'gene22' from file1.txt, because grep matches each pattern anywhere in the line.
  • In general, if file2.txt is big, you wouldn't want to load all of it into memory as patterns.
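
The first point can be seen on a tiny made-up example. The sketch below also shows one possible way around it, grep's -F (fixed strings) and -x (whole-line match) options; this is a suggested variation, not something proposed in the answer above. The pattern list is still read in full, so the second point still applies.

```shell
# Hypothetical scratch directory with a minimal pair of gene lists.
workdir=$(mktemp -d)
printf '%s\n' gene2 gene22 gene3 > "$workdir/file1.txt"
printf '%s\n' gene2 > "$workdir/file2.txt"

# Substring matching: the pattern gene2 also matches gene22, so both lines are dropped.
substring_result=$(grep -v -f "$workdir/file2.txt" "$workdir/file1.txt")

# -F treats patterns as fixed strings, -x requires the whole line to match.
exact_result=$(grep -F -x -v -f "$workdir/file2.txt" "$workdir/file1.txt")

printf 'substring match keeps: %s\n' "$substring_result"
printf 'exact match keeps: %s\n' "$(printf '%s' "$exact_result" | tr '\n' ' ')"
```

With substring matching only gene3 survives, while the exact-match form keeps both gene22 and gene3.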
0

You're right! Thanks
