Extract rows present in file1 but not in file2
4.8 years ago

Hello everyone,

I have two tab-separated files and want to compare them on chr, start, ref and alt, writing to an output file only the rows of file1 that are not present in file2. Please see the example below.

file1:

chr start   ref alt freq
1   11906040    C   T   2.76E-05
1   11906049    C   T   2.76E-05
1   11906068    A   G   0.147142
1   11907124    C   T   2.77E-05
1   11907125    C   G   0.000471777
1   11907703    AAGG    A   2.78E-05
1   11907717    CGGT    C   5.59E-05

file2:

chr start      end  ref   alt    freq
1       12198   12198   G       C       .
1       12237   12237   G       A       .
1       12259   12259   G       C       .
1       12266   12266   G       A       .
1       12746   12773   GGGAGTGGCGTCGCCCCTAGGGCTCTAC    -       0
1       12745   12773   TGGGAGTGGCGTCGCCCCTAGGGCTCTAC   T       0
1   11907703    AAGG    A   2.78E-05
1   11907717    CGGT    C   5.59E-05

output file:

chr start   ref alt freq
1   11906040    C   T   2.76E-05
1   11906049    C   T   2.76E-05
1   11906068    A   G   0.147142
1   11907124    C   T   2.77E-05
1   11907125    C   G   0.000471777

thanks in advance.

Regards,

4.8 years ago
AK ★ 2.2k

Judging from your example inputs and output, a plain grep would also do it:

$ grep -v -f file2.tab file1.tab
chr     start   ref     alt     freq
1       11906040        C       T       2.76E-05
1       11906049        C       T       2.76E-05
1       11906068        A       G       0.147142
1       11907124        C       T       2.77E-05
1       11907125        C       G       0.000471777

But this only works if the format of your file2 is really as posted: its last two rows have 5 fields, not 6 (the end column is missing).

$ awk '{print NF}' file2.tab
6
6
6
6
6
6
6
5
5
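
Note that grep -f treats every line of file2 as a regular expression and matches it anywhere in a line of file1, so it compares whole rows (including freq) rather than just chr, start, ref and alt. If you want to compare on those four columns only, here is a minimal awk sketch. It assumes that 6-field rows in file2 carry an end column (so ref and alt are fields 4 and 5) and that 5-field rows have the same layout as file1; that is my guess from the example, not something the question confirms. The file names and the temporary directory are placeholders, and the sample rows are a subset of the ones posted above.

```shell
# Build small sample inputs (rows copied from the question; paths are placeholders).
tmpdir=$(mktemp -d)
printf '%s\t%s\t%s\t%s\t%s\n' \
  chr start ref alt freq \
  1 11906040 C T 2.76E-05 \
  1 11907125 C G 0.000471777 \
  1 11907703 AAGG A 2.78E-05 \
  1 11907717 CGGT C 5.59E-05 > "$tmpdir/file1.tab"
printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
  chr start end ref alt freq \
  1 12198 12198 G C . > "$tmpdir/file2.tab"
printf '%s\t%s\t%s\t%s\t%s\n' \
  1 11907703 AAGG A 2.78E-05 \
  1 11907717 CGGT C 5.59E-05 >> "$tmpdir/file2.tab"

result=$(awk -F'\t' '
  # First pass (file2): skip the header, then store one key per row.
  NR == FNR {
    if (FNR == 1) next
    if (NF >= 6) key = $1 FS $2 FS $4 FS $5   # assumed layout with an end column
    else         key = $1 FS $2 FS $3 FS $4   # assumed layout without an end column
    seen[key] = 1
    next
  }
  # Second pass (file1): always keep the header line.
  FNR == 1 { print; next }
  # Print only rows whose chr/start/ref/alt key was not stored from file2.
  !(($1 FS $2 FS $3 FS $4) in seen)
' "$tmpdir/file2.tab" "$tmpdir/file1.tab")

printf '%s\n' "$result"
rm -r "$tmpdir"
```

On real data you would drop the printf setup and redirect the awk output to your output file. If file2 turns out to have the end column on every row, the 5-field branch is simply never used.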
4.8 years ago
ATpoint 82k

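Look at bedtools intersect and its -v option, which reports only the entries in A that have no overlap in B. Keep in mind that it compares genomic intervals, so it needs BED-like input with start and end coordinates.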
