Question: Print the lines of a GTF file that do not match a list of gene IDs
3 months ago
Sam • 110

Dear Biostars

I have a GTF file and a file listing gene_ids. I want to exclude the lines of the GTF file that contain any gene_id from the list file.

Any help?

Thanks

GTF file:

    Chr08   StringTie   exon    58908449    58908806    1000    -   .   gene_id "MSTRG.26714"; transcript_id "MSTRG.26714.1"; exon_number "1";
    Chr08   StringTie   exon    58917751    58917790    1000    -   .   gene_id "MSTRG.26718"; transcript_id "MSTRG.26718.1"; exon_number "2";
    Chr05   StringTie   exon    61586279    61586326    1000    +   .   gene_id "MSTRG.15742"; transcript_id "MSTRG.15742.1"; exon_number "1";

list file:

    MSTRG.26714
    MSTRG.26717
    MSTRG.26704

output:

    Chr08   StringTie   exon    58917751    58917790    1000    -   .   gene_id "MSTRG.26718"; transcript_id "MSTRG.26718.1"; exon_number "2";
    Chr05   StringTie   exon    61586279    61586326    1000    +   .   gene_id "MSTRG.15742"; transcript_id "MSTRG.15742.1"; exon_number "1";

Try this:

    grep -v -w -f list_file GTF_file
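A possible refinement, sketched under assumptions: with `-w` alone each ID is still a regular expression (the `.` in `MSTRG.26714` matches any character), and an ID can also match inside another attribute such as `transcript_id`. Turning each ID into the exact text `gene_id "ID";` and matching it as a fixed string with `-F` avoids both. This assumes the attributes are written exactly as in the sample (`gene_id "ID";` with a single space), as StringTie does here. The script recreates the question's sample data (tab-separated, as GTF normally is) with a deliberately empty last line in `list_file`; `patterns.txt` and `filtered.gtf` are hypothetical file names.

```shell
#!/bin/sh
# Recreate the sample data from the question (tab-separated GTF).
printf 'Chr08\tStringTie\texon\t58908449\t58908806\t1000\t-\t.\tgene_id "MSTRG.26714"; transcript_id "MSTRG.26714.1"; exon_number "1";\n' > GTF_file
printf 'Chr08\tStringTie\texon\t58917751\t58917790\t1000\t-\t.\tgene_id "MSTRG.26718"; transcript_id "MSTRG.26718.1"; exon_number "2";\n' >> GTF_file
printf 'Chr05\tStringTie\texon\t61586279\t61586326\t1000\t+\t.\tgene_id "MSTRG.15742"; transcript_id "MSTRG.15742.1"; exon_number "1";\n' >> GTF_file
# ID list with an empty last line on purpose
printf '%s\n' MSTRG.26714 MSTRG.26717 MSTRG.26704 '' > list_file

# Delete empty lines, then turn each ID into the exact attribute text,
# e.g.  gene_id "MSTRG.26714";
sed '/^$/d; s/.*/gene_id "&";/' list_file > patterns.txt

# -F: treat patterns as fixed strings, so "." is not a regex wildcard
# -v: keep only the lines that match none of the patterns
grep -v -F -f patterns.txt GTF_file > filtered.gtf
cat filtered.gtf
```

On the sample data this keeps the MSTRG.26718 and MSTRG.15742 lines, the same result as the expected output in the question.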
3 months ago
SMK ♦ 1.3k
3 months ago
Prakash ♦ 1.2k
India

Did the above command work? It didn't work for me. You can try using awk:

    awk -F'"' 'NR==FNR{a[$1]++;next}!a[$2]' list_file GTF_file
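For readers new to the `NR==FNR` idiom, here is the same awk approach spread over several lines with comments. Like the one-liner, it assumes `gene_id` is the first quoted attribute on each GTF line. Two small changes are my own: it tests membership with `in` rather than `!a[$2]` (so looking up a GTF value does not add it to the array), and it ignores empty ID lines. The sample files are made up for the demo.

```shell
#!/bin/sh
# Hypothetical sample input, using the file names from the thread.
printf '%s\n' MSTRG.26714 '' > list_file
printf 'Chr08\tStringTie\texon\t58908449\t58908806\t1000\t-\t.\tgene_id "MSTRG.26714"; transcript_id "MSTRG.26714.1";\n' > GTF_file
printf 'Chr05\tStringTie\texon\t61586279\t61586326\t1000\t+\t.\tgene_id "MSTRG.15742"; transcript_id "MSTRG.15742.1";\n' >> GTF_file

awk -F'"' '
  # NR == FNR is true only while reading the first file (list_file).
  # An ID line contains no quotes, so the whole ID is in $1.
  NR == FNR { if ($1 != "") skip[$1] = 1; next }
  # For GTF_file, $2 is the text between the first pair of quotes,
  # which is the gene_id value when gene_id is the first attribute.
  # Print the line only if that value is not in the list.
  !($2 in skip)
' list_file GTF_file > filtered.gtf
cat filtered.gtf
```

One caveat of the `NR == FNR` idiom: if `list_file` is completely empty, the condition stays true while reading `GTF_file`, so nothing is printed.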

I used egrep instead of grep and it worked!

3 months ago
Sam • 110

What did you get? Check if you have an empty line in list_file...

For me it was:

    $ cat GTF_file
    Chr08   StringTie   exon    58908449    58908806    1000    -   .   gene_id "MSTRG.26714"; transcript_id "MSTRG.26714.1"; exon_number "1";
    Chr08   StringTie   exon    58917751    58917790    1000    -   .   gene_id "MSTRG.26718"; transcript_id "MSTRG.26718.1"; exon_number "2";
    Chr05   StringTie   exon    61586279    61586326    1000    +   .   gene_id "MSTRG.15742"; transcript_id "MSTRG.15742.1"; exon_number "1";
    $ cat list_file
    MSTRG.26714
    MSTRG.26717
    MSTRG.26704
    $ grep -v -w -f list_file GTF_file
    Chr08   StringTie   exon    58917751    58917790    1000    -   .   gene_id "MSTRG.26718"; transcript_id "MSTRG.26718.1"; exon_number "2";
    Chr05   StringTie   exon    61586279    61586326    1000    +   .   gene_id "MSTRG.15742"; transcript_id "MSTRG.15742.1"; exon_number "1";
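A quick way to spot and remove empty lines in `list_file` before running `grep` is sketched below. It treats whitespace-only lines the same as empty ones; the sample list and the output name `list_file.clean` are hypothetical.

```shell
#!/bin/sh
# Hypothetical sample: an ID list that ends with an empty line.
printf '%s\n' MSTRG.26714 MSTRG.26717 MSTRG.26704 '' > list_file

# Report the line numbers of empty or whitespace-only lines, if any.
grep -n '^[[:space:]]*$' list_file

# Write a copy without those lines to use as the pattern file.
grep -v '^[[:space:]]*$' list_file > list_file.clean
wc -l < list_file.clean
```

The cleaned file can then be passed to the original command: `grep -v -w -f list_file.clean GTF_file`.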
3 months ago
SMK ♦ 1.3k

You are right, SMK: there was actually an empty line in the file. It's working now. :)

3 months ago
Prakash ♦ 1.2k

Great!... Thanks for reporting. :-)

3 months ago
SMK ♦ 1.3k
