Print the lines of a GTF file that do not match a gene_id list
14 months ago
Sam • 110

Dear Biostars

I have a GTF file and a file listing gene_ids. I want to exclude the GTF lines that contain any gene_id from the list file.

Any help?

Thanks

GTF file:

    Chr08   StringTie   exon    58908449    58908806    1000    -   .   gene_id "MSTRG.26714"; transcript_id "MSTRG.26714.1"; exon_number "1";
    Chr08   StringTie   exon    58917751    58917790    1000    -   .   gene_id "MSTRG.26718"; transcript_id "MSTRG.26718.1"; exon_number "2";
    Chr05   StringTie   exon    61586279    61586326    1000    +   .   gene_id "MSTRG.15742"; transcript_id "MSTRG.15742.1"; exon_number "1";

list file:

    MSTRG.26714
    MSTRG.26717
    MSTRG.26704

output:

    Chr08   StringTie   exon    58917751    58917790    1000    -   .   gene_id "MSTRG.26718"; transcript_id "MSTRG.26718.1"; exon_number "2";
    Chr05   StringTie   exon    61586279    61586326    1000    +   .   gene_id "MSTRG.15742"; transcript_id "MSTRG.15742.1"; exon_number "1";
bash grep awk • 185 views

Try this:

grep -v -w -f list_file GTF_file
14 months ago
Prakash ♦ 1.2k
India

Did the above command work? It didn't work for me. You can try using awk:

awk -F'"' 'NR==FNR{a[$1]++;next}!a[$2]' list_file GTF_file
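
To unpack the one-liner above: with `"` as the field separator, `$2` of each GTF line is the text inside the first pair of quotes, which is the gene_id in the StringTie lines from the question. The commented version below is only a sketch under that assumption. The throwaway directory and the names `tmp`, `skip` and `kept` are made up for the demo; the sample lines are copied from the question.

```shell
# Demo data copied from the question, written to a throwaway directory.
tmp=$(mktemp -d)
cat > "$tmp/GTF_file" <<'EOF'
Chr08   StringTie   exon    58908449    58908806    1000    -   .   gene_id "MSTRG.26714"; transcript_id "MSTRG.26714.1"; exon_number "1";
Chr08   StringTie   exon    58917751    58917790    1000    -   .   gene_id "MSTRG.26718"; transcript_id "MSTRG.26718.1"; exon_number "2";
Chr05   StringTie   exon    61586279    61586326    1000    +   .   gene_id "MSTRG.15742"; transcript_id "MSTRG.15742.1"; exon_number "1";
EOF
printf '%s\n' MSTRG.26714 MSTRG.26717 MSTRG.26704 > "$tmp/list_file"

kept=$(awk -F'"' '
    # First file (list_file): no quotes on the line, so $1 is the whole ID.
    NR == FNR { skip[$1] = 1; next }
    # Second file (GTF_file): $2 is the text inside the first pair of quotes.
    # Print the line only if that ID was not listed.
    !($2 in skip)
' "$tmp/list_file" "$tmp/GTF_file")

printf '%s\n' "$kept"
rm -rf "$tmp"
```

If gene_id is not the first quoted attribute in your GTF, `$2` will hold a different value and nothing will be filtered out, so check a few lines of your file first.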

I used egrep instead of grep and it worked!


What did you get? Check if you have an empty line in list_file...

For me it was:

$ cat GTF_file
Chr08   StringTie   exon    58908449    58908806    1000    -   .   gene_id "MSTRG.26714"; transcript_id "MSTRG.26714.1"; exon_number "1";
Chr08   StringTie   exon    58917751    58917790    1000    -   .   gene_id "MSTRG.26718"; transcript_id "MSTRG.26718.1"; exon_number "2";
Chr05   StringTie   exon    61586279    61586326    1000    +   .   gene_id "MSTRG.15742"; transcript_id "MSTRG.15742.1"; exon_number "1";
$ cat list_file
MSTRG.26714
MSTRG.26717
MSTRG.26704
$ grep -v -w -f list_file GTF_file
Chr08   StringTie   exon    58917751    58917790    1000    -   .   gene_id "MSTRG.26718"; transcript_id "MSTRG.26718.1"; exon_number "2";
Chr05   StringTie   exon    61586279    61586326    1000    +   .   gene_id "MSTRG.15742"; transcript_id "MSTRG.15742.1"; exon_number "1";
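
If your list_file might contain blank lines, as suspected above, one defensive option is to strip them before handing the list to grep. An empty pattern can match every line, in which case `grep -v` prints nothing. The sketch below also adds `-F` so that the `.` in IDs such as `MSTRG.26714` is taken literally instead of as a regex wildcard. The names `tmp`, `clean_list` and `filtered` are hypothetical, and the data is copied from the session above with a blank line added to the list on purpose.

```shell
# Demo data from the session above, written to a throwaway directory.
tmp=$(mktemp -d)
cat > "$tmp/GTF_file" <<'EOF'
Chr08   StringTie   exon    58908449    58908806    1000    -   .   gene_id "MSTRG.26714"; transcript_id "MSTRG.26714.1"; exon_number "1";
Chr08   StringTie   exon    58917751    58917790    1000    -   .   gene_id "MSTRG.26718"; transcript_id "MSTRG.26718.1"; exon_number "2";
Chr05   StringTie   exon    61586279    61586326    1000    +   .   gene_id "MSTRG.15742"; transcript_id "MSTRG.15742.1"; exon_number "1";
EOF
# list_file with a stray blank line in the middle.
printf 'MSTRG.26714\nMSTRG.26717\n\nMSTRG.26704\n' > "$tmp/list_file"

# Drop empty and whitespace-only lines from the list.
grep -v '^[[:space:]]*$' "$tmp/list_file" > "$tmp/clean_list"

# -v invert the match, -w match whole words only,
# -F treat patterns as fixed strings, -f read patterns from a file.
filtered=$(grep -v -w -F -f "$tmp/clean_list" "$tmp/GTF_file")

printf '%s\n' "$filtered"
rm -rf "$tmp"
```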

You are right, SMK, there was actually an empty line in the file. It's working now. :)


Great!... Thanks for reporting. :-)
