Exclude genes and their features from a GFF3 file
2.1 years ago
Chris • 30

Hi all, I have a file with gene names like this one (file1):

AT1G01010

AT1G01020

AT1G01030

AT1G01040

AT1G03993

AT1G01050

AT1G03997

AT1G01060

AT1G01070

AT1G01080

and I have a gff3 file for the whole genome in this link: https://drive.google.com/file/d/1q0L1SbKFPulhUGc0mXk4_REuxlu8ZJsY/view?usp=sharing

I need a new GFF3 file from which those genes and their features (exons, introns, etc.) have been removed.

Any help is highly appreciated.

Thank you in advance for your help.

20 months ago
Hussain Ather • 920
National Institutes of Health, Bethesda…

Python. Blank lines in file1 are skipped, because an empty string is a substring of every line and would exclude the whole GFF3 file.

# Read the gene IDs to exclude, skipping blank lines
with open("file1") as f2:
    genes = [line.strip() for line in f2 if line.strip()]

# Write every GFF3 line that contains none of the listed gene IDs
with open("Arabidopsis_thaliana.TAIR10.37.gff3") as f1, open("excluded.txt", "w") as o:
    for line in f1:
        # Plain substring test: gene, transcript and exon lines all carry the gene ID
        if not any(gene in line for gene in genes):
            o.write(line)

EDIT:

This grep one-liner also works, provided file1 has no empty lines:

grep -vFwf file1 Arabidopsis_thaliana.TAIR10.37.gff3 > excluded.txt
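
For comparison, here is a rough Python sketch of the whole-word filtering that `grep -w` does, which also tolerates blank lines in file1. The function names `load_gene_ids` and `filter_gff3` and the demo GFF3 lines are made up for illustration and are not from the real annotation file. It assumes the gene IDs start and end with a letter or digit, so that `\b` behaves much like grep's word matching.

```python
import re

def load_gene_ids(lines):
    """Return the set of non-empty gene IDs from an iterable of lines."""
    return {line.strip() for line in lines if line.strip()}

def filter_gff3(gff_lines, gene_ids):
    """Yield the GFF3 lines that mention none of gene_ids as a whole word."""
    if not gene_ids:
        # Nothing to exclude: pass every line through unchanged
        yield from gff_lines
        return
    # \b keeps an ID from matching inside a longer identifier
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(g) for g in sorted(gene_ids)) + r")\b"
    )
    for line in gff_lines:
        if not pattern.search(line):
            yield line

# Hypothetical demo input (made-up coordinates; the last ID has an extra digit)
demo_gff = [
    "##gff-version 3\n",
    "1\tdemo\tgene\t1\t100\t.\t+\t.\tID=gene:AT1G01010\n",
    "1\tdemo\texon\t1\t50\t.\t+\t.\tParent=transcript:AT1G01010.1\n",
    "1\tdemo\tgene\t200\t300\t.\t+\t.\tID=gene:AT1G010100\n",
]
demo_ids = load_gene_ids(["AT1G01010\n", "\n"])  # the blank line is ignored
kept = list(filter_gff3(demo_gff, demo_ids))
```

To run it on the real data, pass open file objects in place of the demo lists, e.g. `load_gene_ids(open("file1"))` and `filter_gff3(open("Arabidopsis_thaliana.TAIR10.37.gff3"), ids)`, and write the yielded lines to excluded.txt.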