How to delete everything after a certain point in GFF3 file - PYTHON
1
0
Entering edit mode
6.6 years ago
caseyd7 • 0

I have a GFF3 file and at the bottom of the file there is a FASTA report of the genome.

I want to delete the line that says '##FASTA' and everything below it, so that all I have left is the regular GFF report without the FASTA.

I need to do this for multiple files. Please help.

GFF3 • 3.0k views
6.6 years ago
James Ashmore ★ 3.4k

Let's try with an example file (using line numbers instead of actual GFF content):

$ cat test.gff
1
2
3
4
5
6
7
8
9
##FASTA
11
12
13
14
15

Find the line number at which the pattern '##FASTA' first appears (line 10 in this example):

egrep -n -m 1 '##FASTA' test.gff
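
Since the question asks for Python, the lookup above can be sketched without egrep. This is a minimal sketch, not a definitive implementation: the function name `first_marker_line` is hypothetical, and it assumes the marker starts its line (as the '##FASTA' directive does in GFF3), whereas egrep matches the pattern anywhere in the line.

```python
from typing import Optional


def first_marker_line(path: str, marker: str = "##FASTA") -> Optional[int]:
    """Return the 1-based number of the first line starting with `marker`.

    Returns None if no such line exists. Assumption: the marker sits at the
    start of the line, which is stricter than egrep's match-anywhere behaviour.
    """
    with open(path) as handle:
        for number, line in enumerate(handle, start=1):
            if line.startswith(marker):
                return number  # stop at the first hit, like `-m 1`
    return None
```

For the example test.gff above, `first_marker_line("test.gff")` returns 10.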

Find the total number of lines in your file (15 in this example):

wc -l test.gff | awk '{print $1}'

Delete the lines starting at the line number where your pattern first appears and ending at the end of the file:

sed '10,15d' test.gff > result.gff
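
The same truncation can also be done in a single pass in Python, without looking up either line number first. The sketch below is one possible approach under stated assumptions: `strip_fasta` is a hypothetical name, and the '##FASTA' marker is assumed to start its line.

```python
def strip_fasta(src: str, dst: str, marker: str = "##FASTA") -> int:
    """Copy `src` to `dst`, stopping before the first line starting with `marker`.

    Returns the number of lines written. If the marker never appears,
    the whole file is copied unchanged.
    """
    kept = 0
    with open(src) as infile, open(dst, "w") as outfile:
        for line in infile:
            if line.startswith(marker):
                break  # drop the marker line and everything after it
            outfile.write(line)
            kept += 1
    return kept
```

Because the file is read line by line, the FASTA section is never loaded into memory, which matters when the embedded genome is large.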

Package this up into a small shell script and run on each file.
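
If you would rather stay in Python than write a shell script, a batch version might look like the sketch below. The function name `strip_fasta_in_dir`, the `*.gff` glob pattern, and the separate output directory are all assumptions for illustration; adjust them to match your own file layout.

```python
from itertools import takewhile
from pathlib import Path

MARKER = "##FASTA"  # assumed to start its line, as in the GFF3 directive


def strip_fasta_in_dir(in_dir, out_dir, pattern="*.gff"):
    """Write a FASTA-free copy of every matching file in `in_dir` to `out_dir`.

    Returns the list of output paths. The directories must differ, because
    opening an output file for writing would truncate the input otherwise.
    """
    in_path, out_path = Path(in_dir), Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    if in_path.resolve() == out_path.resolve():
        raise ValueError("in_dir and out_dir must be different directories")

    written = []
    for gff in sorted(in_path.glob(pattern)):
        target = out_path / gff.name
        with gff.open() as infile, target.open("w") as outfile:
            # Keep lines only until the first one that starts with the marker.
            outfile.writelines(
                takewhile(lambda line: not line.startswith(MARKER), infile)
            )
        written.append(target)
    return written
```

A call such as `strip_fasta_in_dir("annotations", "annotations_nofasta")` (both directory names are placeholders) leaves the original files untouched and writes the trimmed copies under the second directory.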


True, but it would be even easier to just use head instead of wc -l and sed. Find the line number of '##FASTA' (line 10 in the example), then keep only the lines before it:

egrep -n -m 1 '##FASTA' test.gff

head -n 9 test.gff > result.gff
