Filtering sequences with multiple headers by length
5.0 years ago
dod • 0

Hi,

I have an .fna file downloaded from a database that contains all the CDS of a bacterial strain. The format is shown below. I would like to filter out (remove) the CDS shorter than 200 nt. How can I do this from the command line?

I've looked into the previous posts related to this topic, but the awk did not work.

Thanks!

>lcl|AL111168.1_cds_CAL34182.1_1 [gene=dnaA] [locus_tag=Cj0001] [db_xref=EnsemblGenomes-Gn:Cj0001,EnsemblGenomes-Tr:CAL34182,GOA:Q9PJB0,InterPro:IPR001957,InterPro:IPR003593,InterPro:IPR010921,InterPro:IPR013159,InterPro:IPR013317,InterPro:IPR018312,InterPro:IPR020591,InterPro:IPR024633,InterPro:IPR027417] [protein=chromosomal replication initiator protein] [protein_id=CAL34182.1] [location=1..1323] [gbkey=CDS]
ATGAATCCAAGCCAAATACTTGAAAATTTAAAAAAAGAATTAAGTGAAAACGAATACGAAAACTATTTATCAAATTTAAA
ATTCAACGAAAAACAAAGCAAAGCAGATCTTTTAGTTTTTAATGCTCCAAATGAACTCATGGCTAAATTCATACAAACAA
AATACGGCAAAAAAATCGCGCATTTTTATGAAGTGCAAAGCGGAAATAAAGCCATCATAAATATACAAGCACAAAGTGCT
AAACAAAGCAACAAAAGCACAAAAATCGACATAGCTCATATAAAAGCACAAAGCACGATTTTAAATCCTTCTTTTACTTT
>lcl|AL111168.1_cds_CAL34183.1_2 [gene=dnaN] [locus_tag=Cj0002] [db_xref=EnsemblGenomes-Gn:Cj0002,EnsemblGenomes-Tr:CAL34183,GOA:Q0PCC3,InterPro:IPR001001,InterPro:IPR022634,InterPro:IPR022635,InterPro:IPR022637,UniProtKB/TrEMBL:Q0PCC3] [protein=DNA polymerase III, beta chain] [protein_id=CAL34183.1] [location=1483..2550] [gbkey=CDS]
ATGAAGTTAAGTATCAATAAAAATACTTTAGAATCTGCAGTGATTTTATGTAATGCTTATGTAGAAAAAAAAGACTCAAG
CACCATTACTTCTCATCTTTTTTTTCATGCTGATGAAGATAAACTTCTTATTAAAGCTAGTGATTATGAAATAGGTATCA
ACTATAAAATAAAAAAAATCCGCGTAGAATCAAGTGGTTTTGCTACTGCAAATGCAAAAAGTATTGCAGATGTTATTAAA
AGCTTAAACAATGAAGAAGTTGTTTTAGAAACCATTGATAATTTTTTATTTGTAAGACAAAAAAGTACAAAATACAAACT

. . .

genome linux • 927 views

I've looked into the previous posts related to this topic, but the awk did not work.

What do you mean by this?


I added markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button, which is in your toolbar when you compose or edit a post.


Biopython is a solution.
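
To make the length filter concrete, here is a minimal sketch in plain Python, so it runs without installing anything; Biopython's `SeqIO` module could do the parsing instead. The function names (`read_fasta`, `filter_by_length`, `write_fasta`) and the script name `filter_cds.py` are made up for illustration and are not part of any library. It assumes a standard multi-line FASTA file like the one in the question, and that "less than 200 nt" means records of exactly 200 nt are kept.

```python
import sys
from typing import Iterable, Iterator, TextIO, Tuple

Record = Tuple[str, str]  # (full header line including '>', sequence)


def read_fasta(handle: TextIO) -> Iterator[Record]:
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records: Iterable[Record], min_len: int = 200) -> Iterator[Record]:
    """Keep only records whose sequence has at least min_len nucleotides."""
    for header, seq in records:
        if len(seq) >= min_len:
            yield header, seq


def write_fasta(records: Iterable[Record], out: TextIO, width: int = 80) -> None:
    """Write records back out, wrapping sequences at `width` characters."""
    for header, seq in records:
        out.write(header + "\n")
        for i in range(0, len(seq), width):
            out.write(seq[i:i + width] + "\n")


# Hypothetical command-line use: python filter_cds.py input.fna > filtered.fna
if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as fh:
        write_fasta(filter_by_length(read_fasta(fh)), sys.stdout)
```

The whole header line is kept untouched, so the long bracketed annotations (`[gene=...]`, `[db_xref=...]`) survive the filtering. The length is computed on the joined sequence rather than per line, which is where one-line awk commands tend to go wrong on wrapped FASTA.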
