Biostar Beta. Not for public use.
Question: Help with extracting multiple sequences from a FASTA file with a list of IDs (four counting per line)

I would like to extract multiple sequences from a FASTA file using a list of counting IDs (four counting per line). I found several scripts that extract sequences from a FASTA file based on a list of IDs, but they all expect one counting per line, whereas my list has four counting per line. This is the head of my ID list:

OG1.5_9691: aco|TRINITY_DN39707_c3_g4_i1.p1 bio|GFMW01138197.1.p1 lym|FX192122.1.p1 physa|Contig31631.p1
OG1.5_9693: aco|TRINITY_DN34744_c0_g1_i2.p1 bio|GFMW01140870.1.p1 lym|FX194372.1.p1 physa|Contig299.p1
OG1.5_9694: aco|TRINITY_DN40605_c7_g1_i1.p1 bio|GFMW01145544.1.p1 lym|FX194851.1.p1 physa|Contig70050.p1
OG1.5_9695: aco|Contig7627.p1 bio|GFMW01145616.1.p1 lym|FX202590.1.p1 physa|Contig22503.p1

I would really appreciate any help you can provide with extracting my sequences from the FASTA file.

11 months ago by ahmed_bio82 • updated 11 months ago by Pierre Lindenbaum

I have edited the question for you this time, but for future reference, this is a Question, not a Tutorial.

It's not clear to me what you mean by extracting IDs by 'counting'.

Can you show your input data? It looks like you've only shown us one of the 2 files.

11 months ago

The faSomeRecords utility from Jim Kent should extract the sequences, as long as the FASTA header names match exactly in both files. Linux version linked. After downloading, remember to run chmod a+x faSomeRecords before executing it. I assume '4 counting' means there are 4 identifiers separated by spaces in your headers?
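
If you would rather not reshape the list by hand, here is a minimal Python sketch that reads the four-IDs-per-line list directly. It makes some assumptions that are not confirmed in the question: each line has the form "GROUP: id1 id2 id3 id4", and the first whitespace-delimited word of each FASTA header is exactly one of those IDs (for example aco|Contig7627.p1). The function names and the command-line file arguments are made up for illustration.

```python
import sys


def parse_id_list(lines):
    """Yield (group, [ids]) from lines like 'OG1.5_9691: aco|X bio|Y lym|Z physa|W'."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Assumption: the group name is everything before the first colon.
        group, _, rest = line.partition(":")
        yield group.strip(), rest.split()


def read_fasta(lines):
    """Yield (record_id, sequence); record_id is the first word of the header."""
    name, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            parts = line[1:].split()
            name, chunks = (parts[0] if parts else ""), []
        elif name is not None:
            chunks.append(line.strip())
    if name is not None:
        yield name, "".join(chunks)


def extract(id_lines, fasta_lines):
    """Return the FASTA records whose ID appears anywhere in the ID list."""
    wanted = {seq_id for _, ids in parse_id_list(id_lines) for seq_id in ids}
    return [(name, seq) for name, seq in read_fasta(fasta_lines) if name in wanted]


# Hypothetical usage: python extract_ids.py ids.txt all.fasta subset.fasta
if __name__ == "__main__" and len(sys.argv) == 4:
    with open(sys.argv[1]) as id_fh, open(sys.argv[2]) as fa_fh:
        records = extract(id_fh, fa_fh)
    with open(sys.argv[3], "w") as out:
        for name, seq in records:
            out.write(f">{name}\n{seq}\n")
```

The same parse_id_list function could also be used to write the IDs one per line, which is the layout the one-ID-per-line scripts you already found expect.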

11 months ago
